Cell Phone based Activity Detection using Markov Logic Network




Somdeb Sarkhel
sxs104721@utdallas.edu

1 Introduction

Mobile devices are becoming increasingly sophisticated, and the latest generation of smartphones incorporates many diverse and powerful sensors, such as GPS sensors, light sensors, temperature sensors, direction sensors (i.e., magnetic compasses), and acceleration sensors (i.e., accelerometers). In this project we build a system that uses phone-based accelerometers, gyroscopes, and magnetometers to perform activity recognition, the task of identifying the physical activity a user is performing. To address activity recognition as a sequential supervised learning problem, we represent the collected sensor data as a time series and use a Markov Logic Network to model this time series data.

2 Motivation

Dealing with sequential data has become an important application area of machine learning. Such data are frequently found in speech recognition, activity recognition, information extraction, and related problems. One of the main problems in this area is assigning labels to sequences of objects; this class of problems has been called sequential supervised learning. Probabilistic graphical models such as Hidden Markov Models (HMMs) and their generalization, Dynamic Bayesian Networks (DBNs, [Murphy, 2002]), have been quite successful at modeling sequential phenomena. However, these models have two main weaknesses:

- An HMM (as well as a DBN) is a generative model. The goal of generative graphical models is to model the joint probability distribution p(X, Y), where X is the set of observed features and Y is the label. If there are many observed features, all combinations of the observed features must be enumerated in order to compute the joint distribution, which is generally intractable.
- It is hard to change the model of a DBN or HMM, and dependencies among the input data are hard to specify.

For these reasons we propose to use Markov Logic Networks (MLNs) for modelling. An MLN can be seen as a template for generating an undirected graphical model (a Markov network). MLNs are easy to specify, since the structure of the graphical model is written in first-order logic, and they also give us the flexibility to use them either as a discriminative model or as a generative model.
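As an illustration only (not part of the original system), the raw sensor stream behind this time-series formulation could be represented as follows; the field names and sample values are assumptions for the sketch, not taken from the project code.

    from dataclasses import dataclass

    @dataclass
    class SensorSample:
        """One timestamped reading from the phone's motion sensors."""
        t: float        # timestamp in seconds
        ax: float       # accelerometer x-axis reading (m/s^2)
        ay: float       # accelerometer y-axis reading
        az: float       # accelerometer z-axis reading
        activity: str   # ground-truth label, e.g. "Walking"

    # A labeled recording is a time-ordered list of samples; sequential
    # supervised learning assigns an activity label to each position.
    recording = [
        SensorSample(0.00, 0.12, 9.74, 0.31, "Walking"),
        SensorSample(0.02, 0.38, 9.91, 0.08, "Walking"),
    ]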

3 Background on Markov Logic Networks

Markov Logic Networks (MLNs, [Richardson and Domingos, 2006]) are one type of "unrolled" graphical model developed in Statistical Relational Learning (SRL, [Getoor and Taskar, 2007]) to combine logical and probabilistic reasoning. An MLN L is a set of pairs (F_i, w_i), where F_i is a formula in first-order logic and w_i is a real number (the weight of formula F_i). Every instantiation of F_i is given the same weight. Together with a finite set of constants C = {c_1, c_2, ..., c_{|C|}}, it defines a Markov network M_{L,C} as follows:

- M_{L,C} contains one binary node for each possible grounding of each predicate appearing in L. The value of the node is 1 if the ground predicate is true, and 0 otherwise.
- M_{L,C} contains one feature for each possible grounding of each formula F_i in L. The value of this feature is 1 if the ground formula is true, and 0 otherwise. The weight of the feature is the w_i associated with F_i in L.

Thus the first-order logic formulae in our knowledge base serve as templates for constructing the Markov network. This network models the joint distribution of the set of all ground atoms, X, each of which is a binary variable, and it provides a means for performing probabilistic inference. The probability distribution over possible worlds x specified by the ground Markov network M_{L,C} is

    P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i\, n_i(x) \Big) = \frac{1}{Z} \prod_i \phi_i\big(x_{\{i\}}\big)^{n_i(x)},

where n_i(x) is the number of true groundings of F_i in x, x_{\{i\}} is the state (truth values) of the predicates appearing in F_i, \phi_i(x_{\{i\}}) = e^{w_i}, and Z is the normalizing factor, Z = \sum_{x' \in \mathcal{X}} \exp( \sum_i w_i\, n_i(x') ).

However, classic Markov logic deals with discrete features, whereas the current project needs continuous features, so we use Hybrid Markov Logic Networks, an extension of MLNs to numeric domains. A hybrid Markov logic network (HMLN) L is a set of pairs (F_i, w_i), where F_i is a formula or a numeric term and w_i is a real number. Together with a finite set of constants C = {c_1, c_2, ..., c_{|C|}}, it defines a Markov network M_{L,C} as follows:

- M_{L,C} contains one node for each possible grounding, with constants in C, of each predicate or numeric property appearing in L. The value of a predicate node is 1 if the ground predicate is true, and 0 otherwise. The value of a numeric node is the value of the corresponding ground term.
- M_{L,C} contains one feature for each possible grounding, with constants in C, of each formula or numeric term F_i in L. The value of a numeric feature is the value of the corresponding ground term. The weight of the feature is the w_i associated with F_i in L.

HMLNs also allow a few extensions of first-order syntax, the most important of which is soft equality, written (α = β) for numeric terms. This notation is shorthand for the term −(α − β)², where α and β are arbitrary numeric terms. It makes it possible to state numeric constraints as equations, with an implied Gaussian penalty for diverging from them: if the weight of the formula is w, the standard deviation of the Gaussian is σ = 1/√(2w). A numeric domain can now be modeled simply by writing down the equations that describe it.
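The σ = 1/√(2w) relation follows by matching exponents between the weighted soft-equality feature and a zero-mean Gaussian; this short derivation is a standard calculation added here only for clarity:

    \exp\big(-w(\alpha-\beta)^2\big) = \exp\Big(-\frac{(\alpha-\beta)^2}{2\sigma^2}\Big)
    \quad\Longrightarrow\quad w = \frac{1}{2\sigma^2}
    \quad\Longrightarrow\quad \sigma = \frac{1}{\sqrt{2w}}.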

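For intuition, here is a brute-force sketch of the discrete MLN distribution defined above, computed by enumerating all possible worlds of a tiny invented knowledge base; the three atoms, two formulas, and their weights are illustrative only and are not the project's model.

    import itertools
    import math

    # Toy ground network over three binary atoms, e.g. Walking(A), Moving(A),
    # Still(A).  Each entry pairs a weight w_i with a function n_i that counts
    # the formula's true groundings in a world.
    formulas = [
        (1.5, lambda w, m, s: int((not w) or m)),    # Walking(A) => Moving(A)
        (2.0, lambda w, m, s: int(not (m and s))),   # !(Moving(A) ^ Still(A))
    ]

    def unnormalized(world):
        # exp(sum_i w_i * n_i(x)) for one truth assignment x
        return math.exp(sum(wt * n(*world) for wt, n in formulas))

    worlds = list(itertools.product([False, True], repeat=3))
    Z = sum(unnormalized(x) for x in worlds)          # partition function
    for x in worlds:
        print(x, round(unnormalized(x) / Z, 4))       # P(X = x)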
4 Challenges

A number of challenges were faced when modelling the collected cell phone sensor data as a sequential graphical model:

- The gathered data was not sequential. It had been collected from a classification point of view, so there are no transitions from one activity to another in the data. This effectively rules out modelling the data as a sequential graphical model (such as an HMM).
- Creating a graphical model with continuous variables is an open research area. In most cases a distribution over the continuous features is assumed, which, like any assumption, may or may not hold.
- Alchemy is a work in progress, at least for continuous features, and there are a few problems with it in terms of its specification. For example, only soft equality is implemented in Alchemy, whereas soft inequality is not (though it is mentioned in the documentation).
- Inference in graphical models (even approximate inference) is intractable. As a result, inference and learning take a long time.

5 Implementation

5.1 Feature Extraction

To classify activity from the time series data, the time component is first removed using a sliding window technique. The window size is 2 seconds, and two consecutive windows overlap by 1 second. A few basic features, mostly statistical measures of central tendency and spread, are then extracted from each window (a sketch of this computation follows the list):

- Mean: the average of the values in the window (in statistical terms, E[X]).
- Variance: the variance of the values in the window (E[X²] − (E[X])²).
- k-th moment: the third, fourth, fifth, and sixth moments are used as features (the k-th moment of a random variable is defined as E[X^k]).
- k-th central moment: the third, fourth, fifth, and sixth central moments are used as features (the k-th central moment is defined as E[(X − μ_X)^k], where μ_X = E[X]).
- Amplitude: the difference between the highest and lowest values in the window.

These features are extracted from the accelerometer values on all three axes. The gyroscope values are not used for feature extraction, as they had a very weak correlation with activity. Among these features the most important is variance: it received the highest weight, and it alone can classify 60% of the instances.
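A minimal sketch of this windowing and feature computation, assuming a 50 Hz accelerometer signal stored in a NumPy array (the sampling rate is an assumption; the report does not state it):

    import numpy as np

    def window_features(signal, rate_hz=50, win_s=2.0, overlap_s=1.0):
        """Slide a 2 s window with 1 s overlap over one accelerometer axis
        and compute the statistical features listed above for each window."""
        win = int(win_s * rate_hz)
        step = int((win_s - overlap_s) * rate_hz)
        feats = []
        for start in range(0, len(signal) - win + 1, step):
            x = signal[start:start + win]
            mu = x.mean()
            row = {
                "mean": mu,                                  # E[X]
                "variance": (x ** 2).mean() - mu ** 2,       # E[X^2] - (E[X])^2
                "amplitude": x.max() - x.min(),              # max - min
            }
            for k in (3, 4, 5, 6):
                row[f"moment{k}"] = (x ** k).mean()          # E[X^k]
                row[f"cmoment{k}"] = ((x - mu) ** k).mean()  # E[(X - mu)^k]
            feats.append(row)
        return feats

    # Example: features for the x-axis of one recording.
    # feats_x = window_features(np.asarray(accel_x))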

5.2 Validation

For a classification problem, the method of validation is of utmost importance. Among the different validation methods available I have chosen Leave One Person Out (LOPO). The data was collected from ten different people, so when training the classifier I use the data from nine people and test on the tenth. I could instead have performed 10-fold cross-validation, but LOPO is chosen because it gives a better estimate of generalization to unseen users (a sketch of this split follows Table 1).

5.3 Technique

Using Alchemy, the system is modeled as a Gaussian Naïve Bayes classifier, with Activity as the class variable and the accelerometer variance and amplitude as the observed variables. As all the features are continuous, a Hybrid Markov Logic Network is used to model the system. The model is hand-trained with the mean and standard deviation computed from the data: since we assume the features are normally distributed and conditionally independent given the class, the weight of each formula is computed as w = 1/(2σ²) (a sketch of this computation also follows Table 1). Because we have different numbers of examples for different activities (most of the examples are walking), the prior over activities is assumed to be uniform; otherwise there is a strong bias toward classifying every instance as walking. For comparison, the system is also modelled using Logistic Regression and a linear SVM.

6 Results

The MLN was trained and tested on the left- and right-pocket data set only. The accuracy obtained with the MLN is 65.82%. The confusion matrix is as follows:

                            Predicted Class
    Actual       ClimbUp  Running  ClimbDown  Jogging  Walking  Still
    ClimbUp         57        1        9          3        4       0
    Running          9       60       11          2        8       0
    ClimbDown        5        0       15         15        9       0
    Jogging          6        1       54          9       26       0
    Walking          4        0       25         12       51       2
    Still            1        0        0          0        0      75

Table 1: Confusion matrix, using MLN
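A minimal sketch of the LOPO split from Section 5.2, using scikit-learn's LeaveOneGroupOut; the .npy file names and the choice of Logistic Regression as the per-fold classifier are placeholders for illustration, not artifacts of this project:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import LeaveOneGroupOut

    # X: one row of window features per instance; y: activity labels;
    # groups: the person id (0..9) each window came from.
    X = np.load("features.npy")        # placeholder file names
    y = np.load("labels.npy")
    groups = np.load("persons.npy")

    accuracies = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X[train_idx], y[train_idx])                     # train on 9 people
        accuracies.append(clf.score(X[test_idx], y[test_idx]))  # test on the 10th
    print("mean LOPO accuracy:", np.mean(accuracies))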

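The hand-training step from Section 5.3 can be sketched as follows: per-activity means and standard deviations are estimated from the training windows and converted into soft-equality formula weights via w = 1/(2σ²). The data layout and the example formula in the closing comment are illustrative assumptions:

    import numpy as np

    def hand_train_weights(features, labels):
        """Per-activity Gaussian parameters for each feature column, plus the
        corresponding HMLN formula weight w = 1 / (2 * sigma^2)."""
        params = {}
        for activity in np.unique(labels):
            rows = features[labels == activity]
            mu = rows.mean(axis=0)
            sigma = rows.std(axis=0) + 1e-12     # guard against zero variance
            params[activity] = {"mean": mu, "weight": 1.0 / (2.0 * sigma ** 2)}
        return params

    # Each (activity, feature) pair then yields one soft-equality formula,
    # written in Alchemy-like syntax roughly as:
    #   w   Activity(t, Walking) => (Variance(t) = mu)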
For Logistic Regression, the classifier was trained on all three positions and tested on those three positions using LOPO validation. The accuracy obtained with Logistic Regression is 74.35%. The confusion matrix is as follows:

       a      b     c      d     e     f    <-- classified as
     675     43     0      0     2    11    a = Still
      32   5840    24    158   126   217    b = Walking
       1    105   764    225    23    44    c = Running
      10    548   133   1384    76    59    d = Jogging
      17    452    15    167   414   103    e = ClimbDown
       4    551    71     73    68   656    f = ClimbUp

Table 2: Confusion matrix, using Logistic Regression

For the linear SVM, the classifier was trained and tested on the left- and right-pocket data set only, and the validation method used was 10-fold cross-validation. The accuracy obtained with the SVM is 77.84%. The confusion matrix is as follows:

       a      b     c      d     e     f    <-- classified as
     709      5     0      0     1     0    a = Still
      10   4322    21    104    69   101    b = Walking
       0    132   497    102    13    31    c = Running
       6    655    69    655    47    41    d = Jogging
      11    300     2     76   486    32    e = ClimbDown
       1    252     6     13    26   799    f = ClimbUp

Table 3: Confusion matrix, using linear SVM

References

[Richardson and Domingos, 2006] Richardson, M. and Domingos, P. Markov Logic Networks. Machine Learning, 2006.

[Getoor and Taskar, 2007] Getoor, L. and Taskar, B. Introduction to Statistical Relational Learning. MIT Press, 2007.

[Murphy, 2002] Murphy, K. P. Dynamic Bayesian Networks: Representation, Inference and Learning. PhD thesis, University of California, Berkeley, 2002.