Human Activity Recognition using Smartphone

2011 Fall CSCE666 Project Report
Amin Rasekh, Chien-An Chen, Yan Lu
Texas A&M University

ABSTRACT
Human activity recognition has wide applications in medical research and human survey systems. In this project, we design a robust activity recognition system based on a smartphone. The system uses a 3-dimensional smartphone accelerometer as the only sensor to collect time series signals, from which 31 features are generated in both the time and frequency domains. Activities are classified using 4 different passive learning methods: a quadratic classifier, the k-nearest neighbor algorithm, support vector machines, and artificial neural networks. Dimensionality reduction is performed through both feature extraction and subset selection. Besides passive learning, we also apply active learning algorithms to reduce the data labeling expense. Experimental results show that the classification rate of passive learning reaches 84.4% and that it is robust to common positions and poses of the cellphone. The results of active learning on real data demonstrate a reduction in the labeling labor needed to achieve performance comparable with passive learning.

1. INTRODUCTION
The demand for understanding human activities has grown in the health-care domain, especially in elder care support, rehabilitation assistance, diabetes, and cognitive disorders [1,2,3]. A huge amount of resources can be saved if sensors can help caretakers record and monitor patients all the time and report automatically when any abnormal behavior is detected. Other applications, such as human survey systems and location indicators, also benefit from this study. Many studies have successfully identified activities using wearable sensors with very low error rates, but the majority of previous works were done in laboratories with very constrained settings. Readings from multiple body-attached sensors achieve low error rates, but such a complicated setting is not feasible in practice.
This project uses low-cost and commercially available smartphones as sensors to identify human activities. The growing popularity and computational power of smartphones make them ideal candidates for non-intrusive body-attached sensors. According to statistics on US mobile subscribers, around 44% of mobile subscribers owned smartphones in 2011, and 96% of these smartphones have built-in inertial sensors such as an accelerometer or a gyroscope [4,5]. Research has shown that a gyroscope can help activity recognition, even though its contribution alone is not as good as that of an accelerometer [6,7]. Because the gyroscope is not as easily accessed in cellphones as the accelerometer, our system only uses readings from a 3-dimensional accelerometer. Unlike many previous works, we relax the constraint of attaching sensors to a fixed body position with a fixed device orientation. In our design, the phone can be placed at any position around the waist, such as a jacket pocket or a pants pocket, with arbitrary orientation. These are the most common positions where people carry mobile phones.

A training process is always required when a new activity is added to the system. Parameters of the same algorithm may also need to be retrained and adjusted when the algorithm runs on different devices, due to the variance between sensors. However, labeling time-series data is time-consuming, and it is not always possible to ask users to label all the training data. We therefore propose using active learning to accelerate the training process. Given a classifier, active learning intelligently selects the unlabeled samples to query and learns the parameters from the correct labels provided by the oracle, usually a human. In this fashion, users label only the samples that the algorithm asks for, and the total number of labeled training samples required is reduced. To the best of our knowledge, there is no previous study applying active learning to the human activity recognition problem.
The goal of this project is to design a lightweight and accurate system on a smartphone that can recognize human activities. Moreover, to reduce the labeling time and burden, active learning models are developed. By testing and comparing different learning algorithms, we find the one that best fits our system in terms of efficiency and accuracy on a smartphone.

2. LITERATURE REVIEW
Human activity recognition has been studied for years, and researchers have proposed different solutions to attack the problem. Existing approaches typically use vision sensors, inertial sensors, or a mixture of both. Machine learning and threshold-based algorithms are often applied. Machine learning usually produces more accurate and reliable results, while threshold-based algorithms are faster and simpler. One or multiple cameras have been used to capture and identify body posture [8, 9]. Multiple accelerometers and gyroscopes attached to different body positions are the most common solutions [10-13]. Approaches that combine both vision and
inertial sensors have also been proposed [14]. Another essential part of all these algorithms is data processing. The quality of the input features has a great impact on performance. Some previous works focus on generating the most useful features from the time series data set [15]. The common approach is to analyze the signal in both the time and frequency domains. Active learning has been applied to many machine learning problems in which labeling samples is time-consuming and labor-expensive. Some applications include speech recognition, information extraction, and handwritten character recognition [18,19,20]. This technique, however, has not yet been applied to the human activity recognition problem.

3. METHODS
3.1 Feature Generation
To collect the acceleration data, each subject carries a smartphone for a few hours and performs some activities. In this project, five kinds of common activities are studied: walking, limping, jogging, walking upstairs, and walking downstairs. The phone can be placed anywhere close to the waist, and its orientation is arbitrary. The built-in accelerometer we use has a maximum sampling frequency of 50 Hz and a ±3g range. According to a previous study, body movements are constrained to frequency components below 20 Hz, and 99% of the energy is contained below 15 Hz [16]. By the Nyquist sampling theorem, a 50 Hz sampling rate captures frequency components up to 25 Hz, so a 50 Hz accelerometer is sufficient for our study. A low-pass filter with a 25 Hz cutoff frequency is applied to suppress noise. Also, because the phone sensor is unstable and may drop samples accidentally, interpolation is applied to fill the gaps.

Table 1. Feature generation
  Time domain                                            Frequency domain
  Variance                                               Energy
  Mean                                                   Entropy
  Median                                                 Centroid frequency
  25th percentile                                        Peak frequency
  75th percentile
  Correlation between each pair of axes
  Average resultant acceleration (1 resultant feature)

To analyze the activities over a short period, we group every 256 samples into a window, which corresponds to 5.12 s of data (256 samples at 50 Hz).
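The windowing and feature computation described above can be sketched in plain Python. This is a minimal, dependency-free sketch under assumptions of our own, not the authors' implementation: windows are taken as non-overlapping, variance is the population variance, and the frequency-domain definitions (energy as the mean squared DFT magnitude, entropy as the Shannon entropy of the normalized power spectrum, centroid as the magnitude-weighted mean frequency, peak as the frequency of the largest bin) are common choices that the report does not spell out. The low-pass filtering and gap interpolation steps are omitted. Nine features per axis, three pairwise correlations, and one average resultant give the 31 features of Table 1.

```python
import cmath
import math
import statistics

FS = 50.0     # accelerometer sampling rate in Hz (Section 3.1)
WINDOW = 256  # samples per window: 256 / 50 Hz = 5.12 s


def split_windows(samples, size=WINDOW):
    """Cut a stream into consecutive, non-overlapping windows (overlap is an assumption)."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def percentile(values, q):
    """q-th percentile with linear interpolation between sorted samples."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def pearson(a, b):
    """Pearson correlation between two axes (0.0 if either axis is constant)."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def spectrum(signal):
    """Frequencies and magnitudes of DFT bins 1..N/2 of the mean-removed signal.
    A naive O(N^2) DFT keeps the sketch dependency-free; an FFT gives the same values."""
    n = len(signal)
    m = statistics.mean(signal)
    centered = [v - m for v in signal]
    freqs, mags = [], []
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n) for t, c in enumerate(centered))
        freqs.append(k * FS / n)
        mags.append(abs(coeff))
    return freqs, mags


def axis_features(name, signal):
    """5 time-domain + 4 frequency-domain features for one axis."""
    freqs, mags = spectrum(signal)
    power = [m * m for m in mags]
    total = sum(power) or 1e-12
    probs = [p / total for p in power]
    return {
        f"{name}_var": statistics.pvariance(signal),
        f"{name}_mean": statistics.mean(signal),
        f"{name}_median": statistics.median(signal),
        f"{name}_p25": percentile(signal, 25),
        f"{name}_p75": percentile(signal, 75),
        f"{name}_energy": sum(power) / len(signal),
        f"{name}_entropy": 0.0 - sum(p * math.log(p) for p in probs if p > 0),
        f"{name}_centroid": sum(f * m for f, m in zip(freqs, mags)) / (sum(mags) or 1e-12),
        f"{name}_peak": freqs[mags.index(max(mags))],
    }


def window_features(x, y, z):
    """31 features: 9 per axis (27) + 3 pairwise correlations + 1 average resultant."""
    feats = {}
    for name, sig in (("x", x), ("y", y), ("z", z)):
        feats.update(axis_features(name, sig))
    feats["corr_xy"] = pearson(x, y)
    feats["corr_yz"] = pearson(y, z)
    feats["corr_xz"] = pearson(x, z)
    feats["resultant"] = statistics.mean(
        [math.sqrt(a * a + b * b + c * c) for a, b, c in zip(x, y, z)])
    return feats


# Synthetic 10.24 s recording (two windows), in units of g.
t = [i / FS for i in range(2 * WINDOW)]
xs = [math.sin(2 * math.pi * 2.0 * ti) for ti in t]              # 2 Hz component
ys = [0.5 * math.cos(2 * math.pi * 1.0 * ti) for ti in t]        # 1 Hz component
zs = [1.0 + 0.1 * math.sin(2 * math.pi * 5.0 * ti) for ti in t]  # 1 g offset + 5 Hz
windows = list(zip(split_windows(xs), split_windows(ys), split_windows(zs)))
feats = window_features(*windows[0])
print(len(windows), len(feats), feats["x_peak"], feats["z_peak"])
```

The frequency resolution of a 256-sample window is 50/256 ≈ 0.195 Hz, so the peak-frequency feature lands on the DFT bin nearest the true tone.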
The window length of 256, a power of two, is convenient for the Fast Fourier Transform (FFT). For each window, 31 features are extracted in both the time and frequency domains, as shown in Table 1. Except for the average resultant acceleration, all features are generated for the x, y, and z directions (the correlation features for each pair of axes).

3.2 Classifiers
In this project, four kinds of classifiers are employed to classify the activities, as described below.

3.2.1 Quadratic Classifier
If we assume every class is normally distributed, then the discriminant function for class $\omega_i$ is defined as

$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^{T}\boldsymbol{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i) - \frac{1}{2}\ln\lvert\boldsymbol{\Sigma}_i\rvert + \ln P(\omega_i),$$

where $\boldsymbol{\mu}_i$ and $\boldsymbol{\Sigma}_i$ represent the mean and covariance of the Gaussian distribution of class $\omega_i$, respectively, and $P(\omega_i)$ is its prior probability. Given a feature vector $\mathbf{x}$, a quadratic classifier assigns $\mathbf{x}$ to class $\omega_i$ if $g_i(\mathbf{x}) > g_j(\mathbf{x})$ for all $j \neq i$. Therefore, the decision boundary is in general a quadratic surface (a quadratic curve in two dimensions).

3.2.2 k-Nearest Neighbor
The k-nearest neighbor (kNN) algorithm classifies unlabeled instances by a vote among the labels of the k closest training examples in the feature space. kNN is a lazy learning algorithm, since it defers data processing until a classification request arises. Because kNN uses local information, it can achieve highly adaptive performance. On the other hand, kNN has large storage requirements and is computationally intensive, and the value of k must be chosen properly.

3.2.3 Support Vector Machine
As a supervised classifier, a standard support vector machine (SVM) aims to find a hyperplane separating two classes that maximizes the distance to the closest points from each class. The closest points are called support vectors. Given n training data points $\{\mathbf{x}_i\}$ and class labels $\{y_i\}$, $y_i \in \{-1, +1\}$, a hyperplane separating the two classes has the form

$$f(\mathbf{x}) = \mathbf{w}^{T}\mathbf{x} + b = 0.$$

Suppose $\{\mathbf{w}_k\}$ is the set of all such hyperplanes. The optimal hyperplane is defined by

$$\mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i,$$

and $b$ is set by the Karush-Kuhn-Tucker (KKT) conditions, where $\{\alpha_1, \ldots, \alpha_n\}$ maximize

$$L_D = \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^{T}\mathbf{x}_j$$

subject to $\sum_{i=1}^{n} \alpha_i y_i = 0$ and $\alpha_i \ge 0$. In the linearly separable case, only the $\alpha_i$ corresponding to the support vectors will be non-zero.
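The link between the dual variables and the hyperplane can be checked on a toy problem. The sketch below (plain Python, linear kernel, hard margin) does not solve the dual; it takes $\alpha$ values worked out by hand for a three-point problem, evaluates $L_D = \sum_i \alpha_i - \frac{1}{2}\sum_i\sum_j \alpha_i\alpha_j y_i y_j \mathbf{x}_i^T\mathbf{x}_j$, recovers $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$, and sets $b$ from the KKT condition $y_s(\mathbf{w}^T\mathbf{x}_s + b) = 1$ on the support vectors (those with $\alpha_s > 0$). The helper names and data are illustrative only.

```python
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def dual_objective(alpha, X, y):
    """L_D = sum_i a_i - 1/2 sum_i sum_j a_i a_j y_i y_j x_i.x_j (linear kernel)."""
    n = len(alpha)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad


def primal_from_dual(alpha, X, y, tol=1e-9):
    """Recover w = sum_i a_i y_i x_i and b from the KKT condition y_s (w.x_s + b) = 1."""
    dim = len(X[0])
    w = [sum(a * yi * xi[d] for a, yi, xi in zip(alpha, y, X)) for d in range(dim)]
    support = [i for i, a in enumerate(alpha) if a > tol]  # only these a_i are non-zero
    # Since y_s is +1 or -1, the KKT condition gives b = y_s - w.x_s; average over SVs.
    b = sum(y[s] - dot(w, X[s]) for s in support) / len(support)
    return w, b, support


# Toy problem solved by hand: the third point lies beyond the margin, so alpha_3 = 0;
# sum a_i y_i = 0 forces alpha_1 = alpha_2 = a, and L_D = 2a - 2a^2 peaks at a = 0.5.
X = [(1.0, 0.0), (-1.0, 0.0), (2.0, 1.0)]
y = [1, -1, 1]
alpha = [0.5, 0.5, 0.0]

w, b, support = primal_from_dual(alpha, X, y)
print(w, b, support, dual_objective(alpha, X, y))
print([1 if dot(w, x) + b > 0 else -1 for x in X])
```

For the RBF kernel used in this study, `dot(X[i], X[j])` would be replaced by $K(\mathbf{x}_i, \mathbf{x}_j)$, and $\mathbf{w}$ would no longer be formed explicitly; the decision function becomes $\sum_i \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b$.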
Since the data points $\mathbf{x}$ only enter the calculation through dot products, we can use a mapping $\Phi(\mathbf{x})$ to transform them to
another feature space in which the originally non-linearly separable data can become linearly separable. Moreover, $\Phi(\mathbf{x})$ need not be an explicit function. Instead, we are only interested in a kernel function

$$K(\mathbf{x}_i, \mathbf{x}_j) = \Phi(\mathbf{x}_i)^{T}\Phi(\mathbf{x}_j)$$

that satisfies Mercer's condition. In this study, we use a radial basis function (RBF) kernel

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2}}{2\sigma^{2}}\right).$$

To extend a standard SVM to the multiclass problem, we use the one-against-all strategy, which trains a standard SVM for each class and assigns an unknown pattern to the class with the highest score.

3.2.4 Artificial Neural Network
An artificial neural network (ANN) is a computational model, inspired by biological neural networks, that consists of interconnected artificial neurons (or nodes). ANNs are able to model complex relationships between inputs and outputs or to find patterns in data. In this project, we use a class of ANN called the multilayer perceptron (MLP) as a classifier, as illustrated in Fig. 1. The backpropagation algorithm is used in the training process.

Figure 1. Artificial neural network (from [21]).

3.3 Dimensionality Reduction
There are two ways to reduce feature dimensionality: feature extraction and feature selection.

3.3.1 Feature Extraction
Feature extraction transforms the original high-dimensional data into a lower-dimensional feature space. The transformation can be linear or nonlinear. In this project, we employ Linear Discriminant Analysis (LDA).

3.3.2 Feature Selection
Feature selection is a technique of selecting a subset of the most relevant features from the original features. While feature selection may be regarded mathematically as a special case of feature extraction, research in these two areas is quite different. In feature selection, an objective function is needed to evaluate candidate features. Two kinds of objective functions are available: filters and wrappers. Filters evaluate feature subsets based on their information content, such as interclass distance or statistical independence. Wrappers evaluate features by the prediction accuracy of a classifier. In this study, we use wrappers for feature selection.

3.4 Active Learning
Active learning is one of the mainstream machine learning methods for problems in which a large amount of unlabeled data may be available or easily obtained, but labels are difficult, expensive, and time-consuming to obtain. The core idea of active learning is that a machine learning technique can achieve higher accuracy with fewer training labels if it selects the data from which it learns [17]. The learning process involves interaction with an oracle, who labels unlabeled data samples through guided queries made by the learner, as illustrated in Fig. 2. To obtain a higher classification rate with a smaller labeled training set, the learner seeks labels for the unlabeled instances that are most informative.

Figure 2. The active learning cycle (from [17]).

The problem of selecting unlabeled instances is thus the principal challenge of the active learning process. Typically, the query builds on notions of uncertainty in classification. For example, the samples that are most likely to be misclassified can be considered the most informative and are chosen for query. In this study, the uncertainty $u(\mathbf{x})$ of every unlabeled instance $\mathbf{x}$ is quantified in a distinct way depending on which learning algorithm is used, as follows.

Quadratic Classifier
Queries are made first for the unlabeled instances that are nearest to the discriminant curve and are accordingly most uncertain. For a two-class problem, the uncertainty of an unlabeled instance $\mathbf{x}$ is measured as

$$u(\mathbf{x}) = \frac{1}{\lvert g_1(\mathbf{x}) - g_2(\mathbf{x}) \rvert},$$

where $g_i(\mathbf{x})$ is the quadratic discriminant function for class $i$. For a multiclass problem, $u(\mathbf{x})$ is first calculated for all binary combinations of existing classes, and the maximum value is taken as the measure of uncertainty for instance $\mathbf{x}$.
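The quadratic discriminant $g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T\boldsymbol{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i) - \frac{1}{2}\ln|\boldsymbol{\Sigma}_i| + \ln P(\omega_i)$ and the query rule $u(\mathbf{x}) = 1/|g_i(\mathbf{x}) - g_j(\mathbf{x})|$, maximized over class pairs, can be sketched together. The assumptions are ours: covariances are maximum-likelihood estimates with a small ridge added to the diagonal so the Cholesky factorization exists, priors are the class frequencies of the training set, and a tiny constant guards the division when two discriminants tie. Function names and toy data are hypothetical.

```python
import math
from itertools import combinations


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L


def fit_qda(X, y, ridge=1e-6):
    """Per-class mean, covariance (ML estimate plus a small ridge), and log prior."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == c]
        d = len(rows[0])
        mu = [sum(r[i] for r in rows) / len(rows) for i in range(d)]
        cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / len(rows)
                + (ridge if i == j else 0.0) for j in range(d)] for i in range(d)]
        L = cholesky(cov)
        logdet = 2.0 * sum(math.log(L[i][i]) for i in range(d))
        model[c] = (mu, L, logdet, math.log(len(rows) / len(X)))
    return model


def discriminant(params, x):
    """g_i(x) = -1/2 (x-mu)^T Sigma^-1 (x-mu) - 1/2 ln|Sigma| + ln P(w_i)."""
    mu, L, logdet, log_prior = params
    diff = [a - b for a, b in zip(x, mu)]
    z = []  # forward substitution solves L z = diff, so z.z = diff^T Sigma^-1 diff
    for i, di in enumerate(diff):
        z.append((di - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return -0.5 * sum(v * v for v in z) - 0.5 * logdet + log_prior


def predict(model, x):
    return max(model, key=lambda c: discriminant(model[c], x))


def uncertainty(model, x):
    """u(x) = 1 / |g_i(x) - g_j(x)|, maximized over all pairs of classes."""
    g = {c: discriminant(p, x) for c, p in model.items()}
    return max(1.0 / (abs(g[a] - g[b]) + 1e-12) for a, b in combinations(g, 2))


X = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (4, 4), (5, 4), (4, 5), (3, 4), (4, 3)]
y = [0] * 5 + [1] * 5
model = fit_qda(X, y)
pool = [(0.0, 0.0), (2.0, 2.0), (4.0, 4.0)]  # hypothetical unlabeled query set
query = max(range(len(pool)), key=lambda i: uncertainty(model, pool[i]))
print(predict(model, (0.2, 0.1)), predict(model, (4.1, 3.9)), query)
```

The point midway between the two symmetric classes has equal discriminants and is therefore queried first, which is exactly the behavior that, without random exploration, concentrates the training set along the decision boundary.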

k-Nearest Neighbors
Unlike the quadratic classifier, kNN has no explicit discriminant function, so a distance-based uncertainty measure is not feasible for kNN. Instead, the uncertainty is measured using the concept of Shannon entropy $H$:

$$u(\mathbf{x}) = H(\mathbf{x}) = -\sum_{\omega} p_{\omega} \log p_{\omega},$$

where $\omega$ denotes the classes and $p_{\omega}$ is the probability that instance $\mathbf{x}$ belongs to class $\omega$. $p_{\omega}$ is calculated by dividing the number of neighbors that belong to class $\omega$ by the total number of neighbors $k$.

Support Vector Machines
Since we adopt the one-against-all method for the multiclass problem, we choose the sample that has the smallest distance to the decision boundary as the next query.

Artificial Neural Networks
Given a sample, the output of the ANN consists of the probability of the sample belonging to each class. Similar to the uncertainty measure used for kNN, the query here is made for the sample that has the highest entropy.

4. RESULTS AND DISCUSSIONS
4.1 Data Collection
Our experimental data were collected by three persons using an HTC Evo smartphone. A total of 1393 samples were obtained; 75% of the data are used for training and the rest for testing. To illustrate the complexity of the classification problem, the first two LDA components are plotted in Fig. 3(a), and the two best selected features are shown in Fig. 3(b). As observed, the classification problem is nontrivial. In particular, walking upstairs and walking downstairs are very difficult to discriminate.

4.2 Passive Learning
Four classifiers are studied: quadratic, kNN, ANN, and SVM. The SVM-KM [22] and Matlab ANN toolboxes are used in this study. All methods are tested with samples in the original feature space, the LDA subspace, and the sequential forward selection (SFS) subspace. We run the SFS algorithm on each classifier and pick the best five features that are selected by all classifiers. The same feature subset is then used for all classifiers. The selected features are variance, 75th percentile, frequency entropy, and peak frequency.
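Sequential forward selection used as a wrapper can be sketched as a greedy loop: starting from an empty set, it repeatedly adds the feature whose inclusion gives the best classifier score. In this sketch the wrapper criterion is leave-one-out 1-nearest-neighbor accuracy, a hypothetical stand-in for the prediction accuracy of each classifier used in the study; the step of keeping only the features chosen by all classifiers is not shown, and the search simply stops after k features. The toy data are invented.

```python
def loo_1nn_accuracy(X, y, subset):
    """Hypothetical wrapper criterion: leave-one-out 1-NN accuracy on the chosen features."""
    if not subset:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: sum((xi[f] - X[j][f]) ** 2 for f in subset))
        correct += y[nearest] == y[i]
    return correct / len(X)


def sequential_forward_selection(X, y, k, score=loo_1nn_accuracy):
    """Greedy SFS: repeatedly add the feature that gives the highest wrapper score."""
    selected, remaining, history = [], list(range(len(X[0]))), []
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), score(X, y, selected)))
    return selected, history


# Feature 0 separates the classes; feature 1 is noise on a much larger scale.
X = [(0.0, 5.0), (0.1, 1.0), (0.2, 9.0), (0.3, 3.0),
     (1.0, 4.0), (1.1, 8.0), (1.2, 2.0), (1.3, 6.0)]
y = ["walk"] * 4 + ["jog"] * 4
selected, history = sequential_forward_selection(X, y, k=2)
print(selected, history)
```

The recorded history shows the score after each addition; in this toy case adding the noisy feature lowers the wrapper score, which is why a practical SFS run would either stop when the score stops improving or, as in the study, keep a fixed number of features.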
It is also observed that the z-axis is the most informative direction, because cellphones are usually attached to the human body vertically and the z-axis, which is perpendicular to the screen, is independent of the orientation of the phone.

Figure 3. The distribution of classes (walking, limping, jogging, downstairs, upstairs) in (a) LDA space (LDA component 1 vs. LDA component 2) and (b) the subspace of the two best features (z-axis variance vs. z-axis 25th percentile).

Fig. 4 shows the performance of each classifier in the different feature spaces. The maximum classification rate (84.4%) is achieved when SVM is used with SFS. The quadratic classifier, on the other hand, has the worst performance, due to the non-Gaussian distribution of each class. For all classifiers except SVM, the performance is highest in the LDA subspace and lowest in the original feature space. Feature subset selection enhances the performance of SVM because it removes the features that misguide the algorithm. The kNN classification rate is noticeably improved after LDA is performed: kNN is highly sensitive to the scales of different features, and LDA alleviates this problem by reducing the feature space to a smaller, more normalized subspace.

Figure 4. Passive learning performance (classification rate) of the quadratic, kNN, ANN, and SVM classifiers in the original space, the subset space (SFS), and the LDA subspace.
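The kNN vote of Section 3.2.2, the entropy uncertainty of Section 3.4, and the scale sensitivity discussed above fit in one small sketch. To keep it short, it uses per-feature z-score standardization as a stand-in for LDA; this is a different and simpler technique that only addresses the scaling issue, not LDA's class-separating projection. The data are a hypothetical two-feature example in which an uninformative feature is about a thousand times larger than the informative one.

```python
import math
from collections import Counter


def knn_votes(train_X, train_y, x, k=3):
    """Count the labels of the k training points closest to x (squared Euclidean)."""
    ranked = sorted((sum((a - b) ** 2 for a, b in zip(xi, x)), yi)
                    for xi, yi in zip(train_X, train_y))
    return Counter(label for _, label in ranked[:k])


def knn_predict(train_X, train_y, x, k=3):
    return knn_votes(train_X, train_y, x, k).most_common(1)[0][0]


def knn_entropy(train_X, train_y, x, k=3):
    """u(x) = H(x) = -sum_w p_w log p_w with p_w = (neighbours in class w) / k."""
    votes = knn_votes(train_X, train_y, x, k)
    return 0.0 - sum((n / k) * math.log(n / k) for n in votes.values())


def fit_zscore(X):
    """Per-feature standardization using training statistics (a stand-in, not LDA)."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return lambda x: tuple((v - m) / s for v, m, s in zip(x, means, stds))


# Feature 0 separates the classes; feature 1 is uninformative but ~1000x larger in scale.
X = [(0.0, 500.0), (0.1, 100.0), (0.2, 900.0),
     (1.0, 520.0), (1.1, 880.0), (1.2, 120.0)]
y = ["walk", "walk", "walk", "jog", "jog", "jog"]
q = (0.05, 515.0)

scale = fit_zscore(X)
Xs = [scale(x) for x in X]
print(knn_predict(X, y, q), knn_entropy(X, y, q))                  # raw features
print(knn_predict(Xs, y, scale(q)), knn_entropy(Xs, y, scale(q)))  # standardized
```

With raw features the large feature dominates the distance, so two of the three neighbors are "jog" and the vote has non-zero entropy; after standardization all three neighbors are "walk" and the entropy drops to zero.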

4.3 Active Learning
For our experiments, we consider a randomly initialized training set, a test set, and a query set that contains unlabeled samples. The passive learning results showed that dimensionality reduction (SFS and LDA) can significantly enhance classification performance. Accordingly, active learning is performed here in the selected feature subset and in the LDA subspace. The test set comprises 25% of the original dataset. The active learning model queries unlabeled samples from the query set only, whereas the classification rate is reported on the test set. For every classifier, the reported classification rate is averaged over 50 runs. The initial training set is seeded randomly by selecting 4 training samples per class, and the remaining samples form the query set. In each round of active learning, one unlabeled sample is selected from the query set and added to the training set.

If the samples chosen for query are selected solely based on the uncertainty measure, certain regions of the feature space may never be explored. This problem is most serious for the quadratic classifier, because as the training set grows with more queries, it converges to a long, thin region along the discriminant curve. Under these circumstances, the class distributions deviate strongly from their true shape. To deal with this problem, some samples are picked from the query set at random rather than by the uncertainty measure. For our problem, the probability that a query is made randomly is set to 0.10.

The performance of active learning algorithms is commonly assessed by constructing learning curves, which plot the performance measure of interest (e.g., classification rate) as a function of the number of queries performed. Fig. 5 presents learning curves for the first 300 samples labeled using the uncertainty query (with 10% random sampling) and pure random sampling, for all four classifiers.
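The experimental protocol of this section (4 random seed samples per class, one query per round, a 10% chance of a random query, accuracy measured on a held-out test set) can be sketched as a pool-based loop. The nearest-centroid classifier and its margin-style uncertainty are hypothetical stand-ins chosen to keep the sketch self-contained; in the study, the classifier-specific measures of Section 3.4 would be plugged in instead. Setting `p_random=1.0` turns the same loop into the pure random-sampling baseline, and averaging curves over 50 seeds would mirror the reported averages.

```python
import random


def fit_centroids(X, y):
    """Hypothetical stand-in classifier: one mean vector per class."""
    return {lab: [sum(col) / len(col)
                  for col in zip(*[x for x, lbl in zip(X, y) if lbl == lab])]
            for lab in set(y)}


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def predict(model, x):
    return min(model, key=lambda lab: sq_dist(model[lab], x))


def uncertainty(model, x):
    """Hypothetical measure: a small gap between the two nearest centroids means uncertain."""
    d = sorted(sq_dist(c, x) for c in model.values())
    return 1.0 / (d[1] - d[0] + 1e-12)


def active_learning(X, y, X_test, y_test, n_queries, seed_per_class=4, p_random=0.10, seed=0):
    """Pool-based uncertainty sampling; with probability p_random the query is random."""
    rng = random.Random(seed)
    by_class = {}
    for i, lab in enumerate(y):
        by_class.setdefault(lab, []).append(i)
    labeled = [i for idx in by_class.values() for i in rng.sample(idx, seed_per_class)]
    taken = set(labeled)
    pool = [i for i in range(len(X)) if i not in taken]  # the unlabeled query set
    curve = []
    for _ in range(min(n_queries, len(pool))):
        model = fit_centroids([X[i] for i in labeled], [y[i] for i in labeled])
        if rng.random() < p_random:
            pick = rng.choice(pool)                                   # explore at random
        else:
            pick = max(pool, key=lambda i: uncertainty(model, X[i]))  # most uncertain
        pool.remove(pick)
        labeled.append(pick)  # the oracle supplies y[pick]
        model = fit_centroids([X[i] for i in labeled], [y[i] for i in labeled])
        hits = sum(predict(model, x) == t for x, t in zip(X_test, y_test))
        curve.append(hits / len(X_test))  # one point of the learning curve
    return curve, labeled, pool


# Hypothetical two-class data: Gaussian clouds around (0, 0) and (3, 3).
gen = random.Random(1)


def make(n):
    pts, labs = [], []
    for lab, (cx, cy) in enumerate([(0.0, 0.0), (3.0, 3.0)]):
        for _ in range(n):
            pts.append((gen.gauss(cx, 1.0), gen.gauss(cy, 1.0)))
            labs.append(lab)
    return pts, labs


X, y = make(30)
X_test, y_test = make(20)
active_curve, labeled, pool = active_learning(X, y, X_test, y_test, n_queries=20)
random_curve, _, _ = active_learning(X, y, X_test, y_test, n_queries=20, p_random=1.0)
print(active_curve[-1], random_curve[-1])
```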
Figure 5. Learning curves (classification rate vs. number of training instances added) with active and random sampling for the kNN, SVM, quadratic, and ANN classifiers in the LDA and SFS subspaces.

For the kNN classifier, the active learning curve clearly dominates the baseline random sampling curve at all points. While the learning curves for both active and random sampling in LDA and SFS start from the same classification rate (4), the maximum learning rate is achieved when
active sampling is performed in LDA space. These findings are consistent with the results of passive learning, where learning in LDA space outperformed both the original high-dimensional space and the selected subset.

The SVM active learning algorithm also performs better than random sampling in both the LDA and selected subset spaces. Although the passive learning results showed that SVM performs better in the selected subset space, this is not the case here when only 20 instances are used. In other words, if only a very small dataset is available, it is more efficient to use the LDA space than the selected subset space. The learning curves for the selected subspace, nevertheless, dominate those for LDA after about 70 queries are made.

While the results show that active learning clearly outperforms random sampling for the kNN and SVM techniques, no precise conclusion can be drawn for ANN. Although the performance generally increases as more queries are made, the learning curves oscillate significantly. This instability may be due to the high sensitivity of the network weights to new samples added to the training set when this set is small. As observed more clearly in the LDA space, the oscillations damp as the training set grows.

In contrast to kNN, SVM, and ANN, the quadratic active learning algorithm is totally dominated by random sampling except for the first 10 queries. After these initial queries, the training set is filled with samples located around the discriminant curves, because of the query strategy described in Section 3.4, so the distribution of samples deviates from the true distribution. Although SVM also uses a distance measure for queries, this problem is not serious for SVM. This is rooted in how SVM works: while the quadratic classifier uses all samples to estimate each class's mean and covariance, SVM uses only the samples around the boundary. Therefore, SVM is not as sensitive to the distribution of the queried samples in the training set.

5. CONCLUSIONS
Human activity recognition has broad applications in medical research and human survey systems. In this project, we designed a smartphone-based system that recognizes five human activities: walking, limping, jogging, going upstairs, and going downstairs. The system collected time series signals using a built-in accelerometer, generated 31 features in both the time and frequency domains, and then reduced the feature dimensionality to improve performance. The activity data were trained and tested using 4 passive learning methods: a quadratic classifier, the k-nearest neighbor algorithm, support vector machines, and artificial neural networks. The best classification rate in our experiment was 84.4%, achieved by SVM with features selected by SFS. Classification performance is robust to the orientation and position of the smartphone. In addition, active learning algorithms were studied to reduce the expense of labeling data. Experimental results demonstrated the effectiveness of active learning in saving labeling labor while achieving performance comparable with passive learning. Among the four classifiers, kNN and SVM improve most after applying active learning. The results demonstrate that entropy and distance to the boundary are robust uncertainty measures when performing queries on kNN and SVM, respectively. In conclusion, SVM is the best choice for our problem. Future work may consider more activities and implement a real-time system on the smartphone. Other query strategies, such as variance reduction and density-weighted methods, may be investigated to enhance the performance of the active learning schemes proposed here.

6. REFERENCES
[1] M. Morris, J. Lundell, E. Dishman, and B. Needham, "New perspectives on ubiquitous computing from ethnographic study of elders with cognitive decline," in Proc. UbiComp, 2003.
[2] M. P. Lawton, "Aging and performance of home tasks," Human Factors, 1990.
[3] S. Consolvo, P. Roessler, B. Shelton, A. LaMarca, B. Schilit, and S. Bly, "Technology for care networks of elders," in Proc. IEEE Pervasive Computing, Mobile and Ubiquitous Systems: Successful Aging, 2004.
[4] http://www.isuppli.com/mems-and-Sensors/News/Pages/Motion-Sensor-Market-for-Smartphones-and-Tablets-Set-to-Double-by-2015.aspx
[5] http://www.comscore.com/press_events/press_releases/2011/11/comScore_Reports_September_2011_U.S._Mobile_subscriber_market_share
[6] S. W. Lee and K. Mase, "Activity and location recognition using wearable sensors," IEEE Pervasive Computing, 1(3):24-32, 2002.
[7] K. Kunze and P. Lukowicz, "Dealing with sensor displacement in motion-based on body activity recognition systems," in Proc. 10th Int. Conf. on Ubiquitous Computing, Sep. 2008.
[8] T. B. Moeslund, A. Hilton, and V. Kruger, "A survey of advances in vision-based human motion capture and analysis," Computer Vision and Image Understanding, 104(2-3):90-126, 2006.
[9] R. Bodor, B. Jackson, and N. Papanikolopoulos, "Vision-based human tracking and activity recognition," in Proc. 11th Mediterranean Conf. on Control and Automation, June 2003.
[10] L. Bao and S. S. Intille, "Activity recognition from user-annotated acceleration data," in Pervasive Computing, Lecture Notes in Computer Science, vol. 3001, pp. 1-17, 2004.
[11] U. Maurer, A. Rowe, A. Smailagic, and D. Siewiorek, "Location and activity recognition using eWatch: A wearable sensor platform," in Ambient Intelligence in Everyday Life, Lecture Notes in Computer Science, vol. 3864, pp. 86-102, 2006.
[12] J. Parkka, M. Ermes, P. Korpipaa, J. Mantyjarvi, J. Peltola, and I. Korhonen, "Activity classification using realistic data from wearable sensors," IEEE Trans. Inf. Technol. Biomed., vol. 10, no. 1, pp. 119-128, Jan. 2006.
[13] N. Wang, E. Ambikairajah, N. H. Lovell, and B. G. Celler, "Accelerometry based classification of walking patterns using time-frequency analysis," in Proc. 29th Annu. Conf. IEEE Eng. Med. Biol. Soc., Lyon, France, 2007, pp. 4899-4902.
[14] Y. Tao, H. Hu, and H. Zhou, "Integration of vision and inertial sensors for 3D arm motion tracking in home-based rehabilitation," Int. J. Robotics Res., 26(6):607-624, 2007.
[15] S. J. Preece, J. Y. Goulermas, L. P. J. Kenney, and D. Howard, "A comparison of feature extraction methods for the classification of dynamic activities from accelerometer data," IEEE Trans. Biomed. Eng., in press, 2008.
[16] E. K. Antonsson and R. W. Mann, "The frequency content of gait," J. Biomech., vol. 18, no. 1, pp. 39-47, 1985.
[17] B. Settles, "Active learning literature survey," Computer Sciences Technical Report 1648, University of Wisconsin-Madison, 2010.
[18] K. Lang and E. Baum, "Query learning can work poorly when a human oracle is used," in Proc. IEEE International Joint Conference on Neural Networks, pp. 335-340, IEEE Press, 1992.
[19] X. Zhu, Semi-Supervised Learning with Graphs, PhD thesis, Carnegie Mellon University, 2005.
[20] B. Settles, M. Craven, and L. Friedland, "Active learning with real annotation costs," in Proc. NIPS Workshop on Cost-Sensitive Learning, pp. 1-10, 2008.
[21] Available at http://en.wikipedia.org/wiki/artificial_neural_network
[22] S. Canu, Y. Grandvalet, V. Guigue, and A. Rakotomamonjy, "SVM and Kernel Methods Matlab Toolbox," Perception Systèmes et Information, INSA de Rouen, Rouen, France, 2005.