Chapter 6
Signal Data Mining from Wearable Systems
Francois G. Meyer
6.1 Definition of the Subject
6.1.1 Introduction
Sensor data from wearable systems can be analyzed in real time on site, or transmitted to a central hub to be analyzed off-line. In both cases, the goal of the analysis is to extract from the measurements information about the state of the user, and to identify anomalous behavior in order to alert the person. The notion of state obviously depends on the particular application, but in general it characterizes a high-level function: the user is awake (as opposed to asleep), the user is falling, the user is going to have a heart attack, etc. Data analysis relies on sophisticated statistical machine learning methods to learn, from existing training examples, the association between sensor measurements and high-level states [1, 2].
The first stage of the analysis consists of extracting meaningful features and removing confounding artifacts. This stage can be achieved using various time-frequency or multiscale methods. The second stage consists of reducing the dimensionality of the data. Indeed, it is well known that learning a function of the data becomes extremely difficult when the data are very high-dimensional. Linear methods to reduce dimensionality include principal component analysis (PCA) and independent component analysis (ICA). More recently, nonlinear methods, such as Laplacian eigenmaps, have offered powerful alternatives to traditional linear methods. Finally, one is ready to learn a function of the measurements that describes the state of the person wearing the devices.
The sensors only provide very coarse and indirect measurements of the state of the user. For instance, one may be interested in classifying the state of the user into normal or anomalous states (the user fell asleep, or is going to have a heart attack). It is therefore necessary to learn the state of the user as a function of the measurements using statistical learning methods.
F.G. Meyer, University of Colorado at Boulder, Boulder, CO 80309, USA; e-mail: fmeyer@colorado.edu
A. Bonfiglio and D. De Rossi (eds.), Wearable Monitoring Systems, DOI 10.1007/978-1-4419-7384-9_6, © Springer Science+Business Media, LLC 2011
This chapter is organized as follows. The current section provides an overview of the types of data analysis questions associated with wearable systems. Section 6.2 describes the various feature extraction techniques used in wearable devices. The real-time analysis of data requires that an efficient dimension reduction be performed; methods that can provide a faithful representation of the data with far fewer parameters are described in Sect. 6.3. Finally, statistical machine learning methods that are used to characterize the state of the user are described in Sect. 6.4. A glossary of the terms used in this chapter can be found in Sect. 6.6.
6.1.2 Shape of the Data
Sensor data can be one-dimensional (e.g., accelerometer) or two-dimensional (e.g., video). In this chapter, we model sensor data as time series of scalars or vectors. Formally, a wearable system generates a vector $X$ of measurements in $\mathbb{R}^p$ (to simplify the notation and use the same formalism, we can think of an image as a suitably reorganized vector). As time evolves, we index each measurement with a time index, and we denote by $x_{i,j}$ the measurement that sensor $j$ generates at time $t_i$, $i = 1, \ldots, N$. Clearly, the temporal index $i$ plays a different role from the sensor index $j$ in this dataset, and methods of analysis usually take advantage of this distinction.
6.1.3 Scientific Questions
The data generated by the sensors of a wearable system can be used to answer questions about the state of an individual or the state of a population of individuals.
6.1.3.1 At the Level of the Individual
Wearable computers can provide information about the environment surrounding the user, as well as information about the state of the user (e.g., activity and health monitoring). In both cases, the information is centered on the individual user and does not involve a network of wearables.
The surroundings, or context, can be characterized by the location of the user [3], the amount of noise [4], and the output of several cameras [5]. The detection of the activity of the user [6, 7] is an intrinsically dynamic process that requires real-time computations. Finally, a wearable system can be used to monitor the health of the user [2, 8] and to diagnose and detect anomalous events [9].
6.1.3.2 At the Level of a Group of Individuals
Wearable systems can also be used to analyze social interactions between different users and to monitor social networks [10].
6.1.4 Local vs. Remote Analysis
The analysis of the data collected by the sensors can either be performed locally, using the limited computation and power resources available on the wearable system, or remotely, after the data have been sent to a hub.
6.1.4.1 Local or On-Site Analysis
This type of analysis can provide immediate feedback without requiring any communication with a central hub. An on-site analysis would, for instance, be useful for the continuous health monitoring of individuals living in remote areas without access to wireless networks [11]. Wearable devices can be designed around microcontrollers [12], DSP chips [13, 14, 15], or Field Programmable Gate Arrays (FPGAs) [16, 17]. In all cases, the complexity of the algorithms that can be programmed is limited by the computational and battery power available on the device. Such limitations may prevent the use of classification methods that require computationally expensive algorithms and massive amounts of training data. Finally, the degree of integration of the technology (handheld vs. wearable) may further limit the available computational power.
6.1.4.2 Remote Analysis
The on-site analysis may be supplemented with, or replaced by, a remote analysis. In this scenario, the data harvested by the sensors may be pre-processed on site to eliminate artifacts, and then sent to a central computer, where a more complex analysis is performed. Typically, a wireless connection allows the wearable device to send the data to this central hub. For instance, in the case of medical monitoring, medical data can be sent to a health care center where diagnostic testing and monitoring are performed [18, 19, 2]. This type of processing makes it possible to use machine learning algorithms that are computationally intensive and require large amounts of training data [20, 21].
The wireless connection can take advantage of wireless personal area network standards such as the ZigBee specification: a suite of high-level communication protocols using small, low-power digital radios based on the IEEE 802.15.4-2006 standard [20].
6.2 Feature Extraction
The very large size of the time series collected by wearable sensors is a basic hurdle to any attempt at analyzing the data. Consequently, the analysis of sensor data is usually performed on a smaller set of features extracted from the data [22]. The extraction of
features serves two purposes: first, it reduces the dimensionality by replacing the time series with a more compact representation in a transformed domain (e.g., Fourier or wavelet); second, it separates the artifacts and noise from the signal.
6.2.1 Time-Frequency Analysis
Many of the transforms that are used to extract features operate in the frequency domain. This can be justified by a theoretical argument: if the signals are realizations of a wide-sense stationary process, then the Fourier transform provides the optimal representation [23]. In practice, many physiological signals are oscillatory and can be decomposed as a sum of a small number of sinusoidal functions. For instance, [11] use Fourier coefficients of motion sensors to study gait.
However, in many applications the signals of interest are not stationary: they contain sudden changes, and the local statistical properties of the signals vary as a function of time. The ability to detect sudden changes in the local frequency content is essential (e.g., for the prediction of seizures [24]). One alternative to the Fourier transform consists of dividing the time series into overlapping segments, within which the signal can be expanded using a Fourier transform. This local Fourier analysis requires windowing functions to isolate the different time segments. Formally, we consider a cover of the time axis, $\bigcup_{n=-\infty}^{+\infty} [a_n, a_{n+1})$, and we write $I_n = [a_n, a_{n+1})$. The time intervals $I_n$ can be of fixed or adaptive sizes. To localize $f$ around the interval of interest $I_n$, one can construct a projection of $f$, $P_{[a_n, a_{n+1})} f$ (see Fig. 6.1). The simplest example of $P_{[a_n, a_{n+1})} f$ is obtained by multiplying $f$ by a smooth window function $r_n$, whose support is approximately $I_n$:
$$P_{[a_n, a_{n+1})} : f \mapsto r_n f. \tag{6.1}$$
We can then compute the Fourier transform of $P_{[a_n, a_{n+1})} f$ using a fast Fourier transform.
Fig. 6.1 Localization of a time interval before computing a Fourier transform
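The windowing of Eq. (6.1) followed by an FFT, together with the band-energy features that are usually computed from the windowed spectra, can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the method of any cited work. It assumes fixed-length segments that overlap by half, a Hann window playing the role of the smooth window $r_n$, and a hypothetical signal, sampling rate, and set of band edges.

```python
import numpy as np

def windowed_spectra(f, fs, seg_len, hop):
    """Local Fourier analysis: multiply each segment by a smooth window
    (playing the role of r_n in Eq. 6.1) and take its FFT."""
    window = np.hanning(seg_len)                    # smooth window; Hann is an illustrative choice
    starts = range(0, len(f) - seg_len + 1, hop)    # overlapping segments when hop < seg_len
    spectra = np.array([np.fft.rfft(window * f[a:a + seg_len]) for a in starts])
    freqs = np.fft.rfftfreq(seg_len, d=1.0 / fs)    # frequency (Hz) of each FFT bin
    return freqs, spectra

def band_energy(freqs, spectra, lo, hi):
    """Energy (sum of squared FFT magnitudes) in the band [lo, hi) Hz, per segment."""
    band = (freqs >= lo) & (freqs < hi)
    return np.sum(np.abs(spectra[:, band]) ** 2, axis=1)

# Hypothetical 10 s trace sampled at 100 Hz: a 4 Hz oscillation during
# the first 5 s, then a 6 Hz oscillation.
fs = 100.0
t = np.arange(1000) / fs
f = np.where(t < 5.0, np.sin(2 * np.pi * 4 * t), np.sin(2 * np.pi * 6 * t))

freqs, spectra = windowed_spectra(f, fs, seg_len=200, hop=100)  # 2 s windows, 1 s hop
e4 = band_energy(freqs, spectra, 3.5, 4.5)   # one feature per segment
e6 = band_energy(freqs, spectra, 5.5, 6.5)
```

Each segment yields one value per band. Across segments, the energy therefore moves from the 4 Hz band to the 6 Hz band when the oscillation changes, which a single global Fourier transform of the whole trace could not localize in time.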
Features are then extracted by computing the energy (estimated from the squared magnitude of the Fourier transform) present in specific frequency bands. These frequency bands are determined from prior experiments, or from a priori physiological knowledge. For instance, [25] use Fourier analysis to extract relevant features from raw tremor acceleration data around a small set of frequencies of interest (3, 4, 5, and 6 Hz).
6.2.1.1 Application to Wearable Systems
The authors in [13] use a spectral representation of ECG and respiratory effort signals to estimate sleepiness and to distinguish sleep from wake activity. The approach relies on the computation of the energy of the Fourier transform (spectrogram) over time intervals of length 40 s. The Fourier transform can be computed on site [26], using DSP chips or FPGAs.
6.2.1.2 Available Software
The Time-Frequency Toolbox, http://tftb.nongnu.org/, is a collection of MATLAB functions for the analysis of nonstationary signals using time-frequency distributions. See also WaveLab in the next section.
6.2.2 Multiscale Analysis
While time-frequency methods can provide good energy compaction, they are not suited to analyzing phenomena that occur at very different scales (from a second to a day). One of the main limitations of Fourier-based algorithms is their inability to exploit the multiscale structure that most natural signals exhibit.
Fig. 6.2 Fast wavelet transform
As opposed to the short-time Fourier transform, the wavelet transform performs a multiscale, or
multiresolution, analysis of the signal. The wavelet transform is an orthonormal transform that provides a very efficient decorrelation of many physical signals [23]. Excellent references on the mathematical theory of wavelets and their application to image compression include [23, 27].
We briefly describe the fast wavelet transform algorithm. We consider the time series $x_n$, $n = 1, \ldots, N$, generated by one specific sensor from time $t_1$ until $t_N$. For simplicity, we assume that $N = 2^J$. The algorithm summarized in Fig. 6.2 was discovered by [23] and is called the fast wavelet transform. The wavelet transform of $X$ at scale $J$ is given by the vector of coefficients
$$\left[\, s^0_0;\; d^0_0;\; d^1_0, d^1_1;\; \ldots;\; d^j_0, \ldots, d^j_{2^j-1};\; \ldots;\; d^{J-1}_0, \ldots, d^{J-1}_{2^{J-1}-1} \,\right]. \tag{6.6}$$
The reconstruction formula is given by the following iteration:
$$s^{j+1}_k = \sum_{n \in \mathbb{Z}} h_{k-2n}\, s^j_n + \sum_{n \in \mathbb{Z}} g_{k-2n}\, d^j_n. \tag{6.7}$$
The fast wavelet transform has an overall complexity of $O(N)$ operations. As shown in Fig. 6.3, the algorithm organizes itself into a binary tree, where the shaded nodes of the tree represent the subspaces $W_j$.
Fig. 6.3 Pyramidal structure of the fast wavelet transform
One important parameter in the wavelet analysis concerns the choice of the filters $(h_n, g_n)$. In general, biorthogonal wavelet filters with linear phase introduce less distortion than orthonormal wavelets. While longer filters provide better out-of-band rejection with a sharp frequency cutoff, such filters may become a computational burden if a real-time on-site application is required. Finally, the number of vanishing moments of the wavelet filter controls the number of significant coefficients that one should expect when processing smooth signals. Indeed, polynomials of degree $p-1$ will have a very sparse representation in a wavelet basis with $p$ vanishing moments: all the $d^j_k$ are equal to zero, except for the coefficients located
at the border of the dyadic subdivision ($k = 0, 1, 2, 4, \ldots, 2^{J-1}$). Unfortunately, wavelets with $p$ vanishing moments have at least $2p$ coefficients, thereby increasing the computational load.
In a manner similar to Fourier analysis, one can decide a priori that certain frequency bands, which are associated with specific wavelet coefficients, should be used for the subsequent analysis of the data. Alternatively, one can remove small coefficients, which are typically generated by the noise, and keep only the largest coefficients, which come from the signal. In both cases, the selected wavelet coefficients become the features representing the time series.
6.2.2.1 Application to Wearable Systems
A wavelet transform is used in [28, 29] to extract important features from ECG recordings. Similarly, data from inertial sensors are processed with filter banks in [25] to quantify tremor frequency and energy. Finally, [26] use a wavelet transform to remove the noise and smooth accelerometer time series. Wavelet analysis can be performed on site, since it can be implemented efficiently on a DSP chip or an FPGA.
6.2.2.2 Available Software
WaveLab at Stanford University, http://www-stat.stanford.edu/~wavelab/, is a very comprehensive library of MATLAB software that implements wavelet transforms, wavelet packets, and various other time-frequency transforms. MATLAB's Wavelet Toolbox also implements wavelet transforms and wavelet packets.
6.3 Dimensionality Reduction
The extraction of features from the data effectively replaces long time series with shorter feature vectors (e.g., Fourier or wavelet coefficients). Often, this reduction is not sufficient, and one needs to further reduce the dimension of the feature vectors. Alternatively, there are cases where standard features are not available, and one needs to apply dimensionality reduction methods directly to the raw sensor measurements. Methods to reduce dimension exploit the intrinsic correlations that exist in the feature vectors, or in the sensor measurements.
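The wavelet features of Sect. 6.2.2 are one example of such a reduction. The sketch below implements the fast wavelet transform with the Haar filters $h = (1/\sqrt{2}, 1/\sqrt{2})$ and $g = (1/\sqrt{2}, -1/\sqrt{2})$, the simplest possible choice; the chapter does not prescribe a particular filter. The coefficients are returned in the order of (6.6), and the inverse iterates the reconstruction formula (6.7). Each level halves the length, so the cost is $O(N)$. The helper `keep_largest` is hypothetical: it retains only the largest coefficients as the feature vector. The Haar wavelet has one vanishing moment, so a piecewise-constant toy signal with a single dyadic jump produces only two nonzero coefficients.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_fwt(x):
    """Fast Haar wavelet transform of a length-2**J signal, returning
    [s^0_0, d^0_0, d^1_0, d^1_1, ..., d^{J-1}_*] as in (6.6)."""
    s = np.asarray(x, dtype=float)
    n = len(s)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    details = []
    while len(s) > 1:
        even, odd = s[0::2], s[1::2]
        details.append((even - odd) / SQRT2)   # d^j: filter with g, keep every other sample
        s = (even + odd) / SQRT2               # s^j: filter with h, keep every other sample
    # details were produced from the finest scale J-1 down to 0; reverse them
    return np.concatenate([s] + details[::-1])

def haar_ifwt(c):
    """Inverse transform: iterate (6.7), s^{j+1}_{2n} = (s^j_n + d^j_n)/sqrt(2),
    s^{j+1}_{2n+1} = (s^j_n - d^j_n)/sqrt(2)."""
    c = np.asarray(c, dtype=float)
    s, pos = c[:1], 1
    while pos < len(c):
        d = c[pos:pos + len(s)]
        pos += len(s)
        nxt = np.empty(2 * len(s))
        nxt[0::2] = (s + d) / SQRT2
        nxt[1::2] = (s - d) / SQRT2
        s = nxt
    return s

def keep_largest(c, k):
    """Hypothetical feature selector: keep the k largest-magnitude coefficients."""
    out = np.zeros_like(c)
    idx = np.argsort(np.abs(c))[-k:]
    out[idx] = c[idx]
    return out

x = np.array([4.0, 4.0, 4.0, 4.0, 8.0, 8.0, 8.0, 8.0])  # toy signal, N = 2**3
c = haar_fwt(x)                                         # only s^0_0 and d^0_0 are nonzero
rng = np.random.default_rng(0)
noisy = x + 0.1 * rng.standard_normal(x.size)
features = keep_largest(haar_fwt(noisy), k=2)           # 2-coefficient feature vector
denoised = haar_ifwt(features)
```

Because the transform is orthonormal, discarding the small coefficients removes the part of the noise that lies outside the retained basis vectors. The reconstruction from two features is therefore closer to the clean signal than the noisy input is.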
Principal component analysis finds the set of orthogonal components that can best explain the variance in the observations, while independent component analysis can decompose the observations into components that are statistically independent. Finally, Laplacian eigenmaps can provide a faithful parametrization of the sensor data when the data organize themselves in a nonlinear manner.
6.3.1 Principal Component Analysis
Instead of using a fixed transform, such as the Fourier or the wavelet transform, one can often construct a sparser representation by adaptively computing an optimal transform. Principal component analysis is one such adaptive transform.
We consider a wearable system equipped with several sensors. Let $x_{i,j}$ be the measurement that sensor $j$ generates at time $t_i$, $i = 1, \ldots, N$. We can organize the sensor measurements as an $N \times p$ matrix
$$X = \begin{bmatrix} x_{1,1} & \cdots & x_{1,p} \\ x_{2,1} & \cdots & x_{2,p} \\ \vdots & & \vdots \\ x_{N,1} & \cdots & x_{N,p} \end{bmatrix}. \tag{6.8}$$
We can think of the measurement generated by the sensors at time $t_i$ as a vector $X_i = [x_{i,1}, \ldots, x_{i,p}]^T$ in $\mathbb{R}^p$, and we can write the matrix $X$ by stacking the transposed sensor vectors as successive rows,
$$X = \begin{bmatrix} X_1^T \\ \vdots \\ X_N^T \end{bmatrix}. \tag{6.9}$$
As time evolves, the vectors $X_1, X_2, \ldots, X_N$ form a trajectory in $\mathbb{R}^p$ (see Fig. 6.4). If the temporal sampling of the sensors is sufficiently fast, we expect that the discrete trajectory will be smooth, and that the points $X_i$ will be highly correlated.
Fig. 6.4 Trajectory of the sensor vector $X_i$ as a function of time
The goal of principal component analysis (PCA) is to compute a low-dimensional affine approximation to the set $X_1, X_2, \ldots, X_N$. First, the set of measurements is centered around the center of mass
$$\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i.$$
Then the optimal subspace of dimension $r$ is computed from the singular value decomposition (SVD) [30] of the centered matrix
$$\begin{bmatrix} X_1^T - \bar{X}^T \\ \vdots \\ X_N^T - \bar{X}^T \end{bmatrix} = \sum_{i=1}^{p} s_i U_i V_i^T, \tag{6.10}$$
where the vectors $U_1, \ldots, U_p$ (respectively $V_1, \ldots, V_p$) are mutually orthogonal [30], and the singular values are ordered, $s_1 \ge s_2 \ge \cdots \ge s_p \ge 0$. In summary, the optimal affine approximation of rank $r$ is given by
$$\mathbf{1}\bar{X}^T + \sum_{k=1}^{r} s_k U_k V_k^T, \tag{6.11}$$
where $\mathbf{1}$ is the $N$-dimensional vector of ones. This low-dimensional approximation is guaranteed to maximize the dispersion (variance) of the points projected onto the $r$-dimensional subspace of principal directions. Geometrically, the principal directions are aligned with the successive orthogonal directions along which the data change the most (see Fig. 6.5). From a linear algebra perspective, the affine model (6.11) results in the minimum approximation error in the mean-squared sense among all affine models of rank $r$.
Remark 1. The singular values $s_i$ determine the shape of the distribution of sensor measurements. If all the $s_i$ are equal, then all principal directions are equivalent. This is a degenerate case where PCA offers no gain. If the $s_i$ are very different, the distribution is shaped as an ellipsoid, and PCA provides a gain by aligning the coordinate axes with the main axes of the ellipsoid (see Fig. 6.5).
Remark 2. If we include all the $p$ components, then we have an exact decomposition of the measurements,
$$X = \mathbf{1}\bar{X}^T + \sum_{k=1}^{p} s_k U_k V_k^T = \mathbf{1}\bar{X}^T + U S V^T, \tag{6.12}$$
where $U = [U_1, \ldots, U_p]$, $V = [V_1, \ldots, V_p]$, and $S = \operatorname{diag}(s_1, \ldots, s_p)$.
Remark 3. PCA can also be used to whiten the data, a step that is often required before further analysis [14].
Fig. 6.5 Principal components $U_1$, $U_2$, and $U_3$
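The computation in (6.10) and (6.11) maps directly onto a library SVD. The following NumPy sketch centers an $N \times p$ matrix, takes its thin SVD, and forms the rank-$r$ affine approximation. With the row convention of (6.8) to (6.10), the principal directions in $\mathbb{R}^p$ are the right singular vectors, which are the rows of NumPy's `Vt`. The three-sensor data set is synthetic and hypothetical.

```python
import numpy as np

def pca_affine_approx(X, r):
    """Rank-r affine approximation (6.11) of an N x p data matrix X (6.8)."""
    X = np.asarray(X, dtype=float)
    xbar = X.mean(axis=0)                                    # center of mass
    U, s, Vt = np.linalg.svd(X - xbar, full_matrices=False)  # (6.10); s is sorted in decreasing order
    approx = xbar + (U[:, :r] * s[:r]) @ Vt[:r]              # adding xbar to every row is the 1 xbar^T term
    return approx, s, Vt

# Hypothetical 3-sensor recording whose samples lie close to a plane in R^3
rng = np.random.default_rng(0)
N = 500
t = np.linspace(0.0, 1.0, N)
latent = np.column_stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
mixing = np.array([[1.0, 0.5, 0.2], [0.3, 1.0, -0.4]])
X = latent @ mixing + 0.01 * rng.standard_normal((N, 3)) + np.array([5.0, -2.0, 1.0])

X2, s, Vt = pca_affine_approx(X, r=2)
sq_err = np.sum((X - X2) ** 2)   # squared approximation error of the rank-2 model
```

The squared error of the rank-$r$ model equals the sum of the squared discarded singular values (here only $s_3^2$). This is the mean-squared optimality stated above. Keeping all $p$ components reproduces $X$ exactly, as in Remark 2.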
6.3.1.1 Application to Wearable Systems
PCA cannot be implemented on a DSP chip or an FPGA, but it can be implemented on a small laptop, as described in [4], where a real-time analysis of the sensors using a time-varying PCA is computed. A clustering algorithm combined with self-organizing maps is also used to identify the main sensor clusters; the clusters are then updated in real time. In [21], the authors combine feature extraction and PCA to analyze ECG recordings.
6.3.1.2 Available Software
The main ingredient of the PCA algorithm is the SVD of the matrix $X$. Several implementations of the SVD exist in Fortran and in MATLAB.
6.3.2 Independent Component Analysis
Independent component analysis (ICA) seeks to decompose the data into a linear combination of statistically independent components [31]. The method assumes the following mixing model, where the vector of measurements $X$ generated by the sensors at any given time can be decomposed in terms of $p$ independent scalar sources,
$$X = \begin{bmatrix} x_1 \\ \vdots \\ x_p \end{bmatrix} = \begin{bmatrix} a_{1,1} & \cdots & a_{1,p} \\ \vdots & & \vdots \\ a_{p,1} & \cdots & a_{p,p} \end{bmatrix} \begin{bmatrix} s_1 \\ \vdots \\ s_p \end{bmatrix}, \tag{6.13}$$
or equivalently in matrix form
$$X = AS. \tag{6.14}$$
Because the variance of the sources $s_1, \ldots, s_p$ cannot be estimated, we can assume that the sources have unit variance. Being independent, they are then decorrelated (their covariance matrix is the identity), and once the observations have been whitened (Remark 3), the matrix $A$ is orthogonal. A numerical solution to the estimation of the sources and the mixing matrix $A$ can be obtained by maximizing the departure from Gaussianity of the vector $S = A^{-1}X = A^T X$. This can be achieved by maximizing the negentropy of each coordinate $s_i$ of $S$ [32]. It has been noted [31] that, in practice, if the observations are noisy, it becomes impossible to separate the components from the noise. In fact, if the analysis is performed on real data, the components are not even approximately independent [31]. Finally, a common problem associated with the use of ICA is the interpretation of the components.
The interpretation usually relies on post hoc heuristics, such as visual inspection of the source time series.
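The fixed-point (FastICA) iteration shows the whitening step and the negentropy maximization working together. The NumPy sketch below is an illustration under stated assumptions, not the code of any package cited in this chapter. It whitens the $p \times N$ observations, so that the unmixing matrix can be taken orthogonal as in the text. It approximates negentropy with the log-cosh contrast, whose derivative is $g = \tanh$. All rows of the unmixing matrix are updated at once, followed by a symmetric decorrelation. The sources and the mixing matrix are hypothetical.

```python
import numpy as np

def whiten(X):
    """Center the p x N observations and whiten them (cf. Remark 3)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    evals, evecs = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])   # sample covariance
    return (evecs / np.sqrt(evals)) @ evecs.T @ Xc            # cov^(-1/2) Xc

def sym_decorrelate(W):
    """W <- (W W^T)^(-1/2) W, so that the rows of W are orthonormal."""
    evals, evecs = np.linalg.eigh(W @ W.T)
    return (evecs / np.sqrt(evals)) @ evecs.T @ W

def fast_ica(X, n_iter=200, tol=1e-8, seed=0):
    """Symmetric fixed-point ICA with g = tanh (log-cosh negentropy contrast).
    Returns the estimated sources and the orthogonal unmixing matrix."""
    Z = whiten(X)
    p, N = Z.shape
    W = sym_decorrelate(np.random.default_rng(seed).standard_normal((p, p)))
    for _ in range(n_iter):
        Y = np.tanh(W @ Z)                   # g(w_i^T z) for every row w_i
        # fixed point: w <- E[z g(w^T z)] - E[g'(w^T z)] w, with g' = 1 - tanh^2
        W_new = sym_decorrelate(Y @ Z.T / N - (1.0 - Y ** 2).mean(axis=1)[:, None] * W)
        # converged when every new row is parallel (up to sign) to the old one
        done = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if done:
            break
    return W @ Z, W

# Hypothetical example: two independent sources mixed as in Eq. (6.14)
t = np.linspace(0.0, 8.0, 2000)
S = np.vstack([np.sin(2 * np.pi * t), np.sign(np.sin(3 * np.pi * t))])
A = np.array([[1.0, 0.6], [0.4, 1.0]])          # hypothetical mixing matrix
X = A @ S
S_hat, W = fast_ica(X)
corr = np.abs(np.corrcoef(S, S_hat)[:2, 2:])    # |correlation| of true vs. recovered sources
```

As the text notes, interpreting the components is left to the analyst. The recovered components come back in arbitrary order and sign, which is why they are matched to the true sources here by absolute correlation. With noisy real recordings, such a clean match should not be expected.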
6.3.2.1 Application to Wearable Systems
ICA suffers from the same computational load as PCA, and cannot easily be implemented on small portable devices at the moment. Motion artifacts were identified in pulse oximeter signals using ICA [33]. See, however, [34] for a discussion of the validity of the assumption that the motion artifact signals are independent of the signal generated by arterial volume variations.
6.3.2.2 Available Software
The implementation of Bell and Sejnowski can be found at http://www.cnl.salk.edu/~tony/ica.html. A fast ICA MATLAB package is available from the Laboratory of Computer and Information Science (CIS) at the Helsinki University of Technology, http://www.cis.hut.fi/projects/ica/fastica/
6.3.3 Laplacian Eigenmaps
The methods of dimensionality reduction described in the previous sections are linear: each vector of sensor data $X_i$ is projected onto a set of components $U_k$. The resulting coefficients serve as the new coordinates in the low-dimensional representation. However, in the presence of nonlinearity in the organization of the $X_i$ in $\mathbb{R}^p$ (see Fig. 6.4), a linear mapping may distort local distances. These distortions will make the analysis of the dataset more difficult.
We describe in this section a method to construct a nonlinear map $C$ to represent the dataset $X$ in low dimensions. Because the map $C$ is able to preserve the local coupling between sensor vectors, low-dimensional coherent structures can easily be detected with a clustering or classification algorithm. We describe a low-dimensional embedding of the set of sensor measurements $X_1, \ldots, X_N$ into $\mathbb{R}^m$, where $m \ll p$. The embedding is constructed with the eigenfunctions of the graph Laplacian [35]. First, we represent the measurements by a graph that is constructed as follows. Each sensor vector $X_i$ becomes the node (or vertex) $i$ of the graph. Edges between vertices quantify the similarity of sensor vectors.
Each node $i$ is connected to its $n_n$ nearest neighbors according to the Euclidean distance $\|X_i - X_j\|$. Finally, a weight $W_{i,j}$ on the edge $\{i, j\}$ is defined as follows,
$$W_{i,j} = \begin{cases} e^{-\|X_i - X_j\|^2/\sigma^2} & \text{if } i \text{ is connected to } j, \\ 0 & \text{otherwise.} \end{cases} \tag{6.15}$$
The weighted graph $G$ is fully characterized by the $N \times N$ weight matrix $W$ with entries $W_{i,j}$. Let $D$ be the diagonal degree matrix with entries $D_{i,i} = d_i = \sum_j W_{i,j}$.
The map $C$ is designed to preserve short-range (local) distances, as measured by $W_{i,j}$. The map is constructed one coordinate at a time. Each coordinate function $c_k$ is the solution to the following minimization problem,
$$\min_{c_k} \frac{\sum_{i,j} W_{i,j} \left( c_k(X_i) - c_k(X_j) \right)^2}{\sum_i D_{i,i}\, c_k^2(X_i)}, \tag{6.16}$$
where $c_k$ is orthogonal to the previous functions $\{c_0, c_1, \ldots, c_{k-1}\}$,
$$\langle c_k, c_j \rangle = \sum_{i=1}^{N} D_{i,i}\, c_k(X_i)\, c_j(X_i) = 0, \quad j = 0, \ldots, k-1. \tag{6.17}$$
The numerator of the Rayleigh ratio (6.16) is a weighted sum of the squared gradient of $c_k$ measured along the edges $\{i, j\}$ of the graph; it quantifies the average distortion introduced by the map $c_k$. The denominator provides a natural normalization. The constraint of orthogonality to the previous coordinate functions (6.17) guarantees that the coordinate $c_k$ describes the dataset with a finer resolution: $c_k$ oscillates faster on the dataset than the previous $c_j$. Intuitively, $c_k$ plays the role of an additional digit that describes the location of $X_i$ with more precision. It turns out [35] that the solution of (6.16) and (6.17) is the solution to the generalized eigenvalue problem
$$(D - W)\, c_k = \lambda_k D\, c_k, \quad k = 0, 1, \ldots \tag{6.18}$$
The first eigenvector $c_0$, associated with $\lambda_0 = 0$, is constant, $c_0(X_i) = 1$, $i = 1, \ldots, N$; it is therefore not used. Finally, the new parametrization $C$ is defined by
$$X_i \mapsto C(X_i) = \left[ c_1(X_i)\; c_2(X_i)\; \cdots\; c_m(X_i) \right]^T. \tag{6.19}$$
The idea of parametrizing a manifold using the eigenfunctions of the Laplacian was first proposed in [36]. Recently, the same idea has been revisited in the machine learning literature [37, 38]. The construction of the parametrization is summarized in Fig. 6.6. Unlike PCA, which yields a set of vectors onto which each $X_i$ is projected, this nonlinear parametrization constructs the new coordinates of $X_i$ by concatenating the values of the $c_k$, $k = 1, \ldots, m$, evaluated at $X_i$, as defined in (6.19).
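The construction (6.15) to (6.19) can be sketched with NumPy's dense symmetric eigensolver. Two choices below are assumptions that the text does not fix. First, the nearest-neighbor relation is symmetrized, so an edge is kept if either point selects the other. Second, the generalized problem (6.18) is solved through the equivalent symmetric problem $(I - D^{-1/2} W D^{-1/2})\, y = \lambda y$ with $c = D^{-1/2} y$, which makes the $c_k$ orthogonal in the sense of (6.17). A dense $O(N^3)$ solve is only reasonable for small $N$, which is one reason the analysis in Sect. 6.3.3.1 is not performed locally. The helix data are hypothetical.

```python
import numpy as np

def laplacian_eigenmaps(X, n_neighbors, sigma, m):
    """Embed the rows of X (N x p) into R^m following (6.15)-(6.19).
    Returns lambda_1..lambda_m and the N x m coordinates [c_1 ... c_m]."""
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)   # ||X_i - X_j||^2
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(sq[i])[1:n_neighbors + 1]     # n_n nearest neighbors (index 0 is i itself)
        W[i, nbrs] = np.exp(-sq[i, nbrs] / sigma ** 2)  # Gaussian weights (6.15)
    W = np.maximum(W, W.T)      # symmetrize: keep edge {i, j} if either point picked the other
    d = W.sum(axis=1)           # degrees D_ii
    # (D - W) c = lambda D c  <=>  (I - D^-1/2 W D^-1/2) y = lambda y,  c = D^-1/2 y
    d_isqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(N) - d_isqrt[:, None] * W * d_isqrt[None, :]
    lam, Y = np.linalg.eigh(L_sym)      # eigenvalues in ascending order, orthonormal y_k
    C = d_isqrt[:, None] * Y            # c_k = D^-1/2 y_k are D-orthogonal, as in (6.17)
    return lam[1:m + 1], C[:, 1:m + 1]  # drop the constant c_0 (lambda_0 = 0)

# Hypothetical data: 100 samples along a helix in R^3 (a 1-D curve)
s = np.linspace(0.0, 1.0, 100)
X = np.column_stack([np.cos(3 * np.pi * s), np.sin(3 * np.pi * s), s])
lam, coords = laplacian_eigenmaps(X, n_neighbors=5, sigma=0.2, m=2)
```

For points sampled along a curve, the first coordinate $c_1$ varies smoothly along the curve, and its correlation with the curve parameter is close to 1 in absolute value. A single linear projection of the helix would instead fold distant parts of the curve onto each other.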
The embedding obtained with the Laplacian eigenmaps is in fact very similar to a parametrization of the dataset with a kernelized version of PCA, known as kernel PCA [39].
6.3.3.1 Application to Wearable Systems
The authors in [40] embed multidimensional sensor data using Laplacian eigenmaps and cluster the dataset using the new coordinates. The analysis is not performed locally, since the Laplacian eigenmaps require computationally intensive algorithms: nearest neighbor search and eigenvalue problems.
Fig. 6.6 Construction of the embedding
6.3.3.2 Available Software
The original MATLAB code of Belkin is available here: http://manifold.cs.uchicago.edu/
A suite of classification algorithms based on Laplacian eigenmaps is available here: http://manifold.cs.uchicago.edu/manifold_regularization/software.html
A method related to Laplacian eigenmaps and known as diffusion geometry is available here: http://www.math.duke.edu/~mauro/code.html
Finally, the Statistical Learning Toolbox of Dahua Lin, available here: http://www.mathworks.com/matlabcentral/fileexchange/12333, contains code for the Laplacian eigenmaps.
6.4 Classification, Learning of States, and Detection of Anomalies
We are now concerned with the final stage of the analysis: the extraction of high-level information from the wearable system. This abstract information can be an alarm in the case of a natural catastrophe [41], or a diagnosis for a subject at risk of heart disease [24].
We assume that each sensor vector is represented by a $p$-dimensional vector $X \in \mathbb{R}^p$. This vector may be composed of the raw measurements of the sensors, a vector of features (see Sect. 6.2), or the outcome of a dimensionality reduction algorithm (see Sect. 6.3). At each time $t_i$, we have access to the sensor vector $X_i$ (the subscript $i$ is a time index). The question becomes: what is the state $y_i$ that characterizes the user at time $t_i$, given the sensor vector $X_i$? Depending on the application, the state $y_i$ could, for instance, encode the presence of an alarm, or the likelihood of a heart attack.
There are two broad types of approaches to answer this question. The first approach assumes that there exists a large collection of training data composed of sensor values $X_1, \ldots, X_\ell$ that have been carefully labeled with the
corresponding states $y_1, \ldots, y_\ell$ of the user. The labeling is a very time-consuming process that needs to be performed off-line by the user himself/herself, or by an expert (e.g., in the case of biomedical data). Machine learning algorithms can use the training data to learn the association between the state $y_i$ and the sensor measurement $X_i$. Support vector machines are an important example of supervised classification methods; they are discussed in Sect. 6.4.2. This supervised approach may require significant amounts of training data to achieve good performance.
An alternative approach consists of letting the data speak for themselves, using clustering methods that identify similarities in the sensor vectors and group the measurements into coherent clusters. Such methods are called unsupervised and do not require any training data. While they may not provide an answer to our question, namely, what is the state $y_i$?, unsupervised methods can rapidly organize the data into coherent states that can then be further analyzed. We describe unsupervised methods in the next section.
6.4.1 Unsupervised Methods
6.4.1.1 K-Means Clustering
The K-means clustering algorithm is a method to divide a set of vectors into homogeneous groups, within which vectors are similar to one another. The goal is to find the optimal number of clusters, $K$, the optimal cluster centroids, $C_1, \ldots, C_K$, and the optimal assignments of each vector $X_i$ to a cluster $k$ that minimize the total within-cluster scatter [30]
$$\sum_{k=1}^{K} N_k \sum_{X_i \in \text{cluster } k} \|X_i - C_k\|^2, \tag{6.20}$$
where $N_k$ is the total number of vectors assigned to cluster $k$. Indeed, the term
$$\sum_{X_i \in \text{cluster } k} \|X_i - C_k\|^2$$
quantifies the scatter of the vectors in cluster $k$ around the centroid $C_k$. Therefore, (6.20) quantifies the total amount of scatter within all clusters. Given a specified number of clusters, $K$, Algorithm 3 (see Fig. 6.7) iteratively partitions the data and computes the centroids $C_1, \ldots, C_K$ of all the clusters (see Fig. 6.8).
The output of the algorithm depends on $K$, which can be further optimized [30], and also depends on the initial assignment of vectors to clusters. While the algorithm will eventually converge, there is no guarantee that it reaches the global minimum of (6.20). One should therefore repeat the algorithm of Fig. 6.7 with several different initial conditions, and retain the solution that minimizes (6.20).
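The standard Lloyd iterations of K-means alternate an assignment step and a centroid-update step. They can be combined with the restart strategy just described in a short NumPy sketch. The initialization (K distinct random samples as centroids), the convergence test, and the number of restarts are illustrative assumptions. The restart criterion computed here is the plain within-cluster sum of squares, which is what the assignment and update steps decrease; (6.20) additionally weights each cluster by $N_k$. The two-group data set is hypothetical.

```python
import numpy as np

def kmeans(X, K, rng, n_iter=100):
    """One run of Lloyd's K-means iterations from a random initialization."""
    C = X[rng.choice(len(X), size=K, replace=False)]     # K distinct samples as initial centroids
    for _ in range(n_iter):
        dist = np.sum((X[:, None, :] - C[None, :, :]) ** 2, axis=2)  # ||X_i - C_k||^2
        labels = dist.argmin(axis=1)                                 # assignment step
        new_C = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else C[k]
                          for k in range(K)])                        # centroid update step
        if np.allclose(new_C, C):
            break
        C = new_C
    labels = np.sum((X[:, None, :] - C[None, :, :]) ** 2, axis=2).argmin(axis=1)
    scatter = float(np.sum((X - C[labels]) ** 2))   # within-cluster sum of squares
    return C, labels, scatter

def kmeans_restarts(X, K, n_restarts=10, seed=0):
    """Repeat K-means from several initial conditions; keep the lowest scatter."""
    rng = np.random.default_rng(seed)
    return min((kmeans(X, K, rng) for _ in range(n_restarts)), key=lambda run: run[2])

# Hypothetical 2-D sensor features drawn from two well-separated groups
gen = np.random.default_rng(1)
X = np.vstack([gen.normal(0.0, 0.5, (100, 2)), gen.normal(5.0, 0.5, (100, 2))])
C, labels, scatter = kmeans_restarts(X, K=2)
```

Each single run may stop in a local minimum. Keeping the run with the smallest scatter is the practical remedy suggested above, at the cost of `n_restarts` times the computation.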
Fig. 6.7 K-means clustering
Fig. 6.8 K-means clustering of the dataset, with $K = 2$. The cluster centroids are circled in black. The true boundary is shown as a dashed line
Fig. 6.9 Mixture of Gaussian model fitted to the dataset, with $K = 2$. The mean of each Gaussian distribution is circled in black. The true boundary is shown as a dashed line
The assignment of a sensor vector to a cluster results in a hard decision. A soft version of the same idea is provided by the mixture of Gaussian densities model discussed in the next section (Fig. 6.9).
6.4.1.2 Mixture of Gaussian Densities
The Gaussian mixture model is a generative probabilistic model that assumes that the joint distribution of the vector of sensor measurements $X$ is a finite mixture of multivariate Gaussian densities,
$$P(X) = \sum_{k=1}^{K} \pi_k\, \phi(X; \mu_k, \Sigma_k). \tag{6.21}$$
The mixing parameters $\pi_k$ are positive weights that add up to 1. The density $\phi$ is the $p$-variate normal density function. The maximum likelihood estimates $\hat{\mu}_k$, $\hat{\Sigma}_k$, and $\hat{\pi}_k$ of the mixture parameters can be computed from the measurements using the expectation-maximization (EM) algorithm [42]. The estimation of the number of components $K$ (which plays the same role as the number of clusters) is often difficult [42], but can be addressed using model selection criteria [43]. We can use the estimates $\hat{\mu}_k$, $\hat{\Sigma}_k$, and $\hat{\pi}_k$ to compute the posterior probability
$$P(k \mid X) = \frac{\hat{\pi}_k\, \phi(X; \hat{\mu}_k, \hat{\Sigma}_k)}{\sum_{\ell=1}^{K} \hat{\pi}_\ell\, \phi(X; \hat{\mu}_\ell, \hat{\Sigma}_\ell)}, \tag{6.22}$$
which provides an estimate of the probability that measurement $X$ was generated by component $k$.
6.4.1.3 Application to Wearable Systems
Mixtures of Gaussians have been used in [44] to segment sensor time series into intervals associated with distinct activities. We note that clustering algorithms can be implemented on a personal digital assistant (PDA) [45], where a clustering algorithm learns to identify the context associated with the use of the PDA.
6.4.1.4 Available Software
The open source clustering software contains clustering routines such as k-means and k-medians, and is available at: http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/
PRTools is a MATLAB-based toolbox for pattern recognition, and is freely available at http://www.prtools.org/
6.4.2 Support Vector Machine
Among all classification techniques that have been used to analyze sensor data from wearable systems, support vector machines (SVMs) are among the most popular [46, 47]. An SVM can construct a nonlinear separating boundary between two classes by implicitly mapping the features to a high-dimensional space, and performing a linear classification in that space. Our discussion of SVM follows [30].
We consider the problem of classifying sensor vectors into two classes defined by the labels $y = -1$ for class 0 and $y = 1$ for class 1. Extension to more than two classes can easily be obtained by using a one-versus-all strategy, where each class is compared to the other remaining classes, and $X$ is assigned to the class that is most often selected by the different classifications.
6.4.2.1 Support Vector Classifier
Given a training set composed of sensor data $X_1, \ldots, X_\ell$ and the associated class labels $y_1, \ldots, y_\ell$, our goal is to construct a classifier $f(X)$ that assigns a label $-1$ if $X$ belongs to class 0, and $1$ if $X$ belongs to class 1. We assume that the training samples are linearly separable: one can find a hyperplane that divides the training dataset according to class membership. The classifier is defined as follows,
$$f(X) = \begin{cases} -1 & \text{if } \langle W, X \rangle + b < 0, \\ 1 & \text{if } \langle W, X \rangle + b > 0, \end{cases} \tag{6.23}$$
where $W \in \mathbb{R}^p$, $b \in \mathbb{R}$, and $\langle W, X \rangle = \sum_{i=1}^{p} x_i w_i$ is the inner product between the vectors $X$ and $W$. The hyperplane (defined by $W$ and $b$) that optimally separates the two classes can be found by maximizing the margin between the two classes for the training data (see Fig. 6.10). Specifically, we choose $W$ and $b$ such that the margin width, $2/\|W\|$, is maximized, and for all the vectors $X_i$ in the training data we have
$$y_i \left( \langle W, X_i \rangle + b \right) \ge 1.$$
This quadratic programming optimization problem can be solved using Lagrange multipliers and results in the following classifier,
$$f(X) = \operatorname{sign}\left( \sum_i \alpha_i y_i \langle X_i, X \rangle + b \right), \tag{6.24}$$
where
$$W = \sum_i \alpha_i y_i X_i, \tag{6.25}$$
and the Lagrange multipliers $\alpha_i$ are nonzero only for those vectors that lie on the boundary of the margin (see Fig. 6.10). Such training vectors are called the support vectors. As shown in Fig. 6.10, the width of the margin is constrained by the support vectors. The support vectors all satisfy
$$y_i \left( \langle W, X_i \rangle + b \right) = 1. \tag{6.26}$$
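Equations (6.24) to (6.27) can be checked on a toy problem small enough to solve by hand. It has one support vector per class: $X_+ = (1, 1)$ with $y = 1$, and $X_- = (-1, -1)$ with $y = -1$. The standard condition $\sum_i \alpha_i y_i = 0$ forces $\alpha_1 = \alpha_2 = \alpha$, so (6.25) gives $W = 2\alpha(1, 1)$. Imposing (6.26) at both points then gives $\alpha = 1/4$ and $b = 0$, hence $W = (1/2, 1/2)$. The margin $2/\|W\| = 2\sqrt{2}$ equals the distance between the two points. The NumPy sketch below evaluates the decision function (6.24). Its optional `kernel` argument also covers the kernelized classifier (6.28), with illustrative polynomial and Gaussian kernel helpers. These multipliers are valid only for the linear kernel. For real data, the $\alpha_i$ and $b$ would come from a quadratic programming solver, such as the packages listed in Sect. 6.4.2.4.

```python
import numpy as np

def svm_decision(X, support_X, support_y, alpha, b, kernel=np.dot):
    """f(X) = sign(sum_i alpha_i y_i K(X_i, X) + b): Eq. (6.24) for the default
    inner-product kernel, Eq. (6.28) for any other kernel."""
    score = sum(a * y * kernel(xi, X) for a, y, xi in zip(alpha, support_y, support_X)) + b
    return 1 if score > 0 else -1

def polynomial_kernel(u, v, d=2):
    return (np.dot(u, v) + 1.0) ** d

def gaussian_kernel(u, v, sigma=1.0):
    diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return np.exp(-np.dot(diff, diff) / sigma ** 2)

# Toy separable problem: one support vector per class, multipliers solved by hand
support_X = np.array([[1.0, 1.0], [-1.0, -1.0]])
support_y = np.array([1, -1])
alpha = np.array([0.25, 0.25])                 # satisfies sum_i alpha_i y_i = 0
W = (alpha * support_y) @ support_X            # Eq. (6.25): W = sum_i alpha_i y_i X_i = [0.5, 0.5]
b = support_y[0] - W @ support_X[0]            # Eq. (6.27) with the first support vector: b = 0
margin = 2.0 / np.linalg.norm(W)               # 2 / ||W|| = 2 * sqrt(2)
```

The classifier depends on the training data only through the support vectors. This is why storing a few support vectors can be enough for evaluation, even though training itself is the expensive part.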
Fig. 6.10 Maximum margin linear classifier

Finally, the offset $b$ can be computed from any such support vector $X_i$,

$$b = y_i - \langle W, X_i\rangle. \quad (6.27)$$

If the two classes cannot be linearly separated, the same classification method still applies once additional slack variables are introduced. These variables allow some training vectors to lie on the wrong side of the separating hyperplane [30].

6.4.2.2 Support Vector Machines
The linear classifier can be extended to a nonlinear classifier, where the separating boundary between the two classes is no longer a hyperplane but can be a surface of arbitrary geometry. Rather than defining the surface in the original domain $\mathbb{R}^p$, the vectors are implicitly mapped to a higher dimensional space (see Fig. 6.11). In that space, distances and inner products are measured using a kernel $K$ that differs from the usual inner product [30]. The classifier becomes

$$f(X) = \operatorname{sign}\Big(\sum_i \alpha_i y_i K(X_i, X) + b\Big). \quad (6.28)$$

Popular choices for $K$ include the polynomial kernel $K(X_i, X_j) = \left(\langle X_i, X_j\rangle + 1\right)^d$ and the Gaussian kernel $K(X_i, X_j) = \exp\left(-\|X_i - X_j\|^2/\sigma^2\right)$.

6.4.2.3 Application to Wearable Systems
Dinh et al. [20] use SVM to classify various sensor measurements to detect near-fall events in older adults. In [46] the severity of tremor in patients with Parkinson
disease is quantified using a multiclass (one-versus-all) classifier based on SVM. The sensors provide accelerometer data. Because of the computational complexity of the SVM algorithm, both training and classification must be performed remotely on a central computer.

Fig. 6.11 Classification with support vector machines: the original dataset is lifted up into a high-dimensional space, where a maximal margin linear classifier can separate the two classes

6.4.2.4 Available Software
Many software packages provide implementations of the SVM. Some of the main public domain packages are listed below.
SVMlight: http://svmlight.joachims.org
LIBSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm
Additional implementations can be found at: http://www.support-vector-machines.org/svm_soft.html

6.4.3 Semi-Supervised
An alternative to supervised learning and unsupervised learning is semi-supervised learning [48]. Semi-supervised learning can take advantage of unlabeled sensor vectors to learn the organization of the sensor data in $\mathbb{R}^p$. Most semi-supervised learning algorithms assume that the data organize themselves smoothly, and that the geometry of the data will help construct the classifier. In the case of sensors from a wearable system, it is
possible to assume that the data from a sensor lie close to a low dimensional manifold (see Sect. 6.3.3). In this case, partially labeled data can be used to discover the geometry of the manifold while the classifier is being constructed. This approach is very appealing for wearable sensors because it does not require labeling a large amount of data.

6.4.3.1 Application to Wearable Systems
Ali et al. [49] combine a PCA decomposition with a semi-supervised learning approach to construct a classifier that can recognize activities. Similarly, Mahdaviani and Choudhury [50] and Stikic and Schiele [7] propose activity recognition methods that require very few labeled data. As with SVM, these methods can only be applied remotely on a computer with enough computational power.

6.4.3.2 Available Software
The group of Partha Niyogi at the University of Chicago has developed MATLAB software to perform semi-supervised classification using manifold regularization. The software is available at: http://manifold.cs.uchicago.edu/manifold_regularization/manifold.html

6.5 Conclusion and Future Directions
In this chapter we have reviewed some of the dimension reduction and statistical machine-learning methods used to mine and extract information from wearable systems. Many of these techniques have been borrowed from the existing statistical and machine-learning literature. Compared with general data analysis problems, the on-site (local) analysis of wearable data is subject to two well-defined constraints: limited computational power and limited bandwidth. These constraints require efficient and fast algorithms, either for on-line analysis or for transmission to a central hub for off-line analysis. As wearables become ubiquitous and their sensors become more integrated, we expect a growing need for more efficient data analysis algorithms. Increasing the speed of current algorithms is clearly not the answer.
Rather, we expect that completely new ideas will be required to tackle the amount of data generated by wearable systems. For instance, it may come as a surprise that reading only a very small random subset of the sensor values (the principle underpinning compressive sampling [51]) yields the same accuracy as uniform sampling [52], with an enormous saving in power consumption. Other directions include the development of tailored statistical models [53] that can be estimated with fewer sensors and less computation than generic probabilistic models. The area of data analysis for wearable systems promises to be both exciting and challenging.
6.6 Glossary
Independent component analysis (ICA): A linear decomposition of the data into statistically independent sources. The algorithm estimates both the sources and the mixing weights.
Karhunen-Loève transform: See Principal component analysis.
Kernel PCA: See Laplacian eigenmaps.
Laplacian eigenmaps: A nonlinear method to parametrize a dataset using the eigenvectors of the graph Laplacian defined on the dataset. The method optimally preserves short-range distances while assembling a global parametrization of the data.
Multiscale analysis: A method to decompose a dataset into signals that have well-defined characteristic scales. Wavelet analysis is an example.
Principal component analysis (PCA): A decomposition of a dataset into linear components that best capture the variance in the data.
Semi-supervised learning: A classification method that combines unlabeled data with labeled data to construct a classifier. The method exploits the underlying smoothness of the data (e.g., the data lie on a manifold) to estimate the geometry of the data and compensate for the lack of labels.
Short time Fourier transform: See Time-frequency analysis.
Singular value decomposition: See Principal component analysis.
Support vector machine (SVM): A classification algorithm that combines a maximum margin classifier with a kernel-based measure of similarity.
Time-frequency analysis: A decomposition of a signal in terms of localized oscillatory patterns with well-defined frequency and position.
Wavelet analysis: See Multiscale analysis.

References
1. Ahola T, Korpinen P, Rakkola J, Ramo T, Salminen J, Savolainen J (2007) Wearable FPGA based wireless sensor platform. In: Engineering in Medicine and Biology Society, 2007. 29th Annual International Conference of the IEEE EMBS 2007, pp 2288–2291
2. Ali A, King R, Yang G (2008) Semi-supervised segmentation for activity recognition with Multiple Eigenspaces. In: Medical Devices and Biosensors, 2008. 5th International Summer School and Symposium on ISSS-MDBS 2008, pp 314–317
3. Ali R, Atallah L, Lo B, Yang G (2009) Transitional activity recognition with manifold embedding. In: Proceedings of the 2009 Sixth International Workshop on Wearable and Implantable Body Sensor Networks, 1, pp 98–102
4. Atallah L, Lo B, Ali R, King R, Yang G (2009) Real-time Activity Classification Using Ambient and Wearable Sensors. IEEE Transactions on Information Technology in Biomedicine
5. Baheti PK, Garudadri H (2009) An ultra low power pulse oximeter sensor based on compressed sensing. In: Proceedings 2009 6th International Workshop on Wearable and Implantable Body Sensor Networks, BSN 2009, pp 144–148
6. Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15:1373–1396
7. Bérard P, Besson G, Gallot S (1994) Embedding Riemannian manifolds by their heat kernel. Geomet Funct Anal 4(4):373–398
8. Bonato P, Mork P, Sherrill D, Westgaard R (2003) Data mining of motor patterns recorded with wearable technology. IEEE Engineering in Medicine and Biology Magazine 22(3):110–119
9. Bonfiglio A, Carbonaro N, Chuzel C, Curone D, Dudnik G, Germagnoli F, Hatherall D, Koller J, Lanier T, Loriga G et al (2007) Managing catastrophic events by wearable mobile systems. In: Mobile Response, vol 4458/2007. Springer, Berlin, pp 95–105
10. Candes EJ, Wakin MB (2008) An introduction to compressive sampling: A sensing/sampling paradigm that goes against the common knowledge in data acquisition. IEEE Signal Process Mag 25(2):21–30
11. Chapelle O, Schölkopf B, Zien A (eds) (2006) Semi-supervised learning. MIT, MA
12. Chung F (1997) Spectral Graph Theory, 92(92). American Mathematical Society
13. Coifman R, Lafon S (2006) Diffusion maps. Appl Comput Harmonic Anal 21:5–30
14. Davrondzhon G, Einar S (2009) Gait Recognition Using Wearable Motion Recording Sensors. EURASIP Journal on Advances in Signal Processing pp 1–16
15. Dinh A, Shi Y, Teng D, Ralhan A, Chen L, Dal Bello-Haas V, Basran J, Ko S, McCrowsky C (2009) A fall and near-fall assessment and evaluation system. Open Biomed Eng J 3:1
16. Eagle N, Pentland A (2006) Reality mining: sensing complex social systems. Personal and Ubiquitous Computing 10(4):255–268
17. Flanagan J (2005) Unsupervised clustering of context data and learning user requirements for a mobile device. Lecture Notes in Computer Science, vol 3554, pp 155–168
18. Giansanti D, Maccioni G, Cesinaro S, Benvenuti F, Macellari V (2008) Assessment of fall-risk by means of a neural network based on parameters assessed by a wearable device during posturography. Medical Engineering and Physics 30(3):367–372
19. Glaros C, Fotiadis D (2005) Wearable Devices in Healthcare. Studies in Fuzziness and Soft Computing 184:237
20. Han D, Park S, Lee M (2008) THE-MUSS: Mobile u-health service system. In: Biomedical Engineering Systems and Technologies, vol 25. Springer, Berlin, pp 377–389
21. Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning. Springer, Berlin
22. Hu F, Jiang M, Xiao Y (2008) Low-cost wireless sensor networks for remote cardiac patients monitoring applications. Wireless Comm Mobile Comput 8(4):513–530
23. Huynh T, Blanke U, Schiele B (2007) Scalable recognition of daily activities with wearable sensors. In: Location- and context-awareness: Third international symposium, LoCA 2007, Oberpfaffenhofen, Germany, September 20–21, 2007 Proceedings, p 50
24. Hyvärinen A (1999) Survey on independent component analysis. Neural Comput Surv 2:94–128
25. Hyvärinen A, Oja E (2000) Independent component analysis: Algorithms and applications. Neural Networks 13(4–5):411–430
26. Jaffard S, Meyer Y, Ryan R (2001) Wavelets: tools for science & technology. Society for Industrial and Applied Mathematics
27. Karlen W, Mattiussi C, Floreano D (2009) Sleep and wake classification with ECG and respiratory effort signals. IEEE Trans Biomed Circ Syst 3(2):71–78
28. Ko L, Tsai I, Yang F, Chung J, Lu S, Jung T, Lin C (2009) Real-time embedded EEG-based brain-computer interface. In: Advances in neuro-information processing. Springer, Berlin, pp 1038–1045
29. Krause A, Smailagic A, Siewiorek D (2006) Context-aware mobile computing: Learning context-dependent personal preferences from a wearable sensor array. IEEE Transactions on Mobile Computing pp 113–127
30. Mahdaviani M, Choudhury T (2008) Fast and scalable training of semi-supervised CRFs with application to activity recognition. In: Platt J, Koller D, Singer Y, Roweis S (eds) Advances in Neural Information Processing Systems 20, MIT Press, Cambridge, MA, pp 977–984
31. Mallat S (1999) A wavelet tour of signal processing. Academic, NY
32. Matsuyama T (2007) Ubiquitous and wearable vision systems. In: Imaging beyond the pinhole camera, Springer, pp 307–330
33. McLachlan G, Krishnan T (1997) The EM algorithm and extensions. Wiley, NY
34. Minnen D, Starner T, Essa M, Isbell C (2006) Discovering characteristic actions from on-body sensor data. In: 10th IEEE International Symposium on Wearable Computers 2006, pp 11–18
35. Pantelopoulos A, Bourbakis N (2010) Design of the new prognosis wearable system-prototype for health monitoring of people at risk. In: Advances in biomedical sensing, measurements, instrumentation and systems. Springer, Berlin, pp 29–42
36. Paoletti M, Marchesi C (2006) Discovering dangerous patterns in long-term ambulatory ECG recordings using a fast QRS detection algorithm and explorative data analysis. Comput Meth Programs Biomed 82(1):20–30
37. Patel S, Lorincz K, Hughes R, Huggins N, Growdon J, Standaert D, Akay M, Dy J, Welsh M, Bonato P (2009) Monitoring motor fluctuations in patients with Parkinson's disease using wearable sensors. IEEE Trans Inform Technol Biomed 13(6):864
38. Poh MZ, Kim K, Goessling AD, Swenson NC, Picard RW (2009) Heartphones: Sensor earphones and mobile application for non-obtrusive health monitoring. Wearable Computers, IEEE International Symposium pp 153–154
39. Powell Jr H, Hanson M, Lach J (2009) On-body inertial sensing and signal processing for clinical assessment of tremor. IEEE Trans Biomed Circ Syst 3(2):108–116
40. Preece S, Goulermas J, Kenney L, Howard D (2009) A comparison of feature extraction methods for the classification of dynamic activities from accelerometer data. Biomedical Engineering, IEEE Transactions on 56(3):871–879
41. Schölkopf B, Smola A, Müller K (1999) Kernel principal component analysis. In: Advances in kernel methods: Support vector learning. MIT, MA
42. Stetson P (2004) Independent component analysis of pulse oximetry signals based on derivative skew. Lecture notes in computer science pp 1072–1078
43. Stikic M, Schiele B (2009) Activity recognition from sparsely labeled data using multi-instance learning. In: Proceedings of the 4th International Symposium on Location and Context Awareness, Springer, p 173
44. Subramanya A, Raj A, Bilmes J, Fox D (2006) Recognizing activities and spatial context using wearable sensors. In: Proc. of the Conference on Uncertainty in Artificial Intelligence
45. Sun Z, Mao X, Tian W, Zhang X (2009) Activity classification and dead reckoning for pedestrian navigation with wearable sensors. Meas Sci Technol 20:015203
46. Tanner S, Stein C, Graves S (2009) On-board Data Mining. Scientific Data Mining and Knowledge Discovery pp 345–376
47. Tsai D, Morley J, Suaning G, Lovell N (2009) A wearable real-time image processor for a vision prosthesis. Comput Meth Program Biomed 95(3):258–269
48. Verbeek J, Vlassis N, Krose B (2003) Efficient greedy learning of Gaussian mixtures. Neural Comput 15:469–485
49. Viswanathan M (2007) Distributed data mining in a ubiquitous healthcare framework. In: Proceedings of the 20th conference of the Canadian Society for Computational Studies of Intelligence on Advances in Artificial Intelligence. Springer, Berlin, p 271
50. Yao J, Warren S (2005a) A short study to assess the potential of independent component analysis for motion artifact separation in wearable pulse oximeter signals. In: 27th Annual International Conference of the Engineering in Medicine and Biology Society, IEEE-EMBS, pp 3585–3588
51. Yao J, Warren S (2005b) Applying the ISO/IEEE 11073 standards to wearable home health monitoring systems. J Clin Monit Comput 19(6):427–436
52. Zinnen A, Blanke U, Schiele B (2009) An analysis of sensor-oriented vs. model-based activity recognition. In: IEEE International Symposium on Wearable Computers, pp 93–100
53. Zhang F, Lian Y (2009) QRS Detection Based on Multiscale Mathematical Morphology for Wearable ECG Devices in Body Area Networks. IEEE Transactions on Biomedical Circuits and Systems 3(4):220–228