Adaptive Classification by Hybrid EKF with Truncated Filtering: Brain Computer Interfacing



Ji Won Yoon^1, Stephen J. Roberts^1, Matthew Dyson^2, and John Q. Gan^2

^1 Pattern Analysis and Machine Learning Group, Department of Engineering Science, University of Oxford, UK
{jwyoon,sjrob}@robots.ox.ac.uk, http://www.robots.ox.ac.uk/~parg

^2 Department of Computing and Electronic Systems, University of Essex, UK
{mdyson,jqgan}@essex.ac.uk, http://dces.essex.ac.uk/staff/jqgan/

In: C. Fyfe et al. (Eds.): IDEAL 2008, LNCS 5326, pp. 370-377. Springer-Verlag, Berlin Heidelberg (2008)

Abstract. This paper proposes a robust algorithm for adaptive modelling of EEG signal classification using a modified Extended Kalman Filter (EKF). The modified EKF combines Radial Basis Functions (RBF) and Autoregressive (AR) modelling, and obtains better classification performance by truncating the filtering distribution when new observations are very informative.

Keywords: Extended Kalman Filter, Informative Observation, Logistic Classification, Truncated Filtering.

1 Introduction

The goal of Brain Computer Interfacing (BCI) is to enable people with severe neurological disabilities to operate computers by manipulation of the brain's electrical activity rather than by physical means. It is known that the generation and control of electrical brain activity (the electroencephalogram, or EEG) for a BCI system often requires extensive subject training before a reliable communication channel can be formed between a human subject and a computer interface [1,2]. In order to reduce the overheads of training and, importantly, to cope with new subjects, adaptive approaches to the core data modelling have been developed for BCI systems. Such adaptive approaches differ from the typical methodology, in which an algorithm is trained off-line on retrospective data, in that learning takes place continuously rather than being confined to a section of training data. In the signal processing and machine learning communities this is referred to as sequential classification.

Previous research in this area applied to BCI data has used state space modelling of the time series [3,4,5]. In particular, Extended Kalman Filter (EKF) approaches have been effective [3,5]. In these previous models, based upon the EKF, the variances of the observation noise and hidden state noise are re-estimated using a maximum-likelihood style framework, ultimately due to the work of Jazwinski [7].

In previous work [6], we developed a simple, computationally practical modification of the conventional EKF approach in which we marginalise (integrate out) the unknown variance parameters by applying Bayesian conjugate priors [8]. This enabled us to avoid spurious parameter estimation steps which decrease the robustness of the algorithm in highly non-stationary and noisy regions of data. The EKF offers one framework for approximating the non-linear linkage between observations and decisions (via the logistic function). In all this prior work, however, the output function assumed the target labels (the decision classes) to be time independent, i.e. there is no explicit Markov dependency from one decision to the next. This is a poor model for the characteristics of the BCI interface at the sample rate we operate at (over .2s intervals), in which sequences of labels of the same class persist for several seconds before making a transition. The algorithm presented here employs both a set of non-linear basis functions (thus making it a dynamic Radial Basis Function (RBF) classifier) and an autoregressive (AR) model to capture this Markov dependency of the labelling. In addition, our approach modifies the filtering step by truncating the filtering distribution in the light of new observations. This modified filtering step provides a robust algorithm when the given labels (i.e. the training data) are very informative. Moreover, the algorithm does not have to approximate the convolution of a logistic function and a Normal distribution using a probit function, an approximation which can lead to poor performance for the EKF [5,6].

2 Mathematical Model

We consider an observed input stream (i.e. a time series) x_t at times t = 1, 2, ..., T, which we project into a non-linear basis space, x_t → ϕ(x_t). We also (partially) observe z_t ∈ {0, 1}, such that Pr(z_t = 1 | x_t, w_t) = g(f(x_t, w_t)), where g(·) can be a logistic or probit link function which takes the latent variables to the outcome decision variable. In this paper we use the logistic function, i.e. g(s) = 1/(1 + e^{-s}). The state space model for the BCI system can hence be regarded as a hierarchical model, in that the observation noise influences the model indirectly through the logistic regression model g(·). Such indirect influence makes for a more complicated model, but we can circumvent much of this complexity by forming a three-stage state space model and introducing an auxiliary variable y_t. The latter variable acts so as to link the indirect relationships between the observations and the logistic regression model, giving

  p(z_t | y_t) = g(y_t)^{z_t} (1 - g(y_t))^{1 - z_t}
  y_t = a ϕ_t(x_t)^T B w_t + (1 - a) y_{t-1} + v_t
  w_t = A w_{t-1} + L h_t                                                      (1)

where v_t ~ N(0, κ) and h_t ~ N(0, τ I). The variable a denotes a convex mixing ratio between the Radial Basis Function (RBF) and the Autoregressive (AR) model.
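To make the three-stage model of Eq. (1) concrete, the following Python sketch simulates one time step of it, assuming the random-walk setting A = B = L = I used later in Section 3.3; the function names, the Gaussian basis centres and the width parameter are illustrative choices, not taken from the paper.

import numpy as np

def logistic(s):
    # Logistic link g(s) = 1 / (1 + exp(-s)).
    return 1.0 / (1.0 + np.exp(-s))

def rbf_features(x, centres, width):
    # Project an input vector into the non-linear basis space:
    # [x, exp(-||x - c_i||^2 / (2 width^2)) for each centre c_i, 1].
    dists = np.sum((centres - x) ** 2, axis=1)
    return np.concatenate([x, np.exp(-dists / (2.0 * width ** 2)), [1.0]])

def simulate_step(phi_t, w_prev, y_prev, a, kappa, tau, rng):
    # One draw from Eq. (1) with A = B = L = I (random-walk transition).
    w_t = w_prev + rng.normal(0.0, np.sqrt(tau), size=w_prev.shape)   # w_t = w_{t-1} + h_t
    y_t = a * phi_t @ w_t + (1.0 - a) * y_prev + rng.normal(0.0, np.sqrt(kappa))
    z_t = rng.binomial(1, logistic(y_t))                              # Pr(z_t = 1 | y_t) = g(y_t)
    return w_t, y_t, z_t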

To simplify notation, we write ϕ_t instead of ϕ_t(x_t), i.e.

  ϕ_t = ϕ_t(x_t) = [ x_t^T, {φ_t^{(1)}(x_t)}^T, ..., {φ_t^{(N_b)}(x_t)}^T, 1 ]^T          (2)

where φ_t^{(i)}(x_t) is the response of the i-th Gaussian basis function, for i ∈ {1, ..., N_b}, at time t. Here N_b is the number of Gaussian basis functions, and the basis functions are chosen at random. In adaptive classification there are two steps in our state space model which perform model inference:

  Prediction: p(w_t, y_t | z_{1:t-1}, x_{1:t})
  Filtering:  p(w_t, y_t | z_{1:t}, x_{1:t})

after which we obtain ȳ_{t|t-1} = ∫∫ y_t p(w_t, y_t | z_{1:t-1}, x_{1:t}) dw_t dy_t.

2.1 Prediction

  p(w_t, y_t | z_{1:t-1}, x_{1:t})
    = ∫∫ p(w_t, y_t, w_{t-1}, y_{t-1} | z_{1:t-1}, x_{1:t}) dw_{t-1} dy_{t-1}
    = ∫∫ p(y_t | w_t, y_{t-1}) p(w_t | w_{t-1}) p(w_{t-1}, y_{t-1} | z_{1:t-1}, x_{1:t-1}) dw_{t-1} dy_{t-1}
    = ∫∫ [ ∫ p(y_t | w_t, y_{t-1}, κ) p(κ) dκ ] [ ∫ p(w_t | w_{t-1}, τ) p(τ) dτ ]
         × p(w_{t-1}, y_{t-1} | z_{1:t-1}, x_{1:t-1}) dw_{t-1} dy_{t-1}
    ≈ ∫∫ N(y_t; a ϕ_t^T B w_t + (1 - a) y_{t-1}, (α_κ/β_κ)^{-1}) N(w_t; A w_{t-1}, (α_τ/β_τ)^{-1} L L^T)
         × N([y_{t-1}, w_{t-1}]^T; μ_{t-1|t-1}, Σ_{t-1|t-1}) dw_{t-1} dy_{t-1}
    = N([y_t, w_t]^T; S μ_{t-1|t-1}, Σ + S Σ_{t-1|t-1} S^T)                                (3)

where

  S = [ 1 - a   a ϕ_t^T B A ]        Σ = [ Σ_y     Σ_c ]
      [   0          A      ]            [ Σ_c^T   Σ_w ]

  Σ_y = E[(y_t - E(y_t))(y_t - E(y_t))^T] = a^2 ϕ_t^T B L (α_τ/β_τ)^{-1} L^T B^T ϕ_t + (α_κ/β_κ)^{-1},
  Σ_w = E[(w_t - E(w_t))(w_t - E(w_t))^T] = L (α_τ/β_τ)^{-1} L^T,
  Σ_c = E[(y_t - E(y_t))(w_t - E(w_t))^T] = a ϕ_t^T B L (α_τ/β_τ)^{-1} L^T.

The Gaussian approximations to the marginalised densities above follow from the Laplace approximation described in the Appendix.
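The prediction step of Eq. (3) reduces to ordinary Gaussian moment propagation once the marginalised noise terms are replaced by their Laplace-approximated variances. The sketch below is a minimal illustration under the assumptions A = B = L = I, with noise variances β_κ/α_κ and β_τ/α_τ; it is not the authors' implementation.

import numpy as np

def predict(mu_prev, Sigma_prev, phi_t, a, alpha_tau, beta_tau, alpha_kappa, beta_kappa):
    # EKF prediction step of Eq. (3) for the joint state [y_t, w_t], assuming A = B = L = I.
    # mu_prev / Sigma_prev are the filtered moments of [y_{t-1}, w_{t-1}].
    n_w = phi_t.size
    var_tau = beta_tau / alpha_tau        # Laplace-approximated process-noise variance
    var_kappa = beta_kappa / alpha_kappa  # Laplace-approximated auxiliary-noise variance

    # S maps [y_{t-1}, w_{t-1}] onto the predicted mean of [y_t, w_t].
    S = np.zeros((n_w + 1, n_w + 1))
    S[0, 0] = 1.0 - a
    S[0, 1:] = a * phi_t
    S[1:, 1:] = np.eye(n_w)

    # Sigma holds the extra uncertainty injected by v_t and h_t (Sigma_y, Sigma_c, Sigma_w).
    Sigma = np.zeros((n_w + 1, n_w + 1))
    Sigma[0, 0] = a ** 2 * var_tau * (phi_t @ phi_t) + var_kappa
    Sigma[0, 1:] = a * var_tau * phi_t
    Sigma[1:, 0] = a * var_tau * phi_t
    Sigma[1:, 1:] = var_tau * np.eye(n_w)

    mu_pred = S @ mu_prev
    Sigma_pred = Sigma + S @ Sigma_prev @ S.T
    return mu_pred, Sigma_pred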

2.2 Filtering

In the filtering step, our EKF uses the new observation z_t. In general, p(w_t, y_t | z_{1:t}, x_{1:t}) takes the following form:

  p(w_t, y_t | z_{1:t}, x_{1:t}) ∝ p(z_t, w_t, y_t | z_{1:t-1}, x_{1:t}) = p(z_t | y_t) p(w_t, y_t | z_{1:t-1}, x_{1:t}).      (4)

It is difficult to obtain a full distribution for the EKF filtering step, so previous approaches approximated the convolution between the normal distribution and the logistic function using the probit function [5,6]. However, if the new observation is very informative, filtering can be improved without this approximation, which is one of the major sources of non-linearity in logistic EKF filtering. We can obtain better filtering distributions by truncation, since we then do not need the convolution approximation. Hence we obtain

  p(w_t, y_t | z_{1:t}, x_{1:t}) ≈ p(w_t, y_t | z_{1:t-1}, x_{1:t}, z_t)

    = { p(w_t, y_t | z_{1:t-1}, x_{1:t}) I_{y_t ≥ 0}(y_t)    if z_t = 1
      { p(w_t, y_t | z_{1:t-1}, x_{1:t}) I_{y_t < 0}(y_t)    if z_t = 0

      where I_C(y_t) = 1 if y_t satisfies C, and 0 otherwise,

    = { p(w_t | y_t, z_{1:t-1}, x_{1:t}) [ p(y_t | z_{1:t-1}, x_{1:t}) I_{y_t ≥ 0}(y_t) ]    if z_t = 1
      { p(w_t | y_t, z_{1:t-1}, x_{1:t}) [ p(y_t | z_{1:t-1}, x_{1:t}) I_{y_t < 0}(y_t) ]    if z_t = 0

    ≈ { N(w_t; μ̂_{w,t|t}, Σ̂_{w,t|t}) [ N(y_t; μ_{y,t|t-1}, Σ_{y,t|t-1}) I_{y_t ≥ 0}(y_t) ]   if z_t = 1
      { N(w_t; μ̂_{w,t|t}, Σ̂_{w,t|t}) [ N(y_t; μ_{y,t|t-1}, Σ_{y,t|t-1}) I_{y_t < 0}(y_t) ]   if z_t = 0

      where μ̂_{w,t|t} = μ_{w,t|t-1} + Σ_{c,t|t-1}^T Σ_{y,t|t-1}^{-1} (y_t - μ_{y,t|t-1})
            Σ̂_{w,t|t} = Σ_{w,t|t-1} - Σ_{c,t|t-1}^T Σ_{y,t|t-1}^{-1} Σ_{c,t|t-1},

    ≈ N(w_t; μ̂_{w,t|t}, Σ̂_{w,t|t}) N(y_t; μ_{y,z_t}, Σ_{y,z_t})   for z_t ∈ {0, 1},          (5)

where

  μ_{y,z_t=1} = E_{N(y_t; μ_{y,t|t-1}, Σ_{y,t|t-1}) I_{y_t ≥ 0}(y_t)}(y_t)
  μ_{y,z_t=0} = E_{N(y_t; μ_{y,t|t-1}, Σ_{y,t|t-1}) I_{y_t < 0}(y_t)}(y_t)
  Σ_{y,z_t=1} = V_{N(y_t; μ_{y,t|t-1}, Σ_{y,t|t-1}) I_{y_t ≥ 0}(y_t)}(y_t)
  Σ_{y,z_t=0} = V_{N(y_t; μ_{y,t|t-1}, Σ_{y,t|t-1}) I_{y_t < 0}(y_t)}(y_t)

and E(·) and V(·) denote the mean and covariance of the truncated normal distribution respectively. The calculation of the mean and covariance of the truncated normal distribution is described in the Appendix. Therefore we have

  p(w_t, y_t | z_{1:t}, x_{1:t}) = N([y_t, w_t]^T; μ_{t|t}, Σ_{t|t})
                                 = N( [y_t, w_t]^T; [μ_{y,t|t}, μ_{w,t|t}]^T, [ Σ_{y,t|t}    Σ_{c,t|t} ] )     (6)
                                                                              [ Σ_{c,t|t}^T  Σ_{w,t|t} ]

where

  μ_{y,t|t} = μ_{y,z_t}
  Σ_{y,t|t} = Σ_{y,z_t}
  Σ_{c,t|t} = Σ_{c,t|t-1} Σ_{y,t|t-1}^{-1} Σ_{y,z_t}
  μ_{w,t|t} = μ_{w,t|t-1} + Σ_{c,t|t-1}^T Σ_{y,t|t-1}^{-1} (μ_{y,z_t} - μ_{y,t|t-1})
  Σ_{w,t|t} = Σ_{w,t|t-1} + Σ_{c,t|t-1}^T Σ_{y,t|t-1}^{-1} Σ_{y,z_t} Σ_{y,t|t-1}^{-1} Σ_{c,t|t-1} - Σ_{c,t|t-1}^T Σ_{y,t|t-1}^{-1} Σ_{c,t|t-1}.      (7)
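The truncated filtering step of Eqs. (5)-(7) only needs the first two moments of a one-dimensional truncated normal, after which the weight moments follow from the usual conditional-Gaussian update. A possible sketch, using scipy's truncated-normal moments and treating a missing label as "no update", is shown below; it is an illustration of Eq. (7) under those assumptions, not the authors' code.

import numpy as np
from scipy.stats import truncnorm

def filter_step(mu_pred, Sigma_pred, z_t):
    # Truncated filtering step of Eqs. (5)-(7). mu_pred / Sigma_pred are the predicted
    # moments of [y_t, w_t]; z_t is the observed label (0 or 1), or None when missing.
    if z_t is None:
        return mu_pred, Sigma_pred

    mu_y, mu_w = mu_pred[0], mu_pred[1:]
    Sig_y = Sigma_pred[0, 0]
    Sig_c = Sigma_pred[0, 1:]            # cross-covariance between y_t and w_t
    Sig_w = Sigma_pred[1:, 1:]
    sd_y = np.sqrt(Sig_y)

    # Moments of N(y_t; mu_y, Sig_y) truncated to y_t >= 0 (z_t = 1) or y_t < 0 (z_t = 0).
    lo, hi = (0.0, np.inf) if z_t == 1 else (-np.inf, 0.0)
    a, b = (lo - mu_y) / sd_y, (hi - mu_y) / sd_y
    mu_trunc = truncnorm.mean(a, b, loc=mu_y, scale=sd_y)
    var_trunc = truncnorm.var(a, b, loc=mu_y, scale=sd_y)

    # Propagate the truncated moments through the conditional Gaussian (Eq. (7)).
    gain = Sig_c / Sig_y
    mu_w_new = mu_w + gain * (mu_trunc - mu_y)
    Sig_c_new = gain * var_trunc
    Sig_w_new = Sig_w + np.outer(gain, gain) * var_trunc - np.outer(gain, Sig_c)

    mu_new = np.concatenate([[mu_trunc], mu_w_new])
    Sigma_new = np.block([[np.array([[var_trunc]]), Sig_c_new[None, :]],
                          [Sig_c_new[:, None], Sig_w_new]])
    return mu_new, Sigma_new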

3 Results for Experimental Data Set

3.1 Data Acquisition

Data used in this experiment consisted of two channels of EEG, recorded at 256 Hz from electrodes placed over the central portion of the head, and one channel of muscle electrical activity (EMG), recorded at 1024 Hz over the muscles of the right forearm. The EMG was then down-sampled to 256 Hz, and muscle contraction strength for movement and non-movement detection was evaluated via simple windowed peak and trough detection; this then formed a movement / non-movement label.

3.2 Feature Extraction and Basis Formation

The second reflection coefficient of a second-order autoregressive (AR) model [10] was calculated for each EEG channel once every 78 ms using a sliding one-second-long window, forming a set of feature vectors x_t. These vectors were projected into a non-linear latent space using a set of N_b Gaussian basis functions, providing the stream of ϕ_t.

Fig. 1. Comparison of different EKF algorithms: a) trajectories with true labels (blue) and missing labels (red), b) standard EKF (fixed τ and κ), c) modified EKF (marginalising both τ and κ), d) hybrid EKF using marginalisation and truncation.
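For illustration, a minimal sketch of the feature extraction described above: the second reflection coefficient of a second-order AR model, computed by one step of the Levinson-Durbin recursion over a sliding one-second window advanced every 78 ms. The helper names and the biased autocorrelation estimator are assumptions, not specified by the paper.

import numpy as np

def second_reflection_coefficient(frame):
    # Second reflection coefficient of a 2nd-order AR model of `frame`,
    # via one step of the Levinson-Durbin recursion on biased autocorrelations.
    frame = frame - frame.mean()
    n = len(frame)
    r = np.array([frame[:n - k] @ frame[k:] for k in range(3)]) / n   # r0, r1, r2
    k1 = r[1] / r[0]
    e1 = r[0] * (1.0 - k1 ** 2)          # prediction error after the order-1 stage
    return (r[2] - k1 * r[1]) / e1       # second reflection coefficient k2

def sliding_features(eeg, fs=256, win_s=1.0, step_s=0.078):
    # Second reflection coefficient over a sliding one-second window,
    # advanced every 78 ms, for a single EEG channel sampled at 256 Hz.
    win, step = int(win_s * fs), int(step_s * fs)
    starts = range(0, len(eeg) - win + 1, step)
    return np.array([second_reflection_coefficient(eeg[s:s + win]) for s in starts])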

3.3 Performance Comparison

In this paper, the hyper-parameters defining the prior distributions over τ and κ are set to α_τ = 2 and α_κ = 5, together with the corresponding scale parameters β_τ and β_κ. Since there is no principled dynamics model for our problem, we used a transition kernel based on a random walk, so that the matrices of Eq. (1) are A = B = L = I_{N_w}, where I_{N_w} denotes the N_w × N_w identity matrix and N_w is the number of elements of w_t. For comparison, fixed values of τ and κ are chosen as τ̄ = E(τ | α_τ, β_τ) = ∫ τ p(τ | α_τ, β_τ) dτ and κ̄ = E(κ | α_κ, β_κ) = ∫ κ p(κ | α_κ, β_κ) dκ. Also, p(w_0) = N(w_0; μ_0, Σ_0), where μ_0 = 0 and Σ_0 = τ_0 I, with τ_0 = 2.

Fig. 1 shows the comparison of the several methods. The first plot of the figure shows the labels: inactive (z_t = 0), active (z_t = 1) and unknown labels for the movement task respectively. The algorithm receives no label information after an initial labelled segment. Fig. 2 demonstrates the comparison of the cross entropy of the several methods. The cross entropy of the t-th sample is defined by

  E_t^{cross} = -log p(z_t | ẑ_t) = -log{ ẑ_t^{z_t} (1 - ẑ_t)^{1 - z_t} }        (8)

where z_t and ẑ_t represent the underlying labels and the posterior estimates of the labels respectively.

Fig. 2. Comparison of cross entropy for the standard EKF, the marginalised EKF and the hybrid EKF (a = 0.5): the method proposed in this paper outperforms the other EKF approaches.
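A small sketch of the evaluation metric of Eq. (8), with an illustrative clipping constant to keep the logarithm finite when a posterior estimate saturates at 0 or 1:

import numpy as np

def cross_entropy(z_true, z_post, eps=1e-12):
    # Per-sample cross entropy of Eq. (8): -log p(z_t | posterior estimate).
    z_post = np.clip(z_post, eps, 1.0 - eps)
    return -(z_true * np.log(z_post) + (1.0 - z_true) * np.log(1.0 - z_post))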

4 Conclusion and Future Work

In this paper we propose a dynamic Bayesian model for adaptive classification in time series, based on a Bayesian version of the modified Extended Kalman Filter which combines a Radial Basis Function and an autoregressive model. The AR model captures the Markov dependency of the output function and so offers an improvement over conventional EKF classification. The proposed algorithm also employs truncation of the filtering distribution to obtain a better classifier when the observation is very informative. The critical data-specific sensitivity of our model is to the hyper-parameter a. In principle this can be inferred from a section of labelled data, or set via knowledge of the typical within-state times. This requires more extensive testing against a larger number of data sets than presented in this pilot study, and it will be the focus of future work.

Acknowledgment

This project is supported by UK EPSRC funding, to whom we are most grateful.

References

1. Wolpaw, J.R., McFarland, D.J., Neat, G.W., Forneris, C.A.: An EEG-based brain-computer interface for cursor control. Electroencephalogr. Clin. Neurophysiol. 78, 252-259 (1991)
2. Pfurtscheller, G., Flotzinger, D., Kalcher, J.: Brain-computer interface: a new communication device for handicapped persons. Journal of Microcomputer Applications 16(3), 293-299 (1993)
3. Sykacek, P., Roberts, S.J., Stokes, M.: Adaptive BCI based on variational Bayesian Kalman filtering: an empirical evaluation. IEEE Transactions on Biomedical Engineering 51(5), 719-727 (2004)
4. Penny, W.D., Roberts, S.J., Curran, E.A., Stokes, M.: EEG-based communication: a pattern recognition approach. IEEE Transactions on Rehabilitation Engineering 8(2), 214-216 (2000)
5. Lowne, D.R., Roberts, S.J., Garnett, R.: Sequential non-stationary dynamic classification. Machine Learning (submitted, 2008)
6. Yoon, J., Roberts, S.J., Dyson, M., Gan, J.Q.: Sequential Bayesian estimation for adaptive classification. In: Multisensor Fusion and Integration for Intelligent Systems (accepted for publication, 2008)
7. Jazwinski, A.H., Bailie, A.E.: Adaptive filtering: interim report. NASA Technical Reports (March 1967)
8. Bernardo, J.M., Smith, A.F.M.: Bayesian Theory. Wiley, Chichester (1994)
9. Bishop, C.M.: Pattern Recognition and Machine Learning, 1st edn. Springer, Heidelberg (2006)
10. Pardey, J., Roberts, S.J., Tarassenko, L.: A review of parametric modelling techniques for EEG analysis. Med. Eng. Phys. 18(1), 2-11 (1996)

Appendix

4.1 Laplace Approximation

In Eq. (3), the non-Gaussian distributions generated by the marginalisation and the logistic model are approximated by Gaussian distributions using the Laplace approximation.

Let π(w_t) be the non-Gaussian distribution of interest. Then we have

  π(w_t) = (2π)^{-N_w/2} (β_τ^{α_τ} Γ(α_q) / Γ(α_τ)) [ β_τ + ½ (w_t - w_{t-1})^T (w_t - w_{t-1}) ]^{-α_q},   α_q = α_τ + N_w/2.     (9)

In order to find the mode of π(w_t), we use the log of the distribution,

  L_{w_t} = log π(w_t) = -α_q log[ β_τ + ½ (w_t - w_{t-1})^T (w_t - w_{t-1}) ] + c        (10)

where c = -(N_w/2) log(2π) + α_τ log β_τ - log Γ(α_τ) + log Γ(α_q). Using the first derivative of L_{w_t} with respect to w_t, we have

  dL/dw_t = -α_q [ β_τ + ½ (w_t - w_{t-1})^T (w_t - w_{t-1}) ]^{-1} (w_t - w_{t-1}) = 0        (11)

and the mode is easily obtained as w_t = w_{t-1}. In addition, using the second derivative of L_{w_t}, we obtain the covariance of the approximating distribution:

  d^2 L / dw_t^2 |_{w_t = w_{t-1}} = -(α_q/β_τ) I.        (12)

We therefore have the approximated distribution

  π(w_t) ≈ N( w_t; w_{t-1}, (α_q/β_τ)^{-1} I ).        (13)

4.2 Truncated Normal Distribution

Suppose we condition on Y_t ∈ θ = [θ_1, θ_2], where -∞ < θ_1 < θ_2 < ∞. When Y_t has a normal distribution with mean μ = μ_{y,t|t-1} and standard deviation σ = Σ_{y,t|t-1}^{1/2}, the conditional density of Y_t is

  p(y_t | θ) = (1/σ) φ((y_t - μ)/σ) / [ Φ((θ_2 - μ)/σ) - Φ((θ_1 - μ)/σ) ],   θ_1 ≤ y_t ≤ θ_2,        (14)

where φ(·) and Φ(·) denote the standard normal density function and the standard normal cumulative distribution function respectively. With truncation at zero this gives

  μ_{y,z_t=1} = E[Y_t | Y_t ≥ 0] = μ + σ φ(μ/σ)/Φ(μ/σ)
  μ_{y,z_t=0} = E[Y_t | Y_t < 0] = μ - σ φ(μ/σ)/Φ(-μ/σ)
  Σ_{y,z_t=1} = V[Y_t | Y_t ≥ 0] = σ^2 { 1 - (μ/σ) φ(μ/σ)/Φ(μ/σ) - [ φ(μ/σ)/Φ(μ/σ) ]^2 }
  Σ_{y,z_t=0} = V[Y_t | Y_t < 0] = σ^2 { 1 + (μ/σ) φ(μ/σ)/Φ(-μ/σ) - [ φ(μ/σ)/Φ(-μ/σ) ]^2 }.        (15)
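The moments in Eq. (15) can be evaluated directly from the standard normal density and distribution functions; the following sketch mirrors those closed-form expressions (the function name is illustrative).

import numpy as np
from scipy.stats import norm

def truncated_moments(mu, sigma, positive=True):
    # Mean and variance of N(mu, sigma^2) truncated to y >= 0 (positive=True)
    # or y < 0 (positive=False), following Eq. (15).
    r = mu / sigma
    if positive:
        lam = norm.pdf(r) / norm.cdf(r)      # phi(mu/sigma) / Phi(mu/sigma)
        mean = mu + sigma * lam
        var = sigma ** 2 * (1.0 - r * lam - lam ** 2)
    else:
        lam = norm.pdf(r) / norm.cdf(-r)     # phi(mu/sigma) / Phi(-mu/sigma)
        mean = mu - sigma * lam
        var = sigma ** 2 * (1.0 + r * lam - lam ** 2)
    return mean, var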