Good-Enough Brain Model: Challenges, Algorithms and Discoveries in Multi-Subject Experiments


Evangelos E. Papalexakis (Carnegie Mellon University), Alona Fyshe (Carnegie Mellon University), Nicholas D. Sidiropoulos (University of Minnesota), Partha Pratim Talukdar (Carnegie Mellon University), Tom M. Mitchell (Carnegie Mellon University), Christos Faloutsos (Carnegie Mellon University)

[Figure 1: Comparison of the true brain activity and the brain activity generated using the LS and CCA solutions to MODEL. Clearly, MODEL is not able to capture the trends of the brain activity and, to the end of minimizing the squared error, produces an almost straight line that dissects the brain activity waveform.]

ABSTRACT
Given a simple noun such as apple, and a question such as is it edible?, what processes take place in the human brain? More specifically, given the stimulus, what are the interactions between (groups of) neurons (also known as functional connectivity), how can we automatically infer those interactions given measurements of brain activity, and how does this connectivity differ across different human subjects?

In this work we present a simple, novel good-enough brain model, or GEBM in short, and a novel algorithm, SPARSE-SYSID, which are able to effectively model the dynamics of neuron interactions and infer functional connectivity. Moreover, GEBM is able to simulate basic psychological phenomena such as habituation and priming (whose definitions we provide in the main text).

We evaluate GEBM by using both synthetic and real brain data. Using real data, GEBM produces brain activity patterns that are strikingly similar to the true ones, and the inferred functional connectivity is able to provide neuroscientific insights towards a better understanding of the way that neurons interact with each other, as well as to detect regularities and outliers in multi-subject brain activity measurements.

[Figure 2: Big picture: our GEBM estimates the hidden functional connectivity (top right, weighted arrows indicating the number of inferred connections), when given multiple human subjects (left) that respond to yes/no questions (e.g., is it edible?) for typed words (e.g., apple). Bottom right: GEBM also produces brain activity (in solid red) that matches the real activity (in dashed blue).]
Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications, Data mining; G.3 [Mathematics of Computing]: Probability and Statistics, Time series analysis; J.3 [Computer Applications]: Life and Medical Sciences, Biology and genetics; J.4 [Computer Applications]: Social and Behavioral Sciences, Psychology

Keywords
Brain Activity Analysis; System Identification; Brain Functional Connectivity; Control Theory; Neuroscience

[Figure 1, continued: the results obtained by LS and CCA are similar to the one in Fig. 1 for all voxels. By minimizing the sum of squared errors, both algorithms that solve for MODEL resort to a simple line that increases very slowly over time, thus attaining a minimal squared error under the linearity assumptions.]

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. KDD'14, August 24-27, 2014, New York, NY, USA. Copyright 2014 ACM.
1. INTRODUCTION
Can we infer the brain activity of a subject, Alice, when she is shown the typed noun apple and has to answer a yes/no question, like is it edible? Can we infer the connectivity of brain regions, given numerous brain activity data of subjects in such experiments? These are the first two goals of this work: single-subject and multi-subject analysis of brain activity. The third and final goal is to develop a brain connectivity model that can also generate activity that agrees with psychological phenomena, like priming and habituation. Here we tackle all these challenges. We are given Magnetoencephalography (MEG) brain scans for nine subjects, who are shown several typed nouns (apple, hammer, etc.) and are requested to answer a yes/no question (is it edible?, is it dangerous?, and so on) by pressing one of two buttons.

Priming illustrates the power of context: a person hearing the word ipod, and then apple, will think of Apple Inc., as opposed to the fruit apple. Habituation illustrates compensation: a person hearing the same word all the time will eventually stop paying attention.

Our approach: Discovering the multi-billion connections among the
tens of billions [,] of neurons would be the holy grail, and is clearly outside current technological capabilities. How close can we approach this ideal? We propose to use a good-enough approach: we try to explain as much as we can, by assuming a small, manageable count of neuron-regions and their interconnections, and by trying to guess the connectivity from the available MEG data. In more detail, we propose to formulate the problem as system identification from control theory, and we develop novel algorithms to find sparse solutions. We show that our good-enough approach is a very good first step, leading to a tractable, yet effective, model (GEBM) that can answer the above questions.

Figure 2 gives a high-level overview: at the bottom right, the blue, dashed-line time sequences correspond to measured brain activity; the red lines correspond to the guess of our GEBM model. Notice the qualitative goodness of fit. At the top right, arrows indicate the interactions between brain regions that our analysis learned, with the weight being the strength of interaction. Thus we see that the vision cortex (occipital lobe) is well connected to the language-processing part (temporal lobe), which agrees with neuroscience, since all our experiments involved typed words.

Our contributions are as follows:
- Novel analytical model: We propose the GEBM model (see Section 3, and Eqs. (1)-(2)).
- Algorithm: We introduce SPARSE-SYSID, a novel, sparse, system-identification algorithm (see Section 3).
- Effectiveness: Our model can explain psychological phenomena, such as habituation and priming (see Section 5); it also gives results that agree with experts' intuition (see Section 5).
- Validation: GEBM indeed matches the given activity patterns, both on synthetic, as well as real, data (see Sections 4 and 5, respectively).
- Multi-subject analysis: Our SPARSE-SYSID, applied to 9 human subjects (Section 5), showed that (a) 8 of them had very consistent brain-connectivity patterns, while (b) the one outlier was due to exogenous factors (excessive road-traffic noise during his experiment).
Additionally, our GEBM highlights connections between multiple, mostly disparate, areas: 1) Neuroscience, 2) Control Theory & System Identification, and 3) Psychology.
- Reproducibility: Our implementation is open-sourced and publicly available. For privacy reasons, we are not able to release the MEG data; however, in the online version of the code we include synthetic benchmarks, as well as the simulation of psychological phenomena using GEBM.

2. PROBLEM DEFINITION
As mentioned earlier, our goal is to infer the brain connectivity, given measurements of brain activity on multiple yes/no tasks from multiple subjects. We define as a yes/no task the experiment where a subject is given a yes/no question (like is it edible?, is it alive?) and a typed English word (like apple, chair), and has to decide the answer. Throughout the entire process, we attach m sensors that record the brain activity of the human subject. Here we use Magnetoencephalography (MEG) data, although our GEBM model could be applied to any type of measurement (fMRI, etc.). In Section 5 we provide a more formal description of the measurement technique. Thus, in a given experiment, at every time-tick t we have m measurements, which we arrange in an m x 1 vector y(t). Additionally, we represent the stimulus (e.g., apple) and the task (e.g., is it edible?) in a time-dependent vector s(t), using a feature representation of the stimuli; a detailed description of how the stimulus vector is formed can be found in Section 5. For the rest of the paper, we shall use the terms sensor, voxel and neuron-region interchangeably.

We are interested in two problems: the first is to understand how the brain works, given a single subject; the second is to do cross-subject analysis, to find commonalities (and deviations) in a group of several human subjects. Informally, we have:

INFORMAL PROBLEM 1 (SINGLE SUBJECT).
- Given: the input stimulus, and a sequence of m x T brain activity measurements for the m voxels, for all time-ticks t = 1...T
- Estimate: the functional connectivity of the brain, i.e.,
the strength and direction of interaction between pairs of the m voxels, such that 1. we understand how brain-regions collaborate, and 2. we can effectively simulate brain activity.

For the second problem, informally we have:

INFORMAL PROBLEM 2 (MULTI-SUBJECT).
- Given: multi-subject experimental data (brain activity for yes/no tasks)
- Detect: regularities, commonalities, clusters of subjects (if any), and outlier subjects (if any).

For this particular experimental setting, prior work [5] has only considered transformations from the space of noun features to the voxel space and vice versa, as well as word-concept specific prediction based on estimating the covariance between voxels [7]. Next we formalize the problems, we show some straightforward (but unsuccessful) solutions, and finally we give the proposed model, GEBM, and its estimation algorithm.

3. PROBLEM FORMULATION AND PROPOSED METHOD
There are two overarching assumptions:
- linearity: linear models are good-enough;
- stationarity: the connectivity of the brain does not change, at least over the time-scales of our experiments.
Non-linear/sigmoid models are a natural direction for future work, and so is the study of neuroplasticity, where the connectivity changes. However, as we show later, linear, static models are good-enough to answer the problems we listed, and thus we stay with them.

However, we have to be careful. Next we list some natural, but unsuccessful, models, to illustrate that we did do due diligence, and to highlight the need for our slightly more complicated GEBM model. The conclusion is that the hasty MODEL, below, leads to poor behavior, as we show in Figure 1 (pink and black lines), completely missing all the trends and oscillations of the signal (dotted blue line). In fact, the next subsection may be skipped at a first reading.
3.1 First (unsuccessful) approach: MODEL
Given the linearity and static-connectivity assumptions above, a natural additional assumption is to postulate that the m x 1 brain activity vector y(t+1) depends linearly on the activities of the previous time-tick, y(t), and, of course, on the input stimulus, that is, the s x 1 vector s(t).
Formally, in the absence of an input stimulus, we expect that y(t+1) = A y(t), where A is the m x m connectivity matrix of the m brain regions. Including the (linear) influence of the input stimulus s(t), we reach MODEL:

y(t+1) = A_[m x m] y(t) + B_[m x s] s(t)

The B_[m x s] matrix shows how the s input signals affect the m brain-regions. To solve for A, B, notice that

y(t+1) = [A B] [y(t); s(t)]

which eventually becomes

Y1 = [A B] [Y0; S0]

In the above equation, we arranged all the measurement vectors y(t) in matrices: Y0 = [y(0) ... y(T-1)], Y1 = [y(1) ... y(T)], and S0 = [s(0) ... s(T-1)]. This is a well-known least squares problem. We can solve it as is; we can ask for a low-rank solution; or for a sparse solution. None yields a good result, but we briefly describe each, next.
- Least Squares (LS): The solution is unique, using the Moore-Penrose pseudoinverse, i.e., [A B] = Y1 [Y0; S0]^+.
- Canonical Correlation Analysis (CCA): The reader may be wondering: what if we have overfitting here, and why not ask for a low-rank solution? This is exactly what CCA does []. It solves for the same objective function as in LS, further requesting a low rank r for [A B].
- Sparse solution: What if we solve the least squares problem, further requesting a sparse solution? We tried that, too, with l1-norm regularization.

None of the above worked. Fig. 1 shows the real brain activity (dotted blue line) and the predicted activity, using LS (pink) and CCA (black), for a particular voxel. The solutions completely fail to match the trends and oscillations. The results for l1 regularization, and for several other voxels, are similar to the one shown, and are omitted for brevity.

[Figure: MODEL fails: true brain activity (dotted blue) and model estimates (pink and black, respectively, for least squares and for the CCA variation).]

The conclusion of this subsection is that we need a more complicated model, which leads us to GEBM, next.
3.2 Proposed approach: GeBM
Before we introduce our proposed model, we should introduce our notation, which is succinctly shown in Table 1.

Table 1: Table of symbols
- n: number of hidden neuron-regions
- m: number of MEG sensors/voxels we observe
- s: number of input signals (questions)
- T: number of time-ticks of each experiment (5 msec each)
- x(t): vector of hidden neuron activities at time t
- y(t): vector of voxel activities at time t
- s(t): vector of input-sensor activities at time t
- A_[n x n]: connectivity matrix between neurons (or neuron-regions)
- C_[m x n]: summarization matrix (neurons to voxels)
- B_[n x s]: perception matrix (sensors to neurons)
- A_v: connectivity matrix between voxels
- REAL: real part of a complex number
- IMAG: imaginary part of a complex number
- A^+: Moore-Penrose pseudoinverse of A

Formulating the problem as MODEL is not able to meet the requirements for our desired solution. However, we have not exhausted the space of possible formulations that live within our set of simplifying assumptions. In this section, we describe GEBM, our proposed approach which, under the assumptions that we have already made in Section 3, is able to meet our requirements remarkably well.

In order to come up with a more accurate model, it is useful to look more carefully at the actual system that we are attempting to model. In particular, the brain activity vector y that we observe is simply the collection of values recorded by the m sensors placed on a person's scalp. In MODEL, we attempt to model the dynamics of the sensor measurements directly. However, by doing so, we direct our attention to an observable proxy of the process that we are trying to estimate (i.e., the functional connectivity). Instead, it is more beneficial to model the direct outcome of that process. Ideally, we would like to capture the dynamics of the internal state of the person's brain, which, in turn, cause the effects that we are measuring with our MEG sensors.
Let us assume that there are n hidden (hyper-)regions of the brain, which interact with each other, causing the activity that we observe in y. We denote the vector of the hidden brain activity as x, of size n x 1. Then, by using the same idea as in MODEL, we may formulate the temporal evolution of the hidden brain activity as:

x(t+1) = A_[n x n] x(t) + B_[n x s] s(t)

A subtle issue that we have yet to address is the fact that x is not observed and we have no means of measuring it. We propose to resolve this issue by modelling the measurement procedure itself, i.e., to model the transformation of a hidden brain activity vector to its observed counterpart. We assume that this transformation is linear, thus we are able to write

y(t) = C_[m x n] x(t)

Putting everything together, we end up with the following set of equations, which constitute our proposed model, GEBM:

x(t+1) = A_[n x n] x(t) + B_[n x s] s(t)    (1)
y(t) = C_[m x n] x(t)    (2)

The key concepts behind GEBM are:
- (Latent) Connectivity Matrix: We assume that there are n hidden regions, each containing one or more neurons, and they are connected with an n x n adjacency matrix A_[n x n]. We only observe m voxels, each containing multiple regions, and we
record the activity (e.g., magnetic activity) in each of the m voxels; this is the total activity of the constituent regions.
- Measurement Matrix: Matrix C_[m x n] is an m x n matrix, with c_{i,j} = 1 if voxel i contains region j.
- Perception Matrix: Matrix B_[n x s] shows the influence of each input sensor on each neuron-region. The input is denoted as s, with s input signals.
- Sparsity: We require that our model's matrices are sparse: only a few sensors are responsible for a specific brain region; the interactions between regions should not form a complete graph; and the perception matrix should map only the few activated sensors to neuron-regions at every given time.

3.3 Algorithm
Our solution is inspired by control theory, and more specifically by a subfield of control theory called system identification. In the appendix, we provide an overview of how this can be accomplished. However, the matrices we obtain through this process are usually dense, counter to GEBM's specifications. We thus need to refine the solution until we obtain the desired level of sparsity. In the next few lines, we show why this sparsification has to be done carefully, and we present our approach.

Crucial to GEBM's behavior is the spectrum of its matrices; in other words, any operation that we apply to any of GEBM's matrices needs to preserve the eigenvalue profile (for matrix A) or the singular values (for matrices B, C). Alterations thereof may lead GEBM to instabilities. From a control-theoretic and stability perspective, we are mostly interested in the eigenvalues of A, since they drive the behavior of the system. Thus, in our experiments, we heavily rely on assessing how well we estimate these eigenvalues. Sparsifying a matrix while preserving its spectrum can be seen as a similarity transformation of the matrix to a sparse subspace. The following lemma sheds more light in this direction.

LEMMA 1. System identification is able to recover the matrices A, B, C of GEBM up to rotational/similarity transformations.
PROOF. See the appendix.
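The GEBM dynamics x(t+1) = A x(t) + B s(t), y(t) = C x(t) can be sketched as a small simulator; the sizes (n=3 hidden regions, m=5 voxels, s=2 inputs) and the random matrices here are hypothetical, for illustration only:

```python
import numpy as np

def simulate_gebm(A, B, C, S, x0=None):
    """Roll the hidden state forward and emit the observed voxel activity.

    A: n x n hidden connectivity, B: n x s perception, C: m x n measurement,
    S: s x T stimulus (one column per time-tick)."""
    n, T = A.shape[0], S.shape[1]
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    Y = np.zeros((C.shape[0], T))
    for t in range(T):
        Y[:, t] = C @ x              # observed voxel activity y(t) = C x(t)
        x = A @ x + B @ S[:, t]      # hidden-state update
    return Y

rng = np.random.default_rng(1)
A = 0.9 * np.eye(3)                  # spectral radius < 1: a stable toy system
B = rng.normal(size=(3, 2))
C = rng.normal(size=(5, 3))
S = rng.normal(size=(2, 100))
Y = simulate_gebm(A, B, C, S)
print(Y.shape)  # (5, 100)
```

Keeping the spectral radius of A below 1 matters for exactly the stability reasons the text raises: eigenvalues of A drive the long-run behavior of the simulated activity.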
An important corollary of the above lemma (also proved in the appendix) is the fact that pursuing sparsity only on, say, matrix A is not well defined. Therefore, since all three matrices share the same similarity-transformation freedom, we have to sparsify all three. In SPARSE-SYSID, we propose a fast, greedy sparsification scheme, which can be seen as approximately applying the aforementioned similarity transformation to A, B, C, without calculating or applying the transformation itself. Iteratively, for all three matrices, we delete small values, while maintaining their spectrum within epsilon of the one obtained through system identification. Additionally, for A, we do not allow eigenvalues to switch from complex to real, and vice versa. This scheme works very well in practice, providing very sparse matrices while respecting their spectrum. In Algorithm 1, we provide an outline of the algorithm.

So far, GEBM as we have described it is able to give us the hidden functional connectivity and the measurement matrix, but it does not directly offer voxel-to-voxel connectivity, unlike MODEL, which models it explicitly. However, this is by no means a weakness of GEBM, since there is a simple way to obtain the voxel-to-voxel connectivity (henceforth referred to as A_v) from GEBM's matrices.

LEMMA 2. Assuming that m > n, the voxel-to-voxel functional connectivity matrix A_v can be defined and is equal to A_v = C A C^+.

Algorithm 1: SPARSE-SYSID: Sparse System Identification of GEBM
Input: Training data in the form {y(t), s(t)}, t = 1...T; number of hidden states n.
Output: GEBM matrices A (hidden connectivity matrix), B (perception matrix), C (measurement matrix), and A_v (voxel-to-voxel matrix).
1: {A(0), B(0), C(0)} = SYSID({y(t), s(t)}, t = 1...T; n)
2: A = EIGENSPARSIFY(A(0))
3: B = SINGULARSPARSIFY(B(0))
4: C = SINGULARSPARSIFY(C(0))
5: A_v = C A C^+

Algorithm 2: EIGENSPARSIFY: Eigenvalue-Preserving Sparsification of the System Matrix A
Input: Square matrix A(0).
Output: Sparsified matrix A.
1: lambda(0) = EIGENVALUES(A(0))
2: Initialize d(0)_R = 0, d(0)_I = 0.
Vector d(i)_R holds the element-wise difference of the real parts of the eigenvalues of A(i); similarly, d(i)_I for the imaginary parts.
3: Set vector c as a boolean vector that indicates whether the j-th eigenvalue in lambda(0) is complex or not. One way to do this is to evaluate element-wise the boolean expression c = (IMAG(lambda(0)) != 0).
4: Initialize i = 1
5: while d(i)_R <= epsilon and d(i)_I <= epsilon and (IMAG(lambda(i)) != 0) == c do
6:   Initialize A(i) = A(i-1)
7:   {vi*, vj*} = argmin over (vi, vj) of |A(i-1)(vi, vj)| s.t. A(i-1)(vi, vj) != 0
8:   Set A(i)(vi*, vj*) = 0
9:   lambda(i) = EIGENVALUES(A(i))
10:  d(i)_R = |REAL(lambda(i)) - REAL(lambda(i-1))|
11:  d(i)_I = |IMAG(lambda(i)) - IMAG(lambda(i-1))|
12:  i = i + 1
13: end while
14: A = A(i-1)

Algorithm 3: SINGULARSPARSIFY: Singular-Value-Preserving Sparsification
Input: Matrix M(0).
Output: Sparsified matrix M.
1: lambda(0) = SINGULARVALUES(M(0))
2: Initialize d(0)_R = 0, which holds the element-wise difference of the singular values of M(i).
3: Initialize i = 1
4: while d(i)_R <= epsilon do
5:   Initialize M(i) = M(i-1)
6:   {vi*, vj*} = argmin over (vi, vj) of |M(i-1)(vi, vj)| s.t. M(i-1)(vi, vj) != 0
7:   Set M(i)(vi*, vj*) = 0
8:   lambda(i) = SINGULARVALUES(M(i))
9:   d(i)_R = |lambda(i) - lambda(i-1)|
10:  i = i + 1
11: end while
12: M = M(i-1)
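The greedy, spectrum-preserving deletion can be sketched as follows. This is a simplified version: it pairs eigenvalues by sorting (a heuristic), checks both real and imaginary parts against a tolerance eps, and stops at the first deletion that perturbs the spectrum, whereas Algorithm 2 additionally tracks the complex/real flags explicitly:

```python
import numpy as np

def eigen_sparsify(A, eps=1e-3):
    """Simplified sketch of EIGENSPARSIFY: repeatedly zero the
    smallest-magnitude nonzero entry of A while the sorted eigenvalues
    stay within eps; undo and stop at the first deletion that breaks
    the spectrum."""
    A = A.astype(float).copy()
    target = np.sort_complex(np.linalg.eigvals(A))
    while True:
        nz = np.argwhere(A != 0)
        if nz.size == 0:
            break
        i, j = min(nz, key=lambda ij: abs(A[ij[0], ij[1]]))
        old = A[i, j]
        A[i, j] = 0.0                                  # tentative deletion
        lam = np.sort_complex(np.linalg.eigvals(A))
        if (np.abs(lam.real - target.real).max() > eps
                or np.abs(lam.imag - target.imag).max() > eps):
            A[i, j] = old                              # spectrum broken: undo
            break
    return A

A0 = np.array([[0.5, 1e-9], [0.0, 0.3]])
print(eigen_sparsify(A0))  # the tiny off-diagonal entry is zeroed out
```

The diagonal entries survive because zeroing either of them would move an eigenvalue by far more than eps, which is exactly the safeguard the algorithm relies on.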
PROOF. The observed voxel vector can be written as

y(t+1) = C x(t+1) = C A x(t) + C B s(t)

Matrix C is tall (i.e., m > n), thus we can write:

y(t) = C x(t)  =>  x(t) = C^+ y(t)

Consequently,

y(t+1) = C A C^+ y(t) + C B s(t)

Therefore, it follows that C A C^+ is the voxel-to-voxel matrix A_v.

Finally, an interesting aspect of our proposed model GEBM is the fact that if we ignore the notion of summarization, i.e., set matrix C = I, then our model reduces to the simple model MODEL. In other words, GEBM contains MODEL as a special case. This observation demonstrates the importance of the hidden states in GEBM.

4. EVALUATION
4.1 Implementation Details
The code for SPARSE-SYSID has been written in Matlab. For the system identification part, we initially experimented with Matlab's System Identification Toolbox and the algorithms in []. These algorithms worked well for small to medium scales, but were unable to perform on our full dataset. Thus, in our final implementation, we use the algorithms of []. Our code is publicly available.

4.2 Evaluation on synthetic data
In lieu of ground truth in our real data, we generated synthetic data to measure the performance of SPARSE-SYSID. We generate the ground-truth system as follows: first, given a fixed n, we generate a matrix A that has 0.5 on the main diagonal, a small constant on the first upper diagonal (i.e., the (i, i+1) elements), -0.5 on the first lower diagonal (i.e., the (i+1, i) elements), and 0 everywhere else. We then create randomly generated sparse matrices B and C, varying s and m, respectively. After we generate a synthetic ground-truth model, we generate Gaussian random input data for the system, and we obtain the system's response to that data. Consequently, we feed the input/output pairs to SPARSE-SYSID, and we assess the algorithm's ability to recover the ground truth. Here, we show the noiseless case, due to space restrictions. In the noisy case, the estimation performance degrades slowly as n increases; this, however, is expected from estimation theory.
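The synthetic ground-truth generation just described can be sketched as below. The exact off-diagonal constant is garbled in this copy of the text, so the 0.1 on the upper diagonal is a stand-in value; the sizes are likewise illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, s, T = 5, 8, 3, 300

# Tridiagonal ground-truth A: 0.5 on the main diagonal, -0.5 on the first
# lower diagonal; 0.1 on the first upper diagonal is a stand-in constant.
A = 0.5 * np.eye(n) + 0.1 * np.eye(n, k=1) - 0.5 * np.eye(n, k=-1)
B = rng.normal(size=(n, s)) * (rng.random((n, s)) < 0.5)  # random sparse
C = rng.normal(size=(m, n)) * (rng.random((m, n)) < 0.5)  # random sparse

# Drive the system with Gaussian random input (the noiseless case) and
# record the input/output pairs that SPARSE-SYSID would then consume.
S = rng.normal(size=(s, T))
x = np.zeros(n)
Y = np.zeros((m, T))
for t in range(T):
    Y[:, t] = C @ x
    x = A @ x + B @ S[:, t]

print(np.abs(np.linalg.eigvals(A)).max() < 1)  # True: a stable toy system
```

With these constants the spectral radius of A is well below 1, so the generated responses stay bounded, mirroring the stability requirement discussed in Section 3.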
We evaluate SPARSE-SYSID's accuracy with respect to the following aspects:
- Q1: How well can SPARSE-SYSID recover the true hidden connectivity matrix A?
- Q2: How well can SPARSE-SYSID recover the voxel-to-voxel connectivity matrix A_v?
- Q3: Given that we know the true number of hidden states n, how does SPARSE-SYSID behave as we vary the n used for GEBM?

In order to answer Q1, we measure how well (in terms of RMSE) SPARSE-SYSID recovers the eigenvalues of A. We are mostly interested in recovering the real part of the eigenvalues perfectly, since even small errors there could lead to instabilities. Figure 3(a) shows our results: we observe that the estimation of the real part of the eigenvalues of A is excellent. We omit the estimation results for the imaginary parts; they are, however, within the epsilon that we selected in the sparsification scheme of SPARSE-SYSID. Overall, SPARSE-SYSID is able to recover the true GEBM, for various values of m and n.

With respect to Q2, it suffices to measure the RMSE between the true A_v and the estimated one, since we have already thoroughly tested the system's behavior in Q1. Figure 3(b) shows that the estimation of the voxel-to-voxel connectivity matrix using SPARSE-SYSID is highly accurate. Additionally, for ease of exposition, in Figure 4 we show an example of a true matrix A_v and its estimate obtained through SPARSE-SYSID; it is impossible to tell the difference between the two matrices, a fact also corroborated by the RMSE results.

The third dimension of SPARSE-SYSID's performance is its sensitivity to the selection of the parameter n. In order to test this, we generated a ground-truth GEBM with a known n, and we varied our selection of n for SPARSE-SYSID. The result of the experiment is shown in Fig. 3(c). We observe that for values of n smaller than the true one, SPARSE-SYSID's performance is increasingly good, and even for small values of n the estimation quality is good. When n exceeds the true value, the performance starts to degrade, due to overfitting.
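The Q1 metric, RMSE over the real parts of the eigenvalues of the true and estimated A, can be sketched as follows; pairing the two spectra by sorting is a simplification of full eigenvalue matching:

```python
import numpy as np

def eig_real_rmse(A_true, A_est):
    # Compare the sorted real parts of the two spectra (sorting is a
    # simplification; it assumes the orderings line up).
    r_true = np.sort(np.linalg.eigvals(A_true).real)
    r_est = np.sort(np.linalg.eigvals(A_est).real)
    return float(np.sqrt(np.mean((r_true - r_est) ** 2)))

A = np.diag([0.9, 0.5, 0.1])
print(eig_real_rmse(A, A))                               # 0.0: perfect estimate
print(round(eig_real_rmse(A, A + 0.01 * np.eye(3)), 4))  # 0.01: uniform shift
```

A uniform shift of the diagonal moves every eigenvalue by the same amount, so the RMSE equals the shift, which makes this toy case easy to verify.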
This provides an insight on how to choose n for SPARSE-SYSID in order to fit GEBM: it is better to underestimate n rather than to overestimate it; thus, it is better to start with a small n and possibly increase it as soon as the performance (e.g., a qualitative assessment of how well the estimated model predicts brain activity) starts to degrade.

[Figure 4: Q2: Perfect estimation of A_v: comparison of the true and the estimated A_v. We can see, qualitatively, that GEBM is able to recover the true voxel-to-voxel functional connectivity.]

5. GeBM AT WORK
This section focuses on showing different aspects of GEBM at work. In particular, we present the following discoveries:
- D1: We provide insights on the obtained functional connectivity from a neuroscientific point of view.
- D2: Given multiple human subjects, we discover regularities and outliers with respect to functional connectivity.
- D3: We demonstrate GEBM's ability to simulate brain activity.
- D4: We show how GEBM is able to capture two basic psychological phenomena.

Dataset description and formulation. We use real brain activity data, measured using MEG. MEG (Magnetoencephalography) measures the magnetic field caused by many thousands of neurons firing together, and has good time resolution but poor spatial resolution. fMRI (functional Magnetic Resonance Imaging) measures the change in blood oxygenation that results from changes in neural activity, and has good spatial resolution but poor time resolution (about 0.5 Hz). Since we are interested in the temporal dynamics of the brain, we choose to operate on MEG data. All experiments were conducted at the University of Pittsburgh Medical Center (UPMC) Brain Mapping Center. The MEG machine consists of m sensors, placed uniformly across the subject's scalp. The temporal granularity of the measurements is 5 ms; after experimenting with different aggregations in the temporal dimension, we decided to use
5 ms of time resolution, because this yielded the most interpretable results.

[Figure 3: (a) Q1: RMSE of the real part of the eigenvalues of A vs. n. (b) Q2: Estimation RMSE for A_v vs. n. (c) Q3: Estimation RMSE for A_v vs. n, where the true n is fixed and equal to 5. Subfigure (a) shows that SPARSE-SYSID is able to estimate matrix A with high accuracy, in control-theoretic terms. Subfigure (b) illustrates SPARSE-SYSID's ability to accurately estimate the voxel-to-voxel functional connectivity matrix A_v. Subfigure (c) shows the behavior of SPARSE-SYSID with respect to the parameter n when the true value of this parameter is known; the message of this graph is that as long as n is underestimated, SPARSE-SYSID's performance is steadily good and is not greatly influenced by the particular choice of n.]

For the experiments, nine right-handed human subjects were shown a set of concrete English nouns (apple, knife, etc.), and for each noun simple yes/no questions (Is it edible? Can you buy it? etc.). The subjects were asked to press the right button if their answer to each question was yes, or the left button if the answer was no. After the subject pressed the button, the stimulus (i.e., the noun) would disappear from the screen. We also record the exact time that the subject pressed the button, relative to the appearance of the stimulus on the screen. A more detailed description of the data can be found in [5].

In order to bring the above data to the format that our model expects, we make the following design choices: in the absence of sensors that measure the response of the eyes to the shown stimuli, we represent each stimulus by a set of semantic features for that specific noun. This set of features is a superset of the questions that we have already mentioned; the value of each feature comes from answers given by Amazon Mechanical Turk workers.
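The stimulus-vector construction used in this formulation can be sketched as follows. The helper name, the sizes, and the feature values are all hypothetical; the encoding rules (features on until the button press, one task row on for the whole experiment) follow the design choices of this subsection:

```python
import numpy as np

def build_stimulus(word_features, task_id, n_tasks, T, press_tick):
    """Hypothetical helper: stack a noun's semantic features (active until
    the button press, zero afterwards) with a one-hot task/question row
    (active for all time-ticks)."""
    n_feat = len(word_features)
    S = np.zeros((n_feat + n_tasks, T))
    S[:n_feat, :press_tick] = np.asarray(word_features)[:, None]
    S[n_feat + task_id, :] = 1.0
    return S

# e.g., 3 made-up features for a noun, task 1 of 2, button press at tick 6
S = build_stimulus([1.0, 0.0, 0.7], task_id=1, n_tasks=2, T=10, press_tick=6)
print(S.shape)  # (5, 10): 3 feature rows + 2 task rows, 10 time-ticks
```

The resulting columns are the s(t) vectors that enter the GEBM perception term B s(t).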
Thus, from time-tick 1 (when the stimulus starts showing) until the button is pressed, all the features that are active for the particular stimulus are set to 1 in our stimulus vector s, and all the remaining features are equal to 0; when the button is pressed, all the features are zeroed out. On top of the stimulus features, we also have to incorporate the task information in s, i.e., the particular question shown on the screen. In order to do that, we add more rows to the stimulus vector s, each one corresponding to a question/task. In each given experiment, only one of those rows is set to 1 for all time-ticks, and all the other rows are set to 0. Thus, the number of input sensors in our formulation is s (i.e., the neurons for the noun/stimulus plus the neurons for the task). As a last step, we have to incorporate the button-pressing information into our model; to that end, we add two more voxels to our observed vector y, corresponding to left and right button pressing; initially, those values are set to 0 and, as soon as a button is pressed, they are set to 1.

Finally, we choose n = 5 for all the results we show in the following lines; the particular choice of n did not incur qualitative changes in the results. However, as we highlight in the previous section, it is better to underestimate n, and we therefore chose n = 5 as an adequately small choice which, at the same time, produces interpretable results.

D1: Functional Connectivity Graphs
The primary focus of this work is to estimate the functional connectivity of the human brain, i.e., the interaction pattern of groups of neurons. In the next few lines, we present our findings in a concise way and provide neuroscientific insights regarding the interaction patterns that GEBM was able to infer. In order to present our findings, we post-process the results obtained through GEBM in the following way: the data we collect come from sensors placed on the human scalp in a uniform fashion. Each of those sensors measures activity from one of the four main regions of the brain:
- Frontal lobe, associated with attention, short memory, and planning.
Parietal Lobe, associated with movement.  Occipital Lobe, associated with vision.  Temporal Lobe, associated with sensory input processing, language comprehension, and visual memory retention. Even though our sensors offer withinregion resolution, for exposition purposes, we chose to aggregate our findings per region; by doing so, we are still able to provide useful neuroscientific insights. Figure 5 shows functional connectivity graph obtained using GEBM. The weights indicate strength of interaction, measured by number of distinct connections we identified. These results are consistent with current research regarding nature of language processing in brain. For example, Hickock and Poeppel [9] have proposed a model of language comprehension that includes a dorsal and ventral pathway. The ventral pathway takes input stimuli (spoken language in case of Hickock and Poeppel, images and words in ours) and sends information to temporal lobe for semantic processing. Because occipital cortex is responsible for low level processing of visual stimuli (including words) it is reasonable to see a strong set of connections between occipital and temporal lobes. The dorsal pathway sends processed sensory input through parietal and frontal lobes where y are processed for planning and action purposes. The task performed during collection of our MEG data
7 Y Y n number of hidden neu A B S m number of voxels we o s number of input signa There are a few distinct ways of formulating optimization T timeticks of each exp problem of finding A, B. In next lines we show two of x(t) vector of neuron activ most insightful ones: y(t) vector of voxel activiti required that subjects consider meaning of Least word Squares in context of a semantic question. This task would (LS): This is a plausible explanation for deviations(t) of GEBM for vector Suboject inputsensor The require most straightforward recruitment of dorsal pathway (occipitalparietal approach #. is to express problem A [n n] connectivity matrix be as and a Least parietalfrontal Squares optimization: apple C connections). In addition, frontal involvement is indicated when 5. D: Brain Activity Simulation [m n] summarization matrix Y min task performed by subject requires selection of semantic A,B ky A B B An additional S way k [n s] perception matrix (sen F to gain confidence on our Amodel v is toconnectivity assess matrix be information [], as in our question answering paradigm. It is interesting that number of connections fromand parietal solve tofor occipital A B apple its ability to simulate/predict brain activity, given inferred functional (pseudo)inverting connectivity. In order Y by S. REAL part of a complex to do so, we trainedimag GEBM using imaginary data part of a co cortex is larger than from occipital to parietal, considering Canonical Correlation flow from Analysis all but (CCA): one of In words, CCA, and we n are we simulated A brainmoorepenrose activity Pseud of information is likely occipital to parietal. This solving could, for however, same objective timeseries function for as leftout in LS, word. 
with In lieu of competing methods, we be indicative of what is termed top down processing, additional constraint wherein that compare rank our of proposed A B has method to be GEBM equal against our initial approachtable : Table higher level cognitive processes can work to focus to r (and upstream typically sensory processes. Perhaps semantic task causes matrix subjects we aretosolving here r is much (whose smaller unsuitability than we dimensions have argued of for in Section, but we use for, in i.e. order we are to furr forcing solidify our case). As an initial state for focus in anticipation of upcoming word while keeping semantic question in mind. The final timeseries we show, both for GEBM, we use C solution y(), and for model. MODEL, we simply In particular, use y(). brain acti to be low rank). Similar to LS case, here we minimize simply data and collection estimated ones are normalized to unit norm, and of values rec sum of squared errors, however, solution here is low on plotted a person s in absolute scalp. rank, as opposed to LS solution which is (with very high values. For exposition purposes, we sorted voxels In MODEL according, we to attempt to mo 5. D: Crosssubject Analysis probability) full rank. l norm of ir time series vector, and we measurements are displaying directly. However, b However intuitive, formulation of MODEL In our experiments, we have 9 participants, all of whom have undergone same procedure, being presented with same stimuli, In Fig. 7 we illustrate simulated brain turns out to be high ranking ones (however, same pattern attention holds for to all an voxels) observable proxy o rar ineffective in capturing temporal dynamics of recorded to activity estimate of (i.e. GEBM functional co brain activity. As an example of its failure to model brain activity and asked to carry out same tasks. 
Availability of such a rich, (solid red), compared against ones of beneficial MODEL (using to LS model (dashdot magenta) and CCA (dashed black) ), as direct outco successfully, Fig. shows and predicted (using LS and multisubject dataset inevitably begs following question: are would well as like to original capture dynamics CCA) brain activity for a particular voxel (results by LS and CCA re any differences across people s functional connectivity? Or brain activity time series (dashed blue) for son s four highest brain, which, ranking in turn, cause t are similar to one in Fig. for all voxels). By minimizing is everyone, more or less, wired equally, at least with respect to with our MEG sensors. sum of squared errors, both voxels. algorithms Clearly, that solve activity for generated using GEBM is far more MODEL stimuli and tasks at hand? Let us assume that re are n hid istic than results of MODEL resort to a simple line that increases very slowly over time, thus. By using GEBM, we are able (to extent that our model is which interact with each or, cau having a minimal squared error, able to explain) to answer above question. We trained GEBM 5. givend: linearity Explanation assumptions. of Psychological in y. WePhenom ena activity size n. Then, by using sam denote vector of th Real and predicted MEG brain for each of 9 human subjects, using entire data from all. stimuli and tasks, and obtained matrices A, B, C for each person. As we briefly mentioned in Introduction, formulate we would like temporal our evolution o LS For purposes of answering above question, it suffices.5 to look proposed method tocca be able to capture some of psychological x(t +)=A [n n] x at matrix A (which is hidden functional connectivity),. since it phenomena that human brain exhibits. We, by Having no means, introduced claim above equ dictates temporal dynamics of brain activity. 
At this point, that GEBM is able to capture convoluted and still under heavy investigation psychological phenomena, however, in this section we.5 modelling underlying, hidden we have to note that exact location of each sensor can differ. serve. However, an issue that we ha between human subjects, however, we assume that this difference demonstrate GEBM s ability to simulate twox very is not basic observed phenomena, habituation and priming. Unlike previous pose to discoveries, resolve this issue by model and we have no.5 is negligible, given current voxel granularity dictated by. number of sensors. following experiments are on syntic data and itself, ir i.e. purpose modelis to transformation.5 In this multisubject study we have two very important findings: showcase GEBM s additional strengths. tor to its observed counterpart. We  Regularities: For 8 out of 9 human subjects, we identified 5 almost identical GEBM instances, both with respect to RMSE demand behaviour: Given a repeated stimulus, neurons ini 5 Habituation 5In our simplified 5 version of habituation, is linear, thus we observe we are able to write y(t) =C [m and to spectrum. In or words, for 8 out of 9 subjects in tially get activated, but ir activation levels decline (t = 6 in our study, inferred functional Figure connectivity : Comparison behaves almost identically. This fact most erated likely implies using thatls, for and CCA8). solutions In Fig. 8, towemodel show that. GEBM Clearly, is able toequations, capture such which behav constitute our pro of true brain Fig. activity 8) if and stimulus brain persists activity for gen a long time Putting (t = 8 everything Fig. toger, we particular set of stimuli and assorted MODEL tasks, is not human able brain to capture ior. In particular, trends of we show brain activity, desired input and output x(t for +) a few= A [n n] behaves similarly across people. 
and to end of minimizing (observed) squared error, voxels, produces and we show, an almost Top to bottom: (Left) Inputs, Outputs, Simulated Outputs (Righ given functional connectivity  Anomaly: One of our human subjects straight (#) line deviates that dissects from obtained brain activity according waveform. y(t) = C [m n] Top to bottom: (Left) Inputs, Outputs, Simulated Outputs (Rig to GEBM, simulated output, which exhibits.5 Additionally, we require hid.5 aforementioned regular behavior. same, desired behavior..5 trix.5 A to be sparse because, intuiti In Fig. 6(a) & (b) we show and imaginary parts of.5.5 brain interact directly.5 with eac eigenvalues of A. We can see that for 8. human subjects, Proposed eigenvalues are almost identical. This finding agrees with neuroscientific Top to bottom: (Left) Inputs, Outputs, Simulated Outputs (Right) Matrices A, C, B approach: GeBM formulation Desired Output! of GEBM, we.5 seek to o Formulating problem as MODEL results on different experimental settings [8], furr demonstrating GEBM s ability to provide useful insights on multisubject ex Desired is Input! not able to meet requirements for our desired solution. However, we have.5 not ex providing.5 more insightful and easy while obeying dynamics dictat.5 hausted space of possible formulations that live within our set tivity.5 matrices, since an exact zero.5 periments. For subject # re is a deviation on part of of simplifying assumptions. In this section, we describe GEBM, plicitly states that re is no direct.6 first eigenvalue, as well as a slightly deviating pattern on imaginary parts of its eigenvalues. Figures 6(c) & (d) compare matrix A Simulated Output! our proposed approach which, under assumptions 6 8 that.5 we.5 have.5 contrary, a very small value in already made in Section, is able to meet our requirements remarkably well. negligible.sparse) is ambiguous and could im. for subjects and. Subject negative value on diagonal (blue.5. 
and thus could.5 be ignore square at (8, 8) entry), a fact unique to this specific person s In order to come up with a more accurate model, it is useful to with very small weight between connectivity. look more carefully at actual system that 6 we 8 are attempting.5.5 to.5 The key ideas behind GEBM are Moreover, according to person responsible for data collection of Subject #:!of Hidden Neurons! Functional Connectivity A!.6.5. There was a big demonstration outside UPMC building during scan, and I remember subject complaining during one of breaks that he could hear crowd shouting through walls Figure 8: GEBM 6 8 captures Habituation: Given repeated exposure to a stimulus, brain activity starts to fade. Priming In our simplified model on priming, first we give stim
8 FL! " PL! Frontal lobe! (attention)! Parietal lobe! (movement)! 8 6 TL! 9 8 OL! Temporal lobe! Occipital lobe! (language)! (vision)! Figure 5: The functional connectivity derived from GEBM. The weights on edges indicate number of inferred connections. Our results are consistent with research that investigates natural language processing in brain. Eigenvalue.5.5 Real part of eigenvalues of A Subject # is an anomaly Subject # Subject # Subject # Subject # Subject #5 Subject #6 Subject #7 Subject #8 Subject #9 5 5 Eigenvalue index (a) Real part of eigenvalues Eigenvalue Occipital lobe! Frontal lobe! Vision! Attention, short memory! Temporal lobe! planning, motivation! Sensory input processing,! Parietal lobe! language comprehension,! Imaginary Movement! part of eigenvalues of A visual memory retention!.! Subject # is. an anomaly. Subject # Subject # Subject # Subject # Subject #5 Subject #6 Subject #7 Subject #8. 5 Subject #9 5 Eigenvalue index (b) Imaginary part of eigenvalues (c) Subject # (d) Subject # Figure 6: Multisubject analysis: Subfigures (a) and (b), show and imaginary parts of eigenvalues of matrix A for each subject. For all subjects but one (subject #) eigenvalues are almost identical, implying that GEBM that captures ir brain activity behaves more or less in same way. Subject # on or hand is an outlier; indeed, during experiment, subject complained that he was able to hear a demonstration happening outside of laboratory, rendering experimental task assigned to subject more difficult than it was supposed to be. Subfigures (c) and (d) show matrices A for subject # and #. Subject # s matrix seems sparser and most importantly, we can see that re is a negative entry on diagonal, a fact unique to subject # Real and predicted MEG brain activity GeBM LS CCA Figure 7: Effective brain activity simulation: Comparison of he brain activity and simulated ones using GEBM and MODEL, for first four high ranking voxels (in l norm sense). 
ulus apple, which sets off neurons that are associated with fruit apple, as well as neurons that are associated with Apple inc. Consequently, we are showing a stimulus such as ipod; this predisposes regions of brain that are associated with Apple inc. to display some small level of activation, whereas suppressing regions of brain that are associate with apple ( fruit). Later on, stimulus apple is repeated, which, given aforementioned predisposition, activates voxels associated with Apple (company) and suppresses ones associated with homonymous fruit. Figure 9 displays is a pictorial description of above example of priming; given desired input/output pairs, we derive a model that obeys GEBM, such that we match priming behavior. 6. RELATED WORK Brain Functional Connectivity Estimating brain s functional connectivity is an active field of study of computational neuroscience. Examples of works can be found in [, 8, 7]. There have been a few works in data mining community as well: In [6], authors derive brain region connections for Alzheimer s patients, and recently [5] that leverages tensor decomposition in order to discover underlying network of human brain. Most related to present work is work of Valdes et al [9], wherein authors propose an autoregressive model (similar to MODEL ) and solve it using regularized regression. However, to best of our knowledge, this work is first to apply system identification concepts to this problem.
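The MODEL0 least-squares baseline discussed above can be sketched as follows (toy dimensions and synthetic noiseless data, not the paper's implementation): stack the voxel and stimulus snapshots and recover [A B] with a pseudoinverse.

```python
import numpy as np

rng = np.random.default_rng(2)
m, s, T = 4, 2, 200                    # voxels, input signals, timeticks

# Ground-truth MODEL0 dynamics: y(t+1) = A0 y(t) + B0 s(t).
A0 = 0.5 * np.eye(m) + 0.1 * rng.standard_normal((m, m))
B0 = rng.standard_normal((m, s))
S = rng.standard_normal((s, T))

# Roll the dynamics forward to generate the "observed" voxel activity.
Y = np.zeros((m, T + 1))
Y[:, 0] = rng.standard_normal(m)
for t in range(T):
    Y[:, t + 1] = A0 @ Y[:, t] + B0 @ S[:, t]

# Least squares: min_{A,B} || Y' - [A B] [Y ; S] ||_F, solved by pseudoinverse.
Z = np.vstack([Y[:, :T], S])           # (m + s, T) stacked regressors
AB = Y[:, 1:] @ np.linalg.pinv(Z)      # (m, m + s) estimate of [A B]
A_hat, B_hat = AB[:, :m], AB[:, m:]
```

On noiseless data generated by the model itself, the pseudoinverse recovers [A0 B0]; the trend-flattening behavior discussed above only appears when this linear fit is forced onto real brain activity.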
Figure 9: GEBM captures priming: when first shown the stimulus apple, both the neurons associated with the fruit apple and those associated with Apple Inc. get activated. When showing the stimulus ipod and then apple, ipod predisposes the neurons associated with Apple Inc. to get activated more quickly, while suppressing the ones associated with the fruit. Top to bottom: (Left) Inputs, Outputs, Simulated Outputs; (Right) Matrices A, C, B.

Psychological Phenomena: A concise overview of the literature pertaining to habituation can be found in [17]; a more recent study on habituation can be found in [12]. The definition of priming, as we describe it in the lines above, concurs with the definition found in [6]. Additionally, in [11], the authors conduct a study on the effects of priming when human subjects were asked to write sentences. The above concepts of priming and habituation have also been studied in the context of spreading activation [1, 4], which is a model of the cognitive process of memory.

Control Theory & System Identification: System identification is a field of control theory. In the Appendix we provide more theoretical details on subspace system identification; [10] and [21] are the most prominent sources for system identification algorithms.

Network Discovery from Time Series: Our work touches upon discovering underlying network structures from time-series data; an exemplary work related to the present paper is [22], where the authors derive a who-calls-whom network from VoIP packet-transmission time series.

7. CONCLUSIONS

The list of our contributions is:
- Analytical model: we propose GEBM, a novel model of the human brain's functional connectivity.
- Algorithm: we introduce SPARSESYSID, a novel sparse system identification algorithm that estimates GEBM.
- Effectiveness: GEBM simulates psychological phenomena (such as habituation and priming), as well as provides valuable neuroscientific insights.
- Validation: we validate our approach on synthetic data (where the ground truth is known), and on real data, where our model produces brain activity patterns remarkably similar to the true ones.
- Multisubject analysis: we analyze measurements from 9 human subjects, identifying a consistent connectivity among 8 of them; we successfully identify an outlier whose experimental procedure was compromised.

Acknowledgements

Research was supported by grants NSF IIS789, NSF IIS76, NSF CDI 85797, NIH/NICHD 65, DARPA FA8755, IARPA FA865C76, and by Google. This work was also supported in part by a fellowship to Alona Fyshe from the Multimodal Neuroimaging Training Program funded by NIH grants T9DA76 and R9DA. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding parties. We would also like to thank Erika Laing and the UPMC Brain Mapping Center for their help with the data and the visualization of our results. Finally, we would like to thank Gustavo Sudre for collecting and sharing the data, and Dr. Brian Murphy for valuable conversations.

REFERENCES
[1] J. R. Anderson. A spreading activation theory of memory. Journal of Verbal Learning and Verbal Behavior, 22(3):261-295, 1983.
[2] Frederico A. C. Azevedo, Ludmila R. B. Carvalho, Lea T. Grinberg, Jose Marcelo Farfel, Renata E. L. Ferretti, Renata E. P. Leite, Roberto Lent, Suzana Herculano-Houzel, et al. Equal numbers of neuronal and nonneuronal cells make the human brain an isometrically scaled-up primate brain. Journal of Comparative Neurology, 513(5):532-541, 2009.
[3] Jeffrey R. Binder and Rutvik H. Desai. The neurobiology of semantic memory. Trends in Cognitive Sciences, 15(11):527-536, 2011.
[4] Allan M. Collins and Elizabeth F. Loftus. A spreading-activation theory of semantic processing. Psychological Review, 82(6):407-428, 1975.
[5] Ian Davidson, Sean Gilpin, Owen Carmichael, and Peter Walker. Network discovery via constrained tensor analysis of fMRI data. In ACM SIGKDD, pages 194-202. ACM, 2013.
[6] Angela D. Friederici, Karsten Steinhauer, and Stefan Frisch. Lexical integration: Sequential effects of syntactic and semantic information. Memory & Cognition, 27(3):438-453, 1999.
[7] Alona Fyshe, Emily B. Fox, David B. Dunson, and Tom M. Mitchell. Hierarchical latent dictionaries for models of brain activation. In AISTATS, 2012.
[8] Michael D. Greicius, Ben Krasnow, Allan L. Reiss, and Vinod Menon. Functional connectivity in the resting brain: a network analysis of the default mode hypothesis. PNAS, 100(1):253-258, 2003.
[9] Gregory Hickok and David Poeppel. Dorsal and ventral streams: a framework for understanding aspects of the functional anatomy of language. Cognition, 92(1-2):67-99, 2004.
[10] Lennart Ljung. System Identification. Wiley Online Library, 1999.
[11] Martin J. Pickering and Holly P. Branigan. The representation of verbs: Evidence from syntactic priming in language production. Journal of Memory and Language, 39(4):633-651, 1998.
[12] Catharine H. Rankin, Thomas Abrams, Robert J. Barry, Seema Bhatnagar, David F. Clayton, John Colombo, Gianluca Coppola, Mark A. Geyer, David L. Glanzman, Stephen Marsland, et al. Habituation revisited: an updated and revised description of the behavioral characteristics of habituation. Neurobiology of Learning and Memory, 92(2):135-138, 2009.
[13] Vangelis Sakkalis. Review of advanced techniques for the estimation of brain connectivity measured with EEG/MEG. Computers in Biology and Medicine, 41(12):1110-1117, 2011.
[14] Ioannis D. Schizas, Georgios B. Giannakis, and Zhi-Quan Luo. Distributed estimation using reduced-dimensionality sensor observations. IEEE TSP, 55(8):4284-4299, 2007.
[15] G. Sudre, D. Pomerleau, M. Palatucci, L. Wehbe, A. Fyshe, R. Salmelin, and T. Mitchell. Tracking neural coding of perceptual and semantic features of concrete nouns. NeuroImage, 62(1):451-463, 2012.
[16] Liang Sun, Rinkal Patel, Jun Liu, Kewei Chen, Teresa Wu, Jing Li, Eric Reiman, and Jieping Ye. Mining brain region connectivity for Alzheimer's disease study via sparse inverse covariance estimation. In ACM SIGKDD, pages 1335-1344. ACM, 2009.
[17] Richard F. Thompson and William A. Spencer. Habituation: a model phenomenon for the study of neuronal substrates of behavior. Psychological Review, 73(1):16-43, 1966.
[18] Dardo Tomasi and Nora D. Volkow. Resting functional connectivity of language networks: characterization and reproducibility. Molecular Psychiatry, 17(8):841-854, 2012.
[19] Pedro A. Valdes-Sosa, Jose M. Sanchez-Bornot, Agustin Lage-Castellanos, Mayrim Vega-Hernandez, Jorge Bosch-Bayard, Lester Melie-Garcia, and Erick Canales-Rodriguez. Estimating brain functional connectivity with sparse multivariate autoregression. Philosophical Transactions of the Royal Society B: Biological Sciences, 360(1457):969-981, 2005.
[20] Michel Verhaegen. Identification of the deterministic part of MIMO state space models given in innovations form from input-output data. Automatica, 30(1):61-74, 1994.
[21] Michel Verhaegen and Vincent Verdult. Filtering and System Identification: A Least Squares Approach. Cambridge University Press, 2007.
[22] Olivier Verscheure, Michail Vlachos, Aris Anagnostopoulos, Pascal Frossard, Eric Bouillet, and Philip S. Yu. Finding "who is talking to whom" in VoIP networks via progressive stream clustering. In IEEE ICDM. IEEE, 2006.
[23] Robert W. Williams and Karl Herrup. The control of neuron number. Annual Review of Neuroscience, 11(1):423-453, 1988.

APPENDIX
A.
SYSTEM IDENTIFICATION FOR GeBM

Consider again the linear state-space model of GEBM:

x(t+1) = A x(t) + B u(t),
y(t) = C x(t).

Assuming x(0) = 0 for simplicity, it is easy to see that

x(t) = sum_{i=0}^{t-1} A^{t-1-i} B u(i),

and therefore y(t) = sum_{i=0}^{t-1} C A^{t-1-i} B u(i), from which we can read out the impulse response matrix H(t) = C A^{t-1} B.

As we mentioned earlier, the matrices A, B, C can be identified up to a similarity transformation. In order to show this, suppose that we have access to the system's inputs {u(t)} and outputs {y(t)} for t = 0, ..., T, and we wish to identify the system matrices (A, B, C). A first important observation is the following. From x(t+1) = A x(t) + B u(t) we obtain

M x(t+1) = M A x(t) + M B u(t) = (M A M^{-1}) M x(t) + M B u(t).

Defining z(t) := M x(t), A~ := M A M^{-1}, and B~ := M B, we obtain z(t+1) = A~ z(t) + B~ u(t); and, with C~ := C M^{-1}, we also have y(t) = C x(t) = C M^{-1} M x(t) = C~ z(t). It follows that (A, B, C) and (A~, B~, C~) = (M A M^{-1}, M B, C M^{-1}) are indistinguishable from input-output data alone. Thus, the sought parameters can only (possibly) be identified up to a basis (similarity) transformation, in the absence of any other prior or side information.

Under what conditions can (A, B, C) be identified up to such a similarity transformation? It has been shown by Kalman that if the so-called controllability matrix

Ctrb := [ B, AB, A^2 B, ..., A^{n-1} B ]

is full row rank n, and the observability matrix

Obs := [ C ; CA ; CA^2 ; ... ; CA^{n-1} ]

is full column rank n, then it is possible to identify (A, B, C) up to such a similarity transformation from (sufficiently diverse) input-output data, and from the impulse response in particular. This can be accomplished by forming a block Hankel matrix out of the impulse response, as follows:

[ H(1) H(2) ... H(n) ; H(2) H(3) ... H(n+1) ; ... ] = [ CB CAB ... CA^{n-1}B ; CAB CA^2 B ... CA^n B ; ... ].

This matrix can be factored into (Obs M^{-1})(M Ctrb), i.e., the true Obs and Ctrb up to a similarity transformation, using the singular value decomposition of the above Hankel matrix. It is then easy to recover A~, B~, C~ from Obs M^{-1} and M Ctrb. This is the core of the Kalman-Ho algorithm for Hankel subspace-based identification [].
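The Hankel factorization above can be sketched numerically as follows (a toy stable system with made-up dimensions, not the paper's SPARSESYSID): build the Markov parameters C A^k B, stack them into two block-Hankel matrices, and recover (A~, B~, C~) up to similarity via a rank-n SVD.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, s = 3, 4, 2                     # hidden states, outputs, inputs

# Ground-truth system, used only to generate the Markov parameters.
A = np.diag([0.9, 0.5, -0.3])
B = rng.standard_normal((n, s))
C = rng.standard_normal((m, n))
H = [C @ np.linalg.matrix_power(A, k) @ B for k in range(2 * n + 1)]

# Block Hankel matrices: Hank0[i, j] = H[i + j], Hank1 shifted by one step.
Hank0 = np.block([[H[i + j] for j in range(n)] for i in range(n)])
Hank1 = np.block([[H[i + j + 1] for j in range(n)] for i in range(n)])

# Factor Hank0 = Obs * Ctrb via a rank-n SVD.
U, sv, Vt = np.linalg.svd(Hank0)
Obs = U[:, :n] * np.sqrt(sv[:n])            # observability factor (up to similarity)
Ctrb = np.sqrt(sv[:n])[:, None] * Vt[:n]    # controllability factor

# Recover (A~, B~, C~): shifted Hankel gives A, first blocks give B and C.
A_t = np.linalg.pinv(Obs) @ Hank1 @ np.linalg.pinv(Ctrb)
B_t = Ctrb[:, :s]
C_t = Obs[:m, :]
```

The recovered triple reproduces the Markov parameters C A^k B exactly, even though (A~, B~, C~) differ from (A, B, C) by an unknown similarity transformation; the eigenvalues of A~, being similarity invariants, match those of A.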
This procedure enables us to identify A~, B~, C~; but, in our context, we are ultimately also interested in the true latent (A, B, C).

PROPOSITION 1. We can always, without loss of generality, transform A~ to maximal sparsity while keeping B~ and C~ dense; that is, sparsity of A alone does not help in terms of identification.

PROOF. Suppose that we are interested in estimating a sparse A, while preserving the eigenvalues of A~. We therefore seek a similarity transformation (a matrix M) that will render A = M A~ M^{-1} as sparse as possible. Towards this end, assume that A~ has a full set of linearly independent eigenvectors, collected in the matrix E, and let L be the corresponding diagonal matrix of eigenvalues. Then, clearly, A~ E = E L, and therefore E^{-1} A~ E = L, a diagonal matrix. Hence, choosing M = E^{-1}, we make A = M A~ M^{-1} maximally sparse. Note that A must have the same rank as A~, and if it had fewer nonzero elements it would also have lower rank. Finally, it is easy to see that if we apply this similarity transformation, the eigenvalues of A~ do not change.
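A small numerical illustration of the proposition (toy 3x3 matrices with distinct real eigenvalues, chosen so that the eigenvector matrix E is invertible and the result stays real): conjugating A~ by E^{-1} yields a diagonal, i.e., maximally sparse, matrix with the same spectrum.

```python
import numpy as np

rng = np.random.default_rng(1)

# A "hidden" diagonal (maximally sparse) matrix, masked by a random similarity.
Lam = np.diag([0.9, 0.6, -0.5])
M = rng.standard_normal((3, 3))
A_tilde = M @ Lam @ np.linalg.inv(M)        # what identification hands us

# Diagonalize A_tilde: the transform E^{-1} recovers a diagonal A.
eigvals, E = np.linalg.eig(A_tilde)
A_sparse = np.linalg.inv(E) @ A_tilde @ E   # diagonal up to round-off
```

The off-diagonal entries of A_sparse vanish (up to floating-point error), while its diagonal carries exactly the eigenvalues of A~, as the proposition asserts.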
More informationA GENTLE INTRODUCTION TO SOAR,
A GENTLE INTRODUCTION TO SOAR, AN ARCHITECTURE FOR HUMAN COGNITION: 2006 UPDATE JILL FAIN LEHMAN, JOHN LAIRD, PAUL ROSENBLOOM 1. INTRODUCTION Many intellectual disciplines contribute to the field of cognitive
More informationIncreasingly, usergenerated product reviews serve as a valuable source of information for customers making
MANAGEMENT SCIENCE Vol. 57, No. 8, August 2011, pp. 1485 1509 issn 00251909 eissn 15265501 11 5708 1485 doi 10.1287/mnsc.1110.1370 2011 INFORMS Deriving the Pricing Power of Product Features by Mining
More informationTHE development of methods for automatic detection
Learning to Detect Objects in Images via a Sparse, PartBased Representation Shivani Agarwal, Aatif Awan and Dan Roth, Member, IEEE Computer Society 1 Abstract We study the problem of detecting objects
More informationChapter 18. Power Laws and RichGetRicher Phenomena. 18.1 Popularity as a Network Phenomenon
From the book Networks, Crowds, and Markets: Reasoning about a Highly Connected World. By David Easley and Jon Kleinberg. Cambridge University Press, 2010. Complete preprint online at http://www.cs.cornell.edu/home/kleinber/networksbook/
More informationA First Encounter with Machine Learning. Max Welling Donald Bren School of Information and Computer Science University of California Irvine
A First Encounter with Machine Learning Max Welling Donald Bren School of Information and Computer Science University of California Irvine November 4, 2011 2 Contents Preface Learning and Intuition iii
More informationThe neuronal dynamics underlying cognitive flexibility in set shifting tasks
J Comput Neurosci (2007) 23:313 331 DOI 10.1007/s108270070034x The neuronal dynamics underlying cognitive flexibility in set shifting tasks Anja Stemme Gustavo Deco Astrid Busch Received: 5 August 2006
More information