Multivariate time series analysis: Some essential notions



Similar documents
The EOQ Inventory Formula

FINITE DIFFERENCE METHODS

How To Ensure That An Eac Edge Program Is Successful


Verifying Numerical Convergence Rates

Can a Lump-Sum Transfer Make Everyone Enjoy the Gains. from Free Trade?

MATHEMATICS FOR ENGINEERING DIFFERENTIATION TUTORIAL 1 - BASIC DIFFERENTIATION

2 Limits and Derivatives

Derivatives Math 120 Calculus I D Joyce, Fall 2013

SAMPLE DESIGN FOR THE TERRORISM RISK INSURANCE PROGRAM SURVEY

Distances in random graphs with infinite mean degrees

2.23 Gambling Rehabilitation Services. Introduction

OPTIMAL DISCONTINUOUS GALERKIN METHODS FOR THE ACOUSTIC WAVE EQUATION IN HIGHER DIMENSIONS

In other words the graph of the polynomial should pass through the points

Geometric Stratification of Accounting Data

Lecture 10: What is a Function, definition, piecewise defined functions, difference quotient, domain of a function

Research on the Anti-perspective Correction Algorithm of QR Barcode

An inquiry into the multiplier process in IS-LM model

CHAPTER 7. Di erentiation

Cyber Epidemic Models with Dependences

Probability and Random Variables. Generation of random variables (r.v.)

The modelling of business rules for dashboard reporting using mutual information

Schedulability Analysis under Graph Routing in WirelessHART Networks

ACT Math Facts & Formulas

Projective Geometry. Projective Geometry

Tangent Lines and Rates of Change


- 1 - Handout #22 May 23, 2012 Huffman Encoding and Data Compression. CS106B Spring Handout by Julie Zelenski with minor edits by Keith Schwarz

Notes: Most of the material in this chapter is taken from Young and Freedman, Chap. 12.

Channel Allocation in Non-Cooperative Multi-Radio Multi-Channel Wireless Networks

To motivate the notion of a variogram for a covariance stationary process, { Ys ( ): s R}

Instantaneous Rate of Change:

Section 3.3. Differentiation of Polynomials and Rational Functions. Difference Equations to Differential Equations

Equilibria in sequential bargaining games as solutions to systems of equations

Matrices and Polynomials

Theoretical calculation of the heat capacity

Comparison between two approaches to overload control in a Real Server: local or hybrid solutions?

Chapter 8 - Power Density Spectrum

Welfare, financial innovation and self insurance in dynamic incomplete markets models

SAT Subject Math Level 1 Facts & Formulas

Chapter 6 Tail Design

Univariate and Multivariate Methods PEARSON. Addison Wesley

College Planning Using Cash Value Life Insurance

Optimized Data Indexing Algorithms for OLAP Systems

TRADING AWAY WIDE BRANDS FOR CHEAP BRANDS. Swati Dhingra London School of Economics and CEP. Online Appendix

A New Cement to Glue Nonconforming Grids with Robin Interface Conditions: The Finite Element Case

Math 113 HW #5 Solutions

Training Robust Support Vector Regression via D. C. Program

A system to monitor the quality of automated coding of textual answers to open questions

A Multigrid Tutorial part two

On a Satellite Coverage

Chapter 7 Numerical Differentiation and Integration

Catalogue no XIE. Survey Methodology. December 2004

MULTY BINARY TURBO CODED WOFDM PERFORMANCE IN FLAT RAYLEIGH FADING CHANNELS

1 The Collocation Method

MATHEMATICAL MODELS OF LIFE SUPPORT SYSTEMS Vol. I - Mathematical Models for Prediction of Climate - Dymnikov V.P.

Chapter 11. Limits and an Introduction to Calculus. Selected Applications

Chapter 10 Introduction to Time Series Analysis

Pre-trial Settlement with Imperfect Private Monitoring

1.6. Analyse Optimum Volume and Surface Area. Maximum Volume for a Given Surface Area. Example 1. Solution

Part II: Finite Difference/Volume Discretisation for CFD

Model Quality Report in Business Statistics

SAT Math Must-Know Facts & Formulas

Bonferroni-Based Size-Correction for Nonstandard Testing Problems

FINANCIAL SECTOR INEFFICIENCIES AND THE DEBT LAFFER CURVE

CITY UNIVERSITY LONDON. BEng Degree in Computer Systems Engineering Part II BSc Degree in Computer Systems Engineering Part III PART 2 EXAMINATION

Solutions by: KARATUĞ OZAN BiRCAN. PROBLEM 1 (20 points): Let D be a region, i.e., an open connected set in

Artificial Neural Networks for Time Series Prediction - a novel Approach to Inventory Management using Asymmetric Cost Functions

Time Series Analysis in WinIDAMS

Staffing and routing in a two-tier call centre. Sameer Hasija*, Edieal J. Pinker and Robert A. Shumsky

Communication on the Grassmann Manifold: A Geometric Approach to the Noncoherent Multiple-Antenna Channel

Note nine: Linear programming CSE Linear constraints and objective functions. 1.1 Introductory example. Copyright c Sanjoy Dasgupta 1

f(x + h) f(x) h as representing the slope of a secant line. As h goes to 0, the slope of the secant line approaches the slope of the tangent line.

1 Short Introduction to Time Series

Free Shipping and Repeat Buying on the Internet: Theory and Evidence

Orchestrating Bulk Data Transfers across Geo-Distributed Datacenters

Chapter 12 Modal Decomposition of State-Space Models 12.1 Introduction The solutions obtained in previous chapters, whether in time domain or transfor

Writing Mathematics Papers

2.12 Student Transportation. Introduction

New Vocabulary volume

Math Test Sections. The College Board: Expanding College Opportunity

Inner Product Spaces and Orthogonality

6. Differentiating the exponential and logarithm functions

Pressure. Pressure. Atmospheric pressure. Conceptual example 1: Blood pressure. Pressure is force per unit area:

FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL

Tis Problem and Retail Inventory Management

His solution? Federal law that requires government agencies and private industry to encrypt, or digitally scramble, sensitive data.

Shell and Tube Heat Exchanger

Transcription:

Capter 2 Multivariate time series analysis: Some essential notions An overview of a modeling and learning framework for multivariate time series was presented in Capter 1. In tis capter, some notions on multivariate time series analysis in time and frequency domains are succinctly introduced; tools and conventions used erein are essential to appreciate te contributions later in te tesis. Altoug tey are widely available in textbooks, tey ave been adapted appropriately to suit tis tesis. In Section 2.1, a multivariate time series model and te concept of weak stationarity are formally defined; only tose time series tat are weakly stationary are considered trougout tis tesis. Weak stationarity requires defining te autocovariance function of a time series. Autocovariance caracteristics of a few relevant types of stationary multivariate time series are presented. In Section 2.2, frequency or spectral domain concepts belonging to Fourier analysis are introduced. Te motion of a simple pendulum is used as an example to motivate te presentation of Fourier series; tisissubsequentlyextendedtotefourier transform. Clearly,tisexampletointroduceFourieranalysisisadetour to a continuous time process; butitwillenanceunderstandingofspectraldomaintools,notations, and definitions. In Section 2.3, te discrete time process is introduced as a limiting case of continuous time processes; tis leads to discrete time Fourier transform. Discretetime Fourier transform gives a periodic and continuous spectrum and it underpins important developments in subsequent sections. Tere, discrete Fourier transform of a discrete time process is also discussed. In Section 2.4, after aving defined time and spectral domain caracteristics of adeterministicprocess,spectralanalysisofstationarytime series is presented. Te two most important ideas tat need to be taken from tis capter are presented next: First, te relation between autocovariance function and spectral density function is simply tat of a Fourier transform. Second, te probability distribution of discrete Fourier transform components of a linear process is complex-gaussian witin small subbands of frequencies. Te first idea is a direct application of Fourier analysis derived in earlier sections. For te second idea, te asymptotic teory of spectral 25

estimates is involved. In Section 2.5, terefore, te asymptotic probability distribution function of discrete Fourier transform components is provided witout proof. 2.1 Temporal analysis of stationary processes In tis section, an introductory review of te time-domain or temporalanalysisof time series is performed. It starts by adapting some definitions from te literature of random processes [29, 68, 86, 47]; te presented definitions migt be termed differently by various autors elsewere in te literature. An infinite sequence of random variables forms a random process. Ten, a vector random process is defined as an infinite sequence of vector random variables tat is a set of random variables maintaining te same order of te vector components in every realization. Definition 2.1. A(multivariate) time series of a (vector) random process is a connected subsequence formed by its constituent (vector) random variables. Atimeseriesiscalledsobecauseteindexoftesequenceisoften attributed to time instants. If y t is te random variable at time instant t Z of a random process of interest, ten {y t }, t =1,...,τ sall be called a τ-lengt realization of a time series wose t-t sample is y t.anr-dimensional vector random process generates an infinite sequence {y t } of r-dimensional vector random variables y t =[y 1t y rt ],werey it, i =1,...,r are te component random variables at time instant t Z. Figure 2.1 selects a realization of a τ-lengt r-variate time series wose t-t sample is te vector y t R r. It may now be implied tat wen referring to te term process incapter1ina broad sense, it meant te vector random process underlying a multivariate time series. On similar lines, te term model tere referred to te joint probabilitydistribution function of te samples of te time series; tis is because it is a set of random variables tat is dealt wit. Ten, te model of a process corresponding to a τ-lengt r- variate time series requires evaluating te τ r-dimensional joint probability distribution function P(y 1 c 1,..., y τ c τ ) for any constant vector c t R r, t = 1,...,τ, were P denotes probability and te comparison of vectors are component-wise. Of course, direct evaluation of suc a probability distribution is very unwieldy. Terefore, restricting te scope of te studies and bringing fort assumptions to simplify te process is inevitable for modeling a process generating a multivariate time series. Let a few useful terms associated wit random variables be first defined [95]. Definition 2.2. Te probability density function p u of a random variable u is defined as p u (a) = d dap(u a) a R, werevertederivativeexists. In te above definition, p u (a) is any positive finite real number werever te derivative does not exist. Ten te joint probability density function p u 1,...,u r of r random variables u 1,...,u r may be given by p u 1,...,u r (a 1,...,a r )= r P(u 1 a 1,...,u r a r ) / a 1 a r a =[a 1 a r ] R r and p u 1,...,u r (a 1,...,a r ) is any positive finite real number werever te derivative does not exist. 26

2 y 2 {y t }, t =1,...,τ realization of 1st random process 1 0 1 2 1 realization of 2nd random process 0 1. 2 1 realization of rt random process 0 1 t 1 0 1 2 3 4 5 τ τ +1 Figure 2.1: Te second sample y 2 of a realization of a τ-lengt r-variate time series {y t } is igligted. Definition 2.3. Te multivariate probability density function p u of an r- dimensional vector random variable u = [u 1 u r ] is defined as te joint probability density function of its r component random variables, i.e., p u (a) = p u 1,...,u r (a 1,...,a r ) a =[a 1 a r ] R r. Definition 2.4. For an r-dimensional vector random variable u wose probability density function p u (b) exists b R r,teexpectation of a function g(u) wit respect to p u is defined as E u [g(u)] = g(b) pu (b) db. Definition 2.5. Te mean µ u R r of an r-dimensional vector random variable u is defined as its expectation wit respect to its r-variate probability density function, i.e., µ u = E u [u]. 27

Definition 2.6. Te variance (covariance matrix) Γ u of te (r- dimensional vector) random variable u is defined as te expectation, wit respect to its (multivariate) probability density function p u,ofte(outer)productofte(vector) random variable wit itself about its mean µ u,i.e.,γ u = E u [(u µ u )(u µ u ) ]. Definition 2.7. Te cross-covariance Γ u,v between te (vector) random variables u and v is defined as te expectation, wit respect to teir joint (multivariate) probability density function p u,v,ofte(outer)productofterandomvariables about teir respective means µ u and µ v,i.e.,γ u,v = E u,v [(u µ u )(v µ v ) ]. Definition 2.8. Te cross-covariance between any two constituent (vector) random variables of te same (vector) random process is called te autocovariance function between te (vector) random variables. Terefore, te autocovariance function (acvf) betweenter- dimensionalvectorrandom variables y t and y s t, s Z is (2.1) Γ yt,ys = E yt,ys [(y t µ yt )(y s µ ys ) ]. Note 2.1. Wen it specifically concerns a univariate time series {y t } and not a multivariate time series, its acvf will be denoted by γ yt,ys.ten,foramultivariate time series {y t } = {[y 1t y 2t y rt ] },te(i, j)-t element of its acvf Γ yt,ys may be written as γ y i t,y js,wicaccordingtodefinition2.7,maybeinterpretedaste cross-covariance between y it and y js i, j 1,...,r and t, s Z. Definition 2.9. A(vector)timeseries{y t } t Z is weakly stationary if te mean µ yt is a constant (vector) µ y and its acvf Γ yt,ys between te (vector) random variables y t and y s s Z depends on s and t only troug s t. Te variable = s t of te acvf Γ yt,ys will be referred to as te lag. Itfollowsfrom Definition 2.9 tat te acvf between y t+ and y t of a weakly stationary time series {y t } is (2.2) Γ y Γy t+,y t = E y t+,y t [(y t+ µ y )(y t µ y ) ] Z. It is easy to verify tat γ y i,y j acvf : = γ y j,y i,wicgivesrisetotefollowingpropertyofte Property 2.1. Aweaklystationaryacvfistransposesymmetricabout =0,i.e., (2.3) (Γ y ) =Γ y. In tis tesis, te focus is on time series tat are weakly stationary and te main references on tat topic are [102, 99, 111, 19, 20]. Now, take alookatafewexamples of weakly stationary multivariate time series. 28

Property 2.2. Aweaklystationaryr-variate time series {z t } is idiosyncratic if any two components z it and z jt+ Z, i j of its corresponding vector random variable z t = [z 1t z rt ] t Z ave zero cross-covariance, i.e., γ z i t,z jt+ γ z i,z j R is zero wenever i j i, j 1,...,r and Z. Note 2.2. Te diagonal elements of an acvf Γ u of te vector random variable u t =[u 1t u 2t u rt ] will be written simply as γ u i γu i,u i i 1,...,r. Te acvf of an r-variate idiosyncratic time series {z t } due to vector random variable γ z 1 γ z 2,z 1 =0 γ z 1,z 2 =0 γ z 1,z r =0 γ z 2 γ z 2,z r =0 γ z r γ z r,z 1 =0 γ z r,z 2 =0 Figure 2.2: Te structure of te r r acvf matrix Γ z of an r-variate idiosyncratic time series {z t } sows zeros off-diagonal due to no cross-correlation betweenitscomponents. Te plots are ypotetical and interpolated for, an oterwisediscrete, forclarity. z t =[z 1t... z rt ] as an r r matrix structure sown in Figure 2.2. Following Note 2.2, te cross-covariance γ z i,z j of Property 2.2 may be written as te acvf γ z i of z i wenever i = j, wereasγ z i,z j =0is zero oterwise. Tis means tat te off-diagonal elements of suc an acvf are always zero. Let a special case of an idiosyncratic time series wose eac diagonal element of te acvf is an impulse function be now defined. Definition 2.10. For a weakly stationary (vector) time series {x t },if(tecomponents of) x t are independently and identically distributed t Z, ten{x t } is said to be (multivariate) wite noise. Note tat te acvf Γ x of a q-variate wite noise {x t} is Γ x = 0 q 0 and det(γ x 0 ) 0. Definition 2.10 implies tat mean-subtracted wite noise components may be defined completely by teir component variances σi 2 = σi 2 i = 1,...,q so tat Γ x = diag(σ2 1,...,σ2 q ) Z, wicissaidtobeisotropic if σ i = σ, 29

i =1, 2,...,q.Aspecialcaseoftezero-meanisotropicwitenoiseistefollowing: If te component random variables of a q-variate wite noise {x t } ave zero means and unit variances so tat Γ x = diag(1,...,1) Rq q Z, ten{x t } is termed a zero-mean unit-variance wite noise. Te wite noise is used as te ingredient in many weakly stationary time series process models because of te simplicity of its acvf. Defined below is one suc model; refer 11.1 of[20] or 9.2 of[79] among many references for its details. Definition 2.11. For a q-variate zero-mean wite noise {x t } t Z and matrices C j R q q j Z wit absolutely summable elements, a linear process is defined as te q-variate time series (2.4) u t = j Z C j x t j, wic is weakly stationary wit zero mean and acvf (2.5) Γ u = j Z C j+ Γ x 0C j. 2.2 Spectral analysis of continuous processes Te purpose of tis section is to introduce certain Fourier analysis concepts required for tis tesis. Note 2.3. Tis section deviates momentarily to discuss continuous time processes; te time index is t R; everywereelseintistesist Z. Consider te motion of a simple pendulum as an example of a periodic continuous process. It is assumed for simplicity tat te mass of te string attaced to te bob of te pendulum is negligible. Te oscillation is restricted toaplanesotatte constant string lengt and an angle, viz., te instantaneous angletattestringforms wit respect to its equilibrium position, are sufficient to describe its motion. It is also assumed tat te amplitude α, wicistemaximumdisplacementoftebobfrom its position of equilibrium, is very small relative to te lengt of te pendulum. Refer to Figure 2.3; let τ be te time period of oscillation so tat τ 1 is te frequency of oscillation. Te standard association of 2π radians to be equivalent to one complete oscillation may be made. Let φ radians be te part of 2π radians of an oscillation te pendulum as completed at time t =0;itssigndependsontecoiceofdirectionof reference of te bob s trajectory. If it is assumed tat te pendulum is undamped by any kinds of friction and disturbances, ten te pendulum s displacement wit respect to te equilibrium position of te string at time t R is y t = α cos(2π t τ + φ). Basic trigonometric identities enable writing y t in various combinations of sinusoids, e.g., (2.6) y t = a cos(2πt/τ)+b sin(2πt/τ), were a = α cos(φ) and b = α sin(φ). 30

y t α t =0 π 2τ 0 t φ π 2τ τ Figure 2.3: Te motion of te simple pendulum registers a continuous function based on te displacement of its bob. Just seen is te decomposition of a basic equation of oscillation into two sinusoids of frequency τ 1.Considertaty t was expressed as a weigted sum of two basis functions; tis is because te sinusoids ere are ortogonal functions, i.e., 1 2 τ cos(2πt/τ) 1 2 τ sin(2πt/τ) dt =0. Te necessity of ortogonal functions in many problems is analogous to te necessity of ortogonal coordinate axes in expressing te position of a point in a Cartesian plane. Te decomposition of a time-domain process to its frequency components is known as Fourier (spectral) analysis and te definitions presented in tis and Section 2.3 on tis topic can be found in references suc as [99, 44, 93, 89]. Fourier analysis is based on one of te most important contributions to te sciences originally formalized by Josep Fourier in 1807 tat any well-beaved deterministic continuous periodic function y t could be expressed as a sum of ortogonal functions if and only ifteortogonal functions are sinusoids, were a well-beaved function satisfies te following condition: Definition 2.12. Afunctiony t t R is said to be absolutely summable if (2.7) y t dt <. Te unique decomposition of suc a deterministic periodic function y t into a possibly infinite number of sinusoids is called its Fourier series representation: y t = a 0 2 + n=1 (a n cos(2πnt/τ) +b n sin(2πnt/τ)), werea m = 2 1 2 τ τ y 1 2 τ t cos(2mπt/τ) dt, m =0, 1, 2,... and b l = 2 1 2 τ τ y 1 2 τ t sin(2lπt/τ) dt, l =1, 2, 3,... It is often con- 31

venient to use Euler identity e iθ = cos(θ) +isin(θ) to reac te following definition wic olds for complex-valued functions also. Definition 2.13. Te Fourier series representation of a deterministic continuous periodic function y t C t R satisfying (2.7) is (2.8) y t = were c m C m Z is m= c m e i2πmt/τ (2.9) c m = 1 τ 1 2 τ 1 2 τ y t e i2πmt/τ dt. Note tat if c m = c m m Z, tentefunctiony t R, elsey t C. Involve frequency spacing u = 2π/τ and write te Fourier series coefficients as c m = u 2π 1 2 τ y 1 2 τ t e imt u dt. Substituting tese coefficients back into te series 1 2 τ y te int u dte imt u.suppose 1 2 τ summation gives y t = m= u 2π y(n u) = 1 2 τ 1 2 τ y t e int u dt, ten y t = m= u 2π y(n u) eimt u.asτ or u 0, y t = 1 2π u= y(u)e itu du. Tis is regarded as te inverse relation of a very important transform in matematics tat is defined below. Te term y(u) in te above development is te result of limiting u 0 in y(n u) and acquires te following definition: Definition 2.14. Fourier transform of a function y t t R satisfying (2.7) is (2.10) y(u) = y t e itu dt. 2.3 Spectral analysis of discrete processes In te Fourier transform relation of (2.10), a continuous function defined for t R is dealt wit. Consider te continuous time function y t t R suc tat y t =0wenever t m τ m Z for some constant τ > 0. Tis is equivalent to sampling te continuous function y t t R at discrete instants separated by τ and zero at all oter instants. Since only discrete instants are relevant ere from te Fourier transform 32

perspective, y t would be called a discrete time function. Terefore, te discrete time Fourier transform using (2.10) becomes y(u) = m= y m τ e ium τ. Denote y m y m τ and refer to it as te m-t sample. Ten a sufficient condition for te existence of suc a relation is y(u) <, i.e., m= y me ium τ m= y m e ium τ <. Tisresultsinteabsolutesummabilitycondition (2.11) m= y m <. Using angular frequency ω as u τ =2πω in te above development results in te following definition of Fourier transform for discrete time functions. Definition 2.15. Te discrete time Fourier transform of a complex-valued discrete function y m m Z satisfying (2.11) is (2.12) y(ω) = m= y m e i2πωm. Altoug real-valued discrete functions were being discussed, te discrete time Fourier transform is valid for complex-valued functions also. Furtermore, since e i2πk =1 k Z, tefollowingpropertyolds: Property 2.3. Te discrete time Fourier transform as unit periodicity, i.e., (2.13) y(ω) =y(k + ω) k Z. Anoter easily verifiable property, wic olds true for any absolutely summable discrete or continuous function, is due to te following teorem; refer 22.1 of [41] or Capter 3 of [72]: Teorem 2.1. According to te Plancerel-Parseval teorem for te discrete time Fourier transform y(ω) ω [ 1 2, 1 2 ] of te function y m m Z, (2.14) y m 2 = m Z 1 2 1 2 y(ω) 2 dω. Just as te discrete time Fourier transform was defined being valid for complex-valued discrete functions, te Fourier series discussed earlier in (2.8) and(2.9) isapplicable to any complex valued periodic function defined over any continuous domain. Hence, replacing (y, t τ ) (y,ω) in (2.8) makes it equivalent to (2.12). In oter words, te discrete time Fourier transform of a sequence of equally spaced samples of a real function is also a Fourier series wose coefficients form te sequence. Terefore, allowing 33

te same replacement in (2.9) gives te differential 1 τ dt dω and te integral limits t = ± 1 2 τ ω = 1 2 resulting in te following inverse of te relation in (2.12): Definition 2.16. Te inverse discrete time Fourier transform of a complexvalued continuous function y(ω) ω R is defined as (2.15) y m = 1 2 1 2 e i2πmω y(ω)dω. Te Fourier series gave discrete frequency components of a continuous time process and te discrete time Fourier transform gave continuous frequency components of a discrete time process. On te oter and, te following transform gives discrete frequency components of a finite realization of a discrete time process: Definition 2.17. Te discrete Fourier transform of a series {y t }, t =1,...,τ is defined as (2.16) y(ω j )= 1 τ τ y t e i2πω jt t=1 at discrete frequencies ω j = j τ, j =0,...,τ 1 and te inverse discrete Fourier transform at te discrete instants is defined as (2.17) y t = 1 τ τ y(ω j )e i2πωjt. t=1 Te equivalent of Teorem 2.1 for te discrete Fourier transform is as follows [17]: Property 2.4. According to te Plancerel-Parseval teorem for te discrete Fourier transform y(ω j ) of te sequence y t j, t =1,...,τ, (2.18) τ τ y t 2 = y(ω j ) 2. t=1 j=1 In tis tesis, for a given finite lengt realization of a multivariate time series, certain asymptotic properties of te discrete Fourier transform will be used to define, derive, and optimize te dynamic transformation of te latent time series into commonalities. Tese asymptotic properties will be discussed in Section 2.5. Te Plancerel-Parseval teorem will enable measuring and containing te commonalities tat are retained during te transition between te time-domain and te frequency-domain. In Section 2.4, ow te frequency-domain analysis finds utility in a stationary process will be discussed. Specifically, in Teorem 2.2, it will be learned ow te Fourier transform relates two important statistical properties of a time series. 34

2.4 Spectral analysis of stationary processes Suppose te pendulum motion expressed in (2.6) is subject to random amplitude α and pase φ disturbances so tat (a, b) become uncorrelated zero-mean unit-variance random variables (a, b). Moreover,tediscretetimedomainisconsideredsotatte equation of motion in (2.6) takes te form y t = acos(2πωt) +bsin(2πωt) t Z; it will be called te perturbed pendulum. Its mean µ y = E a,b [y t ] = 0. Te acvf is γ y = Ea,b [y t+ y t ], wic due to non-correlated a and b takes te form γ y = Ea,b [a 2 cos(2πω(t + )) cos(2πωt)] + E a,b [b 2 sin(2πω(t + )) sin(2πωt)] Z. Since it was assumed tat E a [a 2 ]=E b [b 2 ]=1,watonegets 1 is γ y = cos(2πω); spectral analysis of suc an acvf could be found in [111, 19]. Since µ y and γ y are independent of t, itisfoundtat{y t } is weakly stationary. And, since weakly stationary {y t } does not satisfy (2.11), its Fourier transform simply does not exist. Using Euler s identity, te acvf of te perturbed pendulum is written as a summation γ y = k i=1 α ig(ω i ),wereg(ω) =e i2πω, k =2, ω 1 = ω, ω 2 = ω, and α 1 = α 2 = 1 2. But suc a summation wit a general g(ω) as an integral representation k i=1 α ig(ω i )= g(ω) ds y (ω), weres y (ω) k i=1 α i 1(ω i ω) is a monotonically increasing function bounded between S y ( ) =0and S y ( ) =1, and 1(ω i ω) is te step function wic jumps from zero to unity at ω = ω i. However, due to periodicity of g(ω) =e i2πω in te above example of a perturbed pendulum, te acvf is essentially represented in te integral form γ y = 1 2 1 2 e i2πω ds y (ω), weres y (ω) is a monotonically increasing function bounded between [ 1 2, 1 2 ] wile S y ( 1 2 )=0and Sy ( 1 2 )=γy 0.Tereaderisreferredto[99,20]fortedetails of tis representation and oter properties tat S y (ω) aderes to. Te notion carried forward is tat wenever te derivate s y (ω) = d dω Sy (ω) exists, it is possible to write te acvf as (2.19) γ y = 1 2 1 2 e i2πω s y (ω) dω. But if tere are discontinuities in S y (ω), e.g.,teperturbedpendulum,itwillnotbe possible to write te acvf according to (2.19) because s y (ω) does not exist. Now refer back to (2.15) to see its analogy wit (2.19) wic requires tat a condition (2.20) = γ y <, equivalent to (2.11) be satisfied by γ y.tisenablestefollowingteoremanddefinition; refer 4.3 of [20]: 1 Using te trigonometric identity cos(θ 1 θ 2)= cosθ 1 cosθ 2 + sinθ 1 sinθ 2 35

Teorem 2.2. For acvf γ y of a weakly stationary time series {y t} t Z satisfying (2.20), according to Herglotz s teorem, te spectral density function s y (ω) at te frequency ω R exists and is defined as te discrete time Fourier transform of its acvf, i.e., (2.21) s y (ω) = γ y e i2πω. In tis context, it may be noted tat te perturbed pendulum does not satisfy te absolute sum condition because = γy = = cos(2πf) = and it does not ave a spectral density function. In ligt of (2.21) and similar to Property 2.3, te following property of te spectral density function is arrived at: Property 2.5. Te spectral density function as unit periodicity, i.e.,s y (ω) = s y (k + ω) k Z. For an r-variate time series {y t } t Z, werey t =[y 1t y rt ],letγ y i,y j be te (i, j)-t element of its autocovariance matrix Γ y. Referring to 1.1.2 of [102], te condition equivalent to (2.20) for vector random variables becomes (2.22) = γ y i,y j <, wic is valid i, j 1,..., r in defining te matrix of spectral density function S y (ω) C r r wose (i, j)-t element is s y i,y j. Ten, due to te development of Property 2.1 and te relation (2.21), te following is easily got: Property 2.6. Te spectral density function S y (ω) is Hermitian symmetric about ω =0,i.e., (2.23) (S y (ω)) = S y ( ω) =S y (ω). Referring to Teorem 2.7.1 of [18], Teorem 4.4.1 of [39], and [92, 34], anoter important property of te r-variate time series follows: Property 2.7. If {y t } t Z is a linear process, ten S y (ω) 0 ω [0, 1]. Te above discussion is very relevant to te intention in tis tesistoassesstecommonalities of a multivariate time series via its spectral density function. For te purpose of learning multivariate time series based on te commonalities, te ope is to take te following approaces: Firstly, two multivariate time series are compared by evaluating ow similar te components of teir spectral density functions are. Secondly, te future evolution of a multivariate time series is predicted by estimating te acvf,via its spectral density function, tat inerits maximum commonalities. 36

2.5 Asymptotic properties of linear processes In practical problems, it is infeasible to ave a dataset consisting of an infinite collection of samples to compute true statistical properties suc as mean, acvf, variance, etc. Caracteristics of a time series ave to be estimated from a given finite lengt subset of its realization. Wit a limited number of samples, sample estimates may be questioned for teir reliability. Te field of study of asymptotic statisticsstrivestodesign properties, procedures, tests, and estimators in te limit tat te sample size becomes large [122, 58]. A broad review of te asymptotic tecniques will not be resorted to; owever, presented below are te essentials for te tesis s purposes. Consider te scenario in wic, due to computational or limited access to data, time series caracteristics ave to be gleaned from one realization forming a finite lengt data stream. For a weakly stationary time series, tese caracteristics include its mean and acvf for wic, first, te following asymptotic properties referring to 11.2 of [20] are presented: Teorem 2.3. For a finite τ-lengt realization {y t } t = 1,...,τ of a weakly stationary time series {y t } t Z wose acvf Γ y satisfies (2.22), te sample mean (2.24) ŷ = 1 τ τ t=1 y t converges in a mean square sense to te population mean µ y. Teorem 2.4. For a finite τ-lengt realization {y t } t = 1,...,τ of a weakly stationary time series {y t } t Z wit sample mean ŷ, ter r sample acvf τ { 1 (2.25) Γy = τ t=1 (y t+ ŷ)(y t+ ŷ) 0 τ 1, 1 τ τ t= +1 (y t+ ŷ)(y t+ ŷ) τ +1 <0 converges in probability to te population acvf Γ y. Wit te sample acvf Γ y for finite lags, te best ope is for estimates of te spectral density function S y (ω k ) at finite discrete frequencies ω k = k τ k =0,...,τ 1 via inverse discrete Fourier transform. For an oterwise continuous spectral density function S y (ω), 0 ω<1, toseestimatesatdiscretefrequenciesisanapproximation of S y (ω k ) dependent on ow good te sample estimation Γ y is. Terefore, in wat follows, described is te asymptotic property of S y (ω) near any target frequency ω j = j/ĵ j =0,...,ĵ 1, or0 ω j < 1 and ĵ τ. It starts by splitting a period of ω [0, 1) of te spectral density function into ĵ nonoverlapping frequency bands. Suppose tere is a total of τ = nĵ discrete frequencies tat are considered for te splitting so tat eac band will ave n discrete frequencies. By te 0-t frequency band represented by te target frequency ω 0 =0,impliedare n discrete frequencies ω 0,l > 0, l =1,...,n closest to 0. Bytej-t frequency band ω j,l l =1,...,n; j =1,...,ĵ 1, impliedaren frequencies closest to te target frequency ω j = j/ĵ and between ω j b and ω j + b, were2b = n/ĵ is called te 37

ω j,1 ω j,2 ω j,l ω j,n 1 ω j,n ω ω band 0 band 1 band j band (ĵ 2) band (ĵ 1) ω 0 ω 1 ω j ωĵ 2 ωĵ 1 ω ω [0, 1) Figure 2.4: Te sceme of splitting te frequency range ω = [0, 1) into ĵ non-overlapping subbands containing n discrete frequency components eac. bandwidt. Suppose 2b <ω 1 b<ωĵ 1 + b<1 and n ĵ is coosen so tat te bandwidt is very low. For proceeding furter, te following definitions are needed; refer to [45]: Definition 2.18. An r-dimensional complex-valued vector random variable ξ =[ξ 1 ξ r ] = R(ξ) +ii(ξ) C r is defined as te 2r-variate vector random variable η =[R(ξ 1 ) I(ξ 1 ) R(ξ r ) I(ξ r )] R 2r formed by its real and imaginary components. As establised in [45], te covariance matrix Γ ξ of an r-variate complex valued vector random variable ξ is isomorpic, i.e., equivalent upto a row and a column permutation, to te covariance matrix Γ η of its corresponding 2r component vector random variable η via Γ ξ 2Γ η ;(Γ ξ ) 1 1 2 (Γη ) 1 ; wereas te means are isomorpic via µ ξ µ η.also,itwassownteretatdet(γ ξ )= 2 r (det(γ η )) 1 2 and ξ Γ ξ ξ = η Γ η η. Ten, following te convention of a Gaussian distribution of an r-variate random variable u wit mean a and covariance matrix B denoted by (2.26) N (u a, B) = exp ( 1 2 (u a) B 1 (u a) ), (2π) r (det(b)) 1 2 te following definition could be arrived at: 38

Definition 2.19. Te r-variate complex Gaussian probability density ofa complex valued random variable u wit mean a C r and covariance matrix B C r r is defined as N C (u a, B) = exp ( (u a) B 1 (u a) ) (2.27) π r. det(b) Now an essential teorem for tis tesis is presented; refer Teorem 4.4.1 of [18], C.2 of [111], and [53]. Teorem 2.5. Te discrete Fourier transform components of a realization of an r- variate linear process at frequencies ω j,l suc tat lim ω j ω j,l 0 l 1,...,n; ĵ j =1,...,ĵ n are iid samples of an r-dimensional complex-valued vector random variable y j at frequency ω j [0, 1] wit a probability density (2.28) { p y(ωj) NC (u 0, S (u) = y (ω j )), ω j (0, 1) N (u 0, 2 S y (ω j )), ω j {0, 1}. Teorem 2.5 furnises a probabilistic model for discrete Fourier transform samples obtained from a finite realization of a time series of a linear process. Te teorem simply recommends tat discrete Fourier transform components witin a small bandwidt near a target frequency ω j is Gaussian wit te covariance matrix equal to te spectral density function S y (ω j );atzerofrequencytecovariancematrixistwices y (0). In order to use tis teorem, te following procedure is adered to: Given τ samples of a time series realization, first compute te τ-lengt discrete Fourier transform y(ω k ), k =0,...,τ 1. Ten,n discrete Fourier transform components y(ω j,l ), l =1,...,n; j =0, 1,...,ĵ 1, containedintej-t subband may be assigned as (2.29) ω j,l : ω j ω j,l nτ 1 ; n ĵ. For te j-t frequency band, te sample covariance matrix is computed as (2.30) Ŝ y (ω j )= 1 n n (y(ω j,l ) ŷ(ω j ))(y(ω j,l ) ŷ(ω j )), l=1 and ŷ(ω j )= 1 n n y(ω j,l ) is te sample mean of te discrete Fourier transform y(ω k ) and ω j b<ω k <ω j + b. To ensure robustness of te estimate Ŝy (ω j ),typically,onewouldalsowanttomaintain l=1 (2.31) n r 2, refer [106, 12]. It could be sown, as done in 4.2 of [18] or 12.4 of [25] tat for a linear process (2.32) lim n Ey [Ŝy (ω n)] = S y (ω) ω [0, 1). 39

Hence, wile maintaining ĵ n, asufficientlylargen sould provide an unbiased estimate of S y (ω j ) troug (2.30). Tis process is given in Algoritm 1, were care sould be taken to ensure tat tere are sufficient subbands as required by Teorem 2.5. Algoritm 1: PreparediscreteFouriertransformsubbands Input: D = {y t },t =1,...,τ; y t R r ; n;ĵ; Output: {Ŝy (ω j )}; {y(ω j,l )}; j =1,...,ĵ; l =1,...,n; F compute y t y(ω k ); ω k = k τ, k =0,...,τ 1 using (2.16); assign y(ω j,l ); j =1,...,ĵ; l =1,...,n using (2.29); estimate Ŝy (ω j ) using (2.30); 2.6 Summary Tis capter introduced certain frequently sougt after notions pertaining to time and frequency domain analyses of time series. Tese notions include acvf,spectral density function, discrete Fourier transform, wite noise, etc. Terelationbetweente spectral density function and te acvf was recapped on. Also introduced were some of te notations adered to for te remaining capters. As discussed, te spectral density function of a stationary time series is te Fourier transform of its autocovariance function. Te discrete Fourier transform components of a linear process witin a small bandwidt around a target frequency is approximately complex-gaussian wit mean zero and covariance matrix equaling te spectral density function at te target frequency. Our goal for tis tesis is to model and learn a measured multivariate time series by dynamically transforming a low-dimensional latent time series. Te ope is to use classical probabilistic modeling concepts introduced in te next capter to acieve tis goal. Most of tose concepts will be based on fitting popular probability density function models on time and lag independent data; but it is time series data tat is dealt wit. In order to elicit a similar and manageable probability density function tat applies to a wide class of time series, te asymptotic teory of discrete Fourier transform components was approaced. Tis is because tose components witin a small bandwidt may be considered as realizations of a complex-valued Gaussian vector random variable. Tis enables te possibility of applying standard probabilistic modeling tecniques, as reviewed in te next capter, to multivariate timeseriesmodeling. 40