TIME SERIES ANALYSIS: AN OVERVIEW




TIME SERIES ANALYSIS: AN OVERVIEW
Hernando Ombao, University of California at Irvine, November 26, 2012

OUTLINE OF TALK
1. TIME SERIES DATA
2. OVERVIEW OF TIME DOMAIN ANALYSIS
3. OVERVIEW OF SPECTRAL ANALYSIS


TIME SERIES DATA
Visual-motor electroencephalogram (hand EEG)
Global temperature series
Seismic recordings
LA County environmental data (mortality, pollution, temperature)

NEUROSCIENCE DATA AND STATISTICAL GOALS
Electrophysiologic data: multi-channel EEG, local field potentials
Hemodynamic data: fMRI time series at several ROIs

NEUROSCIENCE DATA AND STATISTICAL GOALS Multi-channel (multivariate) Two movement conditions: leftward vs. rightward

NEUROSCIENCE DATA AND STATISTICAL GOALS
External stimulus: visual, auditory, somatosensory, stress
Personality traits, genes, socio-environmental factors
Unobserved: brain network/cell assemblies
Brain signals (indirect measures of neuronal activity): functional (fMRI, EEG, MEG, PET); anatomical (DTI)
Acute outcomes: emotion, skin conductance, motor response

NEUROSCIENCE DATA AND STATISTICAL GOALS
Stimulus → Neuronal Response → Brain Signals → Behavior

NEUROSCIENCE DATA AND STATISTICAL GOALS
Stimulus → Neuronal Response → Brain Signals → Behavior
Moderators/Modifiers: genes, traits, socio-environment

NEUROSCIENCE DATA AND STATISTICAL GOALS
Changes in the mean
Changes in variance
Changes in cross-dependence

NEUROSCIENCE DATA AND STATISTICAL GOALS
Characterize dependence in a brain network
Temporal: dependence of Y_1(t) on [Y_1(t − 1), Y_2(t − 1), ...]
Spectral: interactions between oscillatory activities at Y_1, Y_2
Develop estimation and inference methods for connectivity
Investigate the potential for connectivity as a biomarker
Predicting behavior: motor intent (left vs. right movement) [Brain-Computer Interface]; state of learning; level of mental fatigue
Differentiating patient groups (bipolar vs. healthy children): connectivity between the left DLPFC and right STG is greater for bipolar than for healthy

NEUROSCIENCE DATA AND STATISTICAL GOALS
New dependence measures must be easily interpretable
Models must incorporate information across trials and across subjects
Models must account for differences in brain network between conditions
Take advantage of multi-modal data (EEG, fMRI, DTI)
Models should be informed by physiology and physics
Dimension reduction: extract the information from massive data that is most relevant for estimating dependence
Develop formal statistical inference procedures

NEUROSCIENCE DATA AND STATISTICAL GOALS
Selected references
Automatic methods: SLEX transform (Smooth Localized Complex EXponentials): Ombao et al. (2001, JASA); Ombao et al. (2001, Biometrika); Ombao et al. (2002, Ann Inst Stat Math); Huang, Ombao and Stoffer (2004, JASA); Ombao et al. (2005, JASA); Böhm, Ombao et al. (2010, JSPI)
Massive data, complex dependence, mixed effects: Bunea, Ombao and Auguste (2006, IEEE Trans Sig Proc); Ombao and Van Bellegem (2008, IEEE Trans Sig Proc); Freyermuth, Ombao, von Sachs (2009, JASA); Fiecas and Ombao (2011, Annals of Applied Statistics); Gorrostieta, Ombao et al. (2012, NeuroImage); Kang, Ombao et al. (2012, JASA)

GLOBAL TEMPERATURE SERIES
[Figure: global temperature deviation, roughly −0.4 to 0.6, plotted against time, 1880–2000.]

GLOBAL TEMPERATURE SERIES
Model: Y(t) = μ(t) + ε(t)
Deterministic part: μ(t) = β_0 + β_1 t
Stochastic part: ε(t) is colored noise with E ε(t) = 0 and Cov(ε(r), ε(s)) = γ(r, s)
Inference on the trend β_1: Is this simply a "local" trend? What is the impact of man's activities on temperature fluctuations? What is the impact of temperature increases on natural calamities?
Statistical challenge: develop a model that captures (a) the complexities in the spatio-temporal covariance structure and (b) causal relationships
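The trend-plus-colored-noise decomposition above can be sketched numerically. The slope, intercept, and AR(1) noise parameters below are hypothetical illustration values, not estimates from the temperature data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate Y(t) = beta0 + beta1*t + eps(t), with eps(t) an AR(1) "colored" noise.
T = 500
t = np.arange(T)
beta0, beta1 = 0.1, 0.002            # hypothetical trend parameters
eps = np.zeros(T)
for i in range(1, T):
    eps[i] = 0.6 * eps[i - 1] + rng.normal(scale=0.05)
y = beta0 + beta1 * t + eps

# Least-squares fit of the linear trend. With colored noise the point
# estimate is still consistent, but naive standard errors are too small.
X = np.column_stack([np.ones(T), t])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)   # approximately [0.1, 0.002]
```

The fitted slope recovers β_1 even though the residuals are autocorrelated; valid inference on β_1 would additionally need the covariance structure γ(r, s).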

SEISMIC RECORDINGS

SEISMIC RECORDINGS
Two classes: Π_1 (earthquake) and Π_2 (explosion)
Discrimination: what "features" separate Π_1 and Π_2?
Classification: given a new time series x, classify it into Π_1 or Π_2 by comparing D(x, f_1) vs. D(x, f_2)
Background: ban on nuclear testing (classify the Novaya Zemlya event of unknown origin)
Seminal work: Shumway (1982, 1998, 2003)
Statistical challenges: feature extraction and feature selection from massive data

LA COUNTY MORTALITY AND ENVIRONMENTAL DATA
[Figure: weekly series for 1970–1980: cardiovascular mortality (about 70–130), temperature (about 50–90), and particulates (about 20–100).]

LA COUNTY MORTALITY AND ENVIRONMENTAL DATA
Shumway, Azari and Pawitan (1988); Shumway and Stoffer (2010)
LA County weekly data on mortality, temperature and pollution levels
Model: mortality ~ temperature + pollution
Granger causality: does past knowledge of temperature and pollution help improve prediction for mortality?
Causation vs. association
Practical issues: hospitalization (rather than mortality); the effect of pollution might be long term (rather than short term)

COVARIANCE AND CORRELATION
Bivariate time series Y(t) = [Y_1(t), Y_2(t)]
Auto-covariance: γ_ll(s, t) = Cov[Y_l(s), Y_l(t)], l = 1, 2
Variance: γ_ll(t, t) = Cov[Y_l(t), Y_l(t)], l = 1, 2
Auto-correlation: ρ_ll(s, t) = γ_ll(s, t) / √(γ_ll(s, s) γ_ll(t, t))

COVARIANCE AND CORRELATION
Cross-covariance: γ_pq(s, t) = Cov[Y_p(s), Y_q(t)]
Cross-correlation: ρ_pq(s, t) = γ_pq(s, t) / √(γ_pp(s, s) γ_qq(t, t))

COVARIANCE AND CORRELATION
Trivariate time series: X, Y, Z
Cross-correlation: ρ(X, Y) = Cov(X, Y) / √(Var X Var Y)
Partial cross-correlation between X and Y given Z:
Remove Z from X: ε_X = X − β_X Z
Remove Z from Y: ε_Y = Y − β_Y Z
ρ(X, Y | Z) = Cov(ε_X, ε_Y) / √(Var ε_X Var ε_Y)
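The partial cross-correlation construction (regress Z out of each series, then correlate the residuals) can be illustrated with a small simulation; the coefficients and sample size below are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: X and Y are both driven by Z, with no direct link.
n = 20000
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(scale=0.5, size=n)
y = 0.8 * z + rng.normal(scale=0.5, size=n)

def partial_corr(x, y, z):
    """Correlation of X and Y after regressing Z out of each series."""
    bx = np.cov(x, z)[0, 1] / np.var(z)
    by = np.cov(y, z)[0, 1] / np.var(z)
    ex, ey = x - bx * z, y - by * z
    return np.corrcoef(ex, ey)[0, 1]

print(np.corrcoef(x, y)[0, 1])   # large: X and Y are marginally correlated
print(partial_corr(x, y, z))     # near zero: no direct association given Z
```

This is exactly the Model A vs. Model B distinction on the next slide: a large cross-correlation with a near-zero partial cross-correlation suggests the association runs through Z.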

COVARIANCE AND CORRELATION

             Model A   Model B
Cross-Corr   Yes       Yes
Partial CC   No        Yes

WEAK STATIONARITY
Bivariate time series Y(t) = [Y_1(t), Y_2(t)]
Under weak stationarity, the following quantities do not change with time t:
E Y(t) = [μ_1, μ_2]
Cov[Y_p(s), Y_q(t)] = λ_pq(|s − t|)
Var Y_p(t) = λ_pp(0)

TIME DOMAIN MODELS
Univariate time series Y(t): Moving Average (MA); Auto-Regressive (AR); Auto-Regressive Moving Average (ARMA)
Multivariate time series Y(t): Vector Moving Average (VMA); Vector Auto-Regressive (VAR); Vector ARMA (VARMA); VARMA with exogenous series (VARMAX)

WHITE NOISE
{W(t)} is a white-noise time series if:
E W(t) = 0 for all t
{W(t)} is uncorrelated: Cov[W(t), W(s)] = σ_W² if s = t, and 0 if s ≠ t

MOVING AVERAGE (MA) TIME SERIES
Let {W(t)} be a white noise time series.
Y(t) is a first-order MA [MA(1)] if it can be expressed as Y(t) = W(t) + θ_1 W(t − 1).
MA(Q) time series: Y(t) = Σ_{q=0}^{Q} θ_q W(t − q), where θ_0 = 1.
This is a linear system whose input is white noise; the output is the moving average time series.
MA gives a one-sided linear combination of present and past white noise.
Consider the case θ_0 = ... = θ_Q = 1: then Y(t) is a "summed" or "smoothed" version of the white noise.
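A quick numerical check of the MA(1) definition: its lag-1 autocorrelation is θ_1 / (1 + θ_1²) and all higher lags vanish. The θ and sample size below are arbitrary illustration values:

```python
import numpy as np

rng = np.random.default_rng(2)

# MA(1): Y(t) = W(t) + theta*W(t-1). Its lag-1 autocorrelation is
# theta / (1 + theta**2), and autocorrelations at lags >= 2 are zero.
theta, n = 0.7, 200000
w = rng.normal(size=n + 1)
y = w[1:] + theta * w[:-1]

def acf(series, lag):
    """Sample autocorrelation at a given positive lag."""
    s = series - series.mean()
    return np.dot(s[:-lag], s[lag:]) / np.dot(s, s)

print(acf(y, 1))   # close to 0.7 / 1.49, about 0.47
print(acf(y, 2))   # close to 0
```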

AUTO-REGRESSIVE TIME SERIES (AR)
Backshift operator B: BY(t) = Y(t − 1)
Let k be a positive integer. Then B^k Y(t) = Y(t − k).
Let Φ(B) = 1 − φ_1 B − ... − φ_P B^P. Then Φ(B)Y(t) = Y(t) − φ_1 Y(t − 1) − ... − φ_P Y(t − P).
Let Φ(B) = 1 − φ_1 B. Then Φ(B)Y(t) = Y(t) − φ_1 Y(t − 1), and Φ^{−1}(B) = Σ_{l=0}^{∞} φ_1^l B^l when |φ_1| < 1.

AUTO-REGRESSIVE TIME SERIES (AR)
Let {W(t)} be a white noise series and let φ_1 ∈ (−1, 1).
{Y(t)} is a stationary first-order auto-regressive [AR(1)] series if Y(t) = φ_1 Y(t − 1) + W(t).
One can also express Y(t) as an infinite-order MA process [MA(∞)]:
W(t) = Y(t) − φ_1 Y(t − 1)
W(t) = [1 − φ_1 B] Y(t)
Y(t) = [Σ_{l=0}^{∞} φ_1^l B^l] W(t)
Y(t) = Σ_{l=0}^{∞} φ_1^l W(t − l)
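The AR(1)-to-MA(∞) inversion can be verified numerically by driving both representations with the same white noise; φ, the sample length, and the truncation point below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(5)

# Verify that an AR(1) path matches its truncated MA(infinity)
# representation Y(t) = sum_{l} phi^l W(t-l).
phi, T, L = 0.6, 200, 60          # L truncation terms: phi^60 is negligible
w = rng.normal(size=T + L)

# AR(1) recursion started at zero; after L steps the start-up is forgotten.
y = np.zeros(T + L)
for t in range(1, T + L):
    y[t] = phi * y[t - 1] + w[t]

# Truncated MA(infinity) built from the same innovations.
y_ma = np.array([sum(phi ** l * w[t - l] for l in range(L))
                 for t in range(L, T + L)])

print(np.max(np.abs(y[L:] - y_ma)))   # tiny: truncation error only
```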

AUTO-REGRESSIVE TIME SERIES (AR)
{Y(t)} is an AR(P) time series if it can be expressed as Y(t) = Σ_{p=1}^{P} φ_p Y(t − p) + W(t).
Φ(B) = 1 − φ_1 B − ... − φ_P B^P is called the AR polynomial.
Y(t) is said to be causal if the roots of Φ(z) lie outside the unit circle.
Example: AR(1), Φ(B) = 1 − φ_1 B. The solution to Φ(z) = 0 is z = 1/φ_1, and |z| > 1 when |φ_1| < 1.
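The causality condition (roots of Φ(z) outside the unit circle) is easy to check numerically; the helper function and the coefficient values below are hypothetical, not from the talk:

```python
import numpy as np

# Check causality of an AR(P) model by finding the roots of
# Phi(z) = 1 - phi_1 z - ... - phi_P z^P.
def is_causal(phis):
    # numpy.roots expects coefficients from the highest power down:
    # [-phi_P, ..., -phi_1, 1] represents -phi_P z^P - ... - phi_1 z + 1.
    coeffs = [-p for p in reversed(phis)] + [1.0]
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

print(is_causal([0.5]))        # AR(1) with phi=0.5: root z=2, causal -> True
print(is_causal([1.5]))        # AR(1) with phi=1.5: root z=2/3 -> False
print(is_causal([0.5, 0.3]))   # AR(2) example
```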

AUTO-REGRESSIVE TIME SERIES (AR)
On estimation and inference for Y(t) = φ_1 Y(t − 1) + W(t), φ_1 ∈ (−1, 1):
Impose W(t) ~ N(0, σ_W²). Then:
Y(t) ~ N(0, σ_W² / (1 − φ_1²))
Y(t) | Y(t − 1), ..., Y(1) ~ N(φ_1 Y(t − 1), σ_W²)
Data: {Y(1), ..., Y(T)}

AUTO-REGRESSIVE TIME SERIES (AR)
Conditional likelihood:
L_C(φ_1, σ_W²) = f(y(2) | y(1)) ⋯ f(y(T) | y(1), ..., y(T − 1)) = (1 / (2π σ_W²))^{(T−1)/2} exp( −(1 / (2σ_W²)) Σ_{t=2}^{T} (y(t) − φ_1 y(t − 1))² )

AUTO-REGRESSIVE TIME SERIES (AR)
Full likelihood:
L(φ_1, σ_W²) = f(y(1)) L_C(φ_1, σ_W²) = ( √(1 − φ_1²) / (2π σ_W²)^{T/2} ) exp( −(1 / (2σ_W²)) [ (1 − φ_1²) y(1)² + Σ_{t=2}^{T} (y(t) − φ_1 y(t − 1))² ] )
The conditional likelihood is asymptotically equivalent to the full likelihood.
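Maximizing the conditional likelihood over φ_1 reduces to least squares of y(t) on y(t − 1), with a closed-form solution. A simulation sketch with made-up parameter values:

```python
import numpy as np

rng = np.random.default_rng(3)

# The conditional MLE of an AR(1) is the least-squares regression of
# y(t) on y(t-1):  phi_hat = sum y(t)y(t-1) / sum y(t-1)^2.
phi, sigma, T = 0.8, 1.0, 50000
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi * y[t - 1] + rng.normal(scale=sigma)

phi_hat = np.dot(y[1:], y[:-1]) / np.dot(y[:-1], y[:-1])
resid = y[1:] - phi_hat * y[:-1]
sigma2_hat = np.mean(resid ** 2)   # conditional MLE of sigma_W^2
print(phi_hat, sigma2_hat)         # near 0.8 and 1.0
```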

THE SPECTRUM
X(t): stationary temporal process
Cramér representation: X(t) = ∫ exp(i2πωt) dZ(ω), t = 0, ±1, ±2, ...
Basis: Fourier waveforms exp(i2πωt), ω ∈ (−0.5, 0.5)
Random coefficients: dZ(ω) is an increment random process with E dZ(ω) = 0 and Cov[dZ(ω), dZ(λ)] = δ(ω − λ) f(ω) dω dλ, so Var dZ(ω) = f(ω) dω
Spectrum f(ω): decomposition of the variance of X(t): Var X(t) = ∫ f(ω) dω

THE SPECTRUM
[Figure: stochastic regression model; a wave with 2 oscillations and a wave with 10 oscillations, each shown with the distribution of its random coefficient.]

THE SPECTRUM
Spectrum: decomposition of variance
X = [X(1), ..., X(T)]: zero-mean stationary time series
Φ: columns are the orthonormal Fourier waveforms
d = [d(ω_0), ..., d(ω_{T−1})]: Fourier coefficients
X = Φd
X'X = d'd (Parseval's identity)
(1/T) E X'X = (1/T) E d'd
Var X(t) = ∫ f(ω) dω

THE SPECTRUM
A more formal derivation:
X(t) = ∫ exp(i2πωt) dZ(ω)
γ(h) = Cov[X(t + h), X(t)]
f(ω) = Σ_{h=−∞}^{∞} γ(h) exp(−i2πωh)
γ(h) = ∫_{−0.5}^{0.5} f(ω) exp(i2πωh) dω
γ(0) = ∫ f(ω) dω
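As a sanity check of γ(0) = ∫ f(ω) dω, one can integrate the known AR(1) spectrum f(ω) = σ² / |1 − φ e^{−i2πω}|² numerically; φ = 0.9 is chosen to match the slides that follow, and the grid size is arbitrary:

```python
import numpy as np

# For a causal AR(1) X(t) = phi*X(t-1) + W(t) with Var W = sigma2,
#   f(omega) = sigma2 / |1 - phi*exp(-i*2*pi*omega)|^2,  omega in (-0.5, 0.5),
# and integrating f over (-0.5, 0.5) recovers gamma(0) = sigma2 / (1 - phi^2).
phi, sigma2 = 0.9, 1.0
omega = np.linspace(-0.5, 0.5, 200001)
f = sigma2 / np.abs(1 - phi * np.exp(-2j * np.pi * omega)) ** 2

# Simple Riemann approximation of the integral over the unit-width band.
gamma0_numeric = f.mean() * (omega[-1] - omega[0])
gamma0_exact = sigma2 / (1 - phi ** 2)
print(gamma0_numeric, gamma0_exact)   # both approximately 5.263
```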

SPECTRUM = VARIANCE DECOMPOSITION
AR(1): X(t) = 0.9 X(t − 1) + ε(t): low frequency oscillations
[Figure: simulated sample path over 1000 time points.]

SPECTRUM = VARIANCE DECOMPOSITION
[Figure: spectrum of AR(1) with φ = 0.9; power concentrated near frequency 0.]

SPECTRUM = VARIANCE DECOMPOSITION
AR(1): X(t) = −0.9 X(t − 1) + ε(t): high frequency oscillations
[Figure: simulated sample path over 1000 time points.]

SPECTRUM = VARIANCE DECOMPOSITION
[Figure: spectrum of AR(1) with φ = −0.9; power concentrated near frequency 0.5.]

SPECTRUM = VARIANCE DECOMPOSITION
Mixture: low + high frequency signal
[Figure: simulated mixed signal over 1000 time points.]

SPECTRUM = VARIANCE DECOMPOSITION
[Figure: spectrum of the mixed signal, with power at both low and high frequencies.]

CROSS-COHERENCE - A MEASURE OF DEPENDENCE
An illustration: interactions between oscillatory components
Latent signals: U_1(t), a low frequency signal; U_2(t), a high frequency signal
Observed signals: X(t) = U_1(t) + U_2(t) + Z_1(t); Y(t) = U_1(t + l) + Z_2(t)
X and Y are linearly related through U_1.

CROSS-COHERENCE - A MEASURE OF DEPENDENCE
[Figure: two channels X_1(t) and X_2(t) decomposed into the delta, theta, alpha, beta, and gamma frequency bands.]

CROSS-COHERENCE - A MEASURE OF DEPENDENCE
Time series at 3 channels: X, Y, Z
Cross-correlation: ρ(X, Y) = Cov(X, Y) / √(Var X Var Y)
Partial cross-correlation between X and Y given Z:
Remove Z from X: ε_X = X − β_X Z
Remove Z from Y: ε_Y = Y − β_Y Z
ρ(X, Y | Z) = Cov(ε_X, ε_Y) / √(Var ε_X Var ε_Y)

CROSS-COHERENCE - A MEASURE OF DEPENDENCE

             Model A   Model B
Cross-Corr   Yes       Yes
Partial CC   No        Yes

CROSS-COHERENCE - A MEASURE OF DEPENDENCE
When ρ(X, Y | Z) ≠ 0, we want to identify the frequency bands that drive the direct linear association. When ρ(X, Y) ≠ 0, we want to identify the frequency bands that drive the linear association.
Notation: U(t) = [X(t), Y(t), Z(t)]; dZ(ω) = [dZ_X(ω), dZ_Y(ω), dZ_Z(ω)]
Spectral representation of a stationary process: U(t) = ∫_{−0.5}^{0.5} exp(i2πωt) dZ(ω)
Spectral matrix: Cov dZ(ω) = f(ω) dω
Formal definition of coherence: ρ_{X,Y}(ω) = |Corr(dZ_X(ω), dZ_Y(ω))|²
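In practice coherence is estimated from data by Welch-type spectral averaging, e.g. with scipy.signal.coherence. The shared 5 Hz component, noise levels, and sampling rate below are a made-up illustration of two signals that are coherent only in one band:

```python
import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(4)

# Two signals sharing a common 5 Hz component plus independent noise;
# coherence should be high near 5 Hz and low elsewhere.
fs, n = 100.0, 20000
t = np.arange(n) / fs
shared = np.sin(2 * np.pi * 5.0 * t)
x = shared + rng.normal(scale=1.0, size=n)
y = shared + rng.normal(scale=1.0, size=n)

freqs, coh = coherence(x, y, fs=fs, nperseg=1024)
print(coh[np.argmin(np.abs(freqs - 5.0))])    # high, close to 1
print(coh[np.argmin(np.abs(freqs - 30.0))])   # low
```

This mirrors the latent-signal illustration above: X and Y are linearly related only through the shared oscillation, and the coherence localizes that relation in frequency.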

CROSS-COHERENCE: AN INTUITIVE INTERPRETATION
Ombao and Van Bellegem (2008, IEEE Trans Signal Processing)
Filtered signals: X_ω(t) = F_ω X(t), Y_ω(t) = F_ω Y(t), Z_ω(t) = F_ω Z(t)
Coherence at the frequency band around ω: ρ_{X,Y}(ω) ≈ |Corr(X_ω(t), Y_ω(t))|²
Partial coherence:
Remove Z_ω(t) from X_ω(t): ξ_ω^X(t) = X_ω(t) − β_X Z_ω(t)
Remove Z_ω(t) from Y_ω(t): ξ_ω^Y(t) = Y_ω(t) − β_Y Z_ω(t)
ρ_{X,Y|Z}(ω) = |Cov(ξ_ω^X(t), ξ_ω^Y(t))|² / (Var ξ_ω^X(t) Var ξ_ω^Y(t))
Relevant work: Pupin (1898)
An estimator for f_X(ω) is Var X_ω(t).