1 Example of Time Series Analysis by SSA

(This is the example section of [2]. See also http://www.gistatgroup.com/cat/.)

Let us illustrate the 'Caterpillar'-SSA technique [1] by an example of time series analysis. Consider the time series FORT (monthly volumes of fortified wine sales in Australia from January 1980 till June 1994). We examine the whole time series (of length N = 174) as well as its subseries formed from the first 120 points; we use the designation FORT174 for the former and FORT120 for the latter. The presented figures illustrate the theoretical concepts and should help the reader to better understand the approach of the method.

Visual analysis of the time series depicted in Fig. 1 indicates that this series has a trend, and that the trend can be approximated either by a linear function or by an exponentially decreasing function. The seasonal component also seems to have complicated and changing behavior. The periodogram of the centered time series (Fig. 2) confirms this assumption. The periodogram is constructed on the first 168 points only, to refine the representation: 168 is divisible by 12, the number of months per year. The aim of the 'Caterpillar'-SSA analysis is the decomposition of this time series into three components: a trend, a seasonal component, and noise.

The choice of window length. First, let us guess the possible dimension of the signal (composed of a trend and a periodic component). The trend of the FORT time series is likely described by 1-2 eigentriples; the seasonal component consists of harmonics with frequencies 1/12 (annual periodicity), 1/6 = 2/12 (half-year), 1/4 = 3/12, 1/3 = 4/12, 1/2.4 = 5/12, and 1/2 = 6/12. The dimension of each harmonic with frequency less than 1/2 equals 2, whereas the harmonic with frequency 1/2 has dimension 1 (we assume that the changes of the harmonic amplitudes have an exponential character, perhaps with different rates for different harmonics). Thus, the expected dimension of the time series should not exceed 13. If the time series included no noise and its components were strongly separable from each other, then the window length 13 would be sufficient for decomposing the time series into a trend and a seasonal component. However, exact separability practically never holds for real-life data, so we have to rely on theoretical results about approximate (asymptotic) separability of a slowly varying trend and a harmonic. To obtain better accuracy we need to choose the window length L close to half of the time series length, since the separability error tends to zero at the rate 1/min(L, K), where K = N - L + 1 and N denotes the length of the time series. We also know that for better separability of periodic components we should select L and K divisible by the period, and it is more important that the smaller of L and K be divisible by 12. We therefore choose the window length L = 84 (N = 174, hence K = 91).

The method of eigentriple identification. Let us consider the result of the Singular Value Decomposition of the trajectory matrix performed with the chosen window length. Fig. 3 presents the eigenvectors of the first six eigentriples (recall that an eigentriple = (square root of eigenvalue, eigenvector, factor vector) = (singular value, left singular vector, right singular vector)). Note that the form of the factor vectors (the right singular vectors) is almost the same as that of the eigenvectors (the left singular vectors), because L is close to K.
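To make these steps concrete, here is a minimal sketch of the embedding and decomposition in Python/NumPy (an illustration under our notation, not code from [1] or [2]; the names ssa_decompose and fort are hypothetical, and the series is assumed to be a one-dimensional NumPy array):

    import numpy as np

    def ssa_decompose(series, L):
        # Embedding: column j of the L-trajectory (Hankel) matrix is
        # series[j : j + L]; there are K = N - L + 1 such columns.
        N = len(series)
        K = N - L + 1
        X = np.column_stack([series[j:j + L] for j in range(K)])
        # SVD of the trajectory matrix; eigentriple i is
        # (s[i], U[:, i], Vt[i]) = (singular value, eigenvector, factor vector).
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return U, s, Vt

    # For FORT174: N = 174 and L = 84, hence K = 91.
    # U, s, Vt = ssa_decompose(fort, L=84)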
If we took a smaller window length, then the eigenvectors would have a more regular form than the factor vectors, since the latter would reflect the variation of the amplitudes of the harmonic components. Let us use these conclusions about the form of the singular vectors to identify the eigentriples corresponding to the trend and the harmonics, under the assumption of their approximate separability.

Trend identification. Let us start with the identification of the trend. We know that singular vectors have (in general) the same form as the corresponding components of the initial time series. Thus we should find slowly varying eigenvectors, which can be done by examining one-dimensional plots of the eigenvectors. In this instance only the leading eigenvector has the required form, i.e., the trend is described by one eigentriple. This directly implies that the trend is approximated by an exponential function. Note that the more complicated the form of a trend, the larger its (approximate) dimension and the larger the number of eigentriples corresponding to it.

Harmonics identification. Now we will try to identify the harmonic components (possibly with varying amplitudes) produced by the seasonal component of the original time series. As shown in Fig. 3, eigentriples 2-6 correspond to some harmonics, since their singular vectors have a regular periodic form. We know that every harmonic with frequency smaller than 0.5 produces two eigentriples. (A harmonic with frequency 0.5 generates one eigentriple with saw-tooth singular vectors, as they are also harmonics, with period 2; here we do not see such vectors.) One way to find the pairs of components corresponding to harmonics is to consider two-dimensional plots of singular vectors. It is enough to examine only the plots built for adjacent eigentriples (ordered by eigenvalues), because the eigenvalues of the pair corresponding to a harmonic are known to be close for a sufficiently long time series. In Fig. 4 one can recognize regular two-dimensional graphs forming trajectories with vertices on a spiral-shaped curve. This indicates that these pairs of singular vectors are produced by modulated harmonic components of the initial time series. In this way, eigentriples (ET for short) ET2,3 correspond to period 12, ET4,5 to period 4, ET6,7 to period 6, ET8,9 to period 2.4, and ET10,11 to period 3. (We use the notion of a 'fractional period' for a harmonic whose frequency is the inverse of this period.)

Supplementary characteristics. Let us describe additional information that can help us to identify eigentriples and to confirm the grouping of components. Fig. 5, which depicts the logarithms of the eigenvalues, provides such information: a pair of eigentriples corresponding to a harmonic produces a plateau in this graph. Analysis of the matrix of w-correlations between the reconstructed components of the initial time series is also useful for identification (Fig. 6 depicts the six leading elementary reconstructed components of the original time series, each formally reconstructed from a single eigentriple). Recall that the w-correlation is a weighted correlation between reconstructed time series; its equality to zero is a necessary condition for separability of the corresponding time series components. Fig. 7 confirms the performed identification: the w-correlation between the two components of a pair corresponding to a harmonic is rather high, whereas the w-correlations between the pairs, and between the pairs and the trend, are close to zero (reflected by the white color of the corresponding elements of the correlation matrix).

Separation of the signal from noise. Let us dwell on the question of separating the components corresponding to the signal from the noise components. Firstly, irregular behavior of singular vectors can indicate that they belong to the set induced by the noise component. This irregularity should be distinguished from the mixture of components, which is caused by a lack of strong separability of these components. The boundary between the signal and noise components can be confirmed by the slow (almost jump-free) decrease of the eigenvalues starting from some number. Secondly, a large set of eigentriples whose reconstructed components are correlated with each other quite likely belongs to the noise. Fig. 7 contains such a block, formed by eigentriples 14-84.
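As a concrete illustration of the w-correlations used above, here is a minimal sketch computing the w-correlation of two reconstructed series directly from its definition (illustrative code, not from [1] or [2]; f1 and f2 are assumed to be NumPy arrays of the same length as the initial series):

    import numpy as np

    def w_correlation(f1, f2, L):
        # The weight of index i is the number of entries of the L-trajectory
        # matrix containing the i-th term of the series: min(i + 1, L, K, N - i).
        N = len(f1)
        K = N - L + 1
        i = np.arange(N)
        w = np.minimum(np.minimum(i + 1, N - i), min(L, K))
        wdot = lambda a, b: float(np.sum(w * a * b))
        return wdot(f1, f2) / np.sqrt(wdot(f1, f1) * wdot(f2, f2))

A w-correlation matrix such as the one in Fig. 7 is obtained by evaluating this quantity for every pair of elementary reconstructed components.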
The question about the pair ET12,13 is still open. On the one hand, this pair is well separated from the residual. On the other hand, the period of the component reconstructed from ET12,13 is close to 2.33, as the periodogram indicates. Such a period cannot be interpreted in the context of seasonality; it can be caused either by noise or by a strongly modulated harmonic with period 2.4. It is not possible to come to a reliable conclusion about this specific time series component, so we classify ET12,13 as noise. We apply standard statistical methods to confirm the accuracy of the performed separation of the signal from the noise. Fig. 8 depicts the decomposition of the initial time series into three components: the trend (ET1, shown on the background of the initial time series), the periodic (seasonal) component (ET2-11), and the noise (ET12-84). The statistical criteria confirm (do not reject) the hypothesis that the third component is a realization of white noise (three different criteria of independence produce p-levels larger than 0.4).

Decomposition of the initial time series into components. Fig. 8 demonstrates the result of the grouping of the SVD components followed by diagonal averaging. This solves the stated problem of decomposing the initial time series into 'independent' and 'identifiable' additive components. Table 1 contains the values of the w-correlations between the components represented in Fig. 8.

Table 1: w-correlations for the decomposition of the time series into the trend, the periodic component, and the noise

              ET1      ET2-11    ET12-84
    ET1       1        0         0
    ET2-11    0        1         0.016
    ET12-84   0        0.016     1

A more detailed investigation can be of interest in addition to the general time series decomposition. For the considered time series, the matter of interest is the behavior of the seasonal component. Fig. 9 depicts the three leading components of the decomposition of the seasonal component into a sum of harmonics. Several facts are worthy of notice: the decreasing amplitude of the annual harmonic, the more or less invariable behavior of the half-year harmonic, and the increasing amplitude of the 4-month harmonic. Note that standard statistical methods generally assume either stationary amplitudes (additive models) or amplitude variations that are similar for all harmonics and proportional to the trend (multiplicative models). Clearly, neither model is appropriate for the FORT time series.

The problem of strong separability. A lack of strong separability leads to the mixing of components; it is caused by eigenvalues (weights) of different components being close to each other. To demonstrate this effect, let us consider the subseries consisting of the first 120 points of the initial time series and choose the window length L equal to 60. Fig. 10 presents the corresponding matrix of w-correlations. Just as for the matrix depicted in Fig. 7, we see here that the eigentriples starting from number 12 can be related to the noise. However, the dark-colored block formed by ET8-11 reflects a probable mixing of two harmonics. Periodogram analysis of the eigenvectors confirms this supposition: there is a mixing of the harmonics with frequencies 1/3 and 1/2.4 = 5/12. This is not essential for the extraction of the seasonal component as a whole; the effect causes problems only during identification. If we wanted to extract, for instance, the quarterly (three-month) harmonic component, then the lack of strong separability would interfere with such an extraction. The comparison of the periodograms of the (almost) entire initial time series (Fig. 2) and of its first 120 points (Fig. 11) explains why this problem did not arise for the full series: for FORT174 the contributions of the frequencies 1/3 and 1/2.4 are slightly different, whereas for the subseries FORT120 they are practically equal.
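To complete the computational picture, here is a sketch of the grouping and diagonal averaging steps that produce decompositions such as those in Figs. 8 and 9 (again an illustration in the hypothetical notation of the sketches above, with U, s, Vt as returned by ssa_decompose; note the zero-based indexing, so ET1 corresponds to index 0):

    import numpy as np

    def reconstruct(U, s, Vt, group):
        # Grouping: sum the elementary matrices s[i] * U[:, i] * Vt[i]
        # over the chosen set of eigentriple indices.
        group = list(group)
        X = (U[:, group] * s[group]) @ Vt[group, :]
        # Diagonal averaging (hankelization): the entry X[i, j] contributes
        # to the (i + j)-th term of the reconstructed series.
        L, K = X.shape
        N = L + K - 1
        total = np.zeros(N)
        count = np.zeros(N)
        for j in range(K):
            total[j:j + L] += X[:, j]
            count[j:j + L] += 1
        return total / count

    # trend  = reconstruct(U, s, Vt, [0])           # ET1
    # season = reconstruct(U, s, Vt, range(1, 11))  # ET2-11
    # noise  = reconstruct(U, s, Vt, range(11, 84)) # ET12-84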

Fig. 1: FORT174: initial time series

Fig. 2: FORT174: periodogram of the centered time series (the first 168 points)

Fig. 3: FORT174: one-dimensional plots of eigenvectors (eigentriples 1-6, with shares 94.648%, 1.428%, 1.364%, 0.504%, 0.495%, 0.262%)

Fig. 4: FORT174: two-dimensional plots of eigenvectors (adjacent pairs 2-3 through 10-11)

Fig. 5: FORT174: logarithms of the leading 30 eigenvalues

Fig. 6: FORT174: elementary reconstructed components (eigentriples 1-6)

Fig. 7: FORT174: matrix of w-correlations between the elementary reconstructed components

Fig. 8: FORT174: decomposition of the time series into the trend (ET1, 94.648%; shown on the background of the initial time series, top graph), the seasonal component (ET2-11, 4.778%; middle graph), and the noise (ET12-84, 0.574%; bottom graph)

Fig. 9: FORT174: decomposition of the seasonal component into the sum of harmonics (ET2,3: 2.792%; ET4,5: 0.999%; ET6,7: 0.515%)

Fig. 10: FORT120: matrix of w-correlations between the elementary reconstructed components

Fig. 11: FORT120: periodogram of the centered time series

References

[1] Golyandina N., Nekrutkin V., and Zhigljavsky A. Analysis of Time Series Structure: SSA and Related Techniques. London: Chapman & Hall/CRC, 2001. 305 p.

[2] Golyandina N. The 'Caterpillar'-SSA Method for Time Series Analysis: Training Aids. St. Petersburg: St. Petersburg State University, 2004. 87 p. (In Russian).