Integrated Wavelet Denoising Method for High-Frequency Financial Data Forecasting


Edward W. Sun, KEDGE Business School, France
Yi-Ting Chen, School of Computer Science, National Chiao Tung University, Taiwan
Min-Teh Yu, National Chiao Tung University, Taiwan

Abstract

Intelligent pattern recognition imposes new challenges in high-frequency financial data mining due to the data's irregularities and roughness. Based on the wavelet transform for decomposing systematic patterns and noise, in this paper we propose a new integrated wavelet denoising method, named the smoothness-oriented wavelet denoising algorithm (SOWDA), that optimally determines the wavelet function, the maximal level of decomposition, and the threshold rule by using a smoothness score function that simultaneously detects the global and local extrema. We discuss the properties of our method and propose a new evaluation procedure to show its robustness. In addition, we apply this method in both simulation and empirical investigation. Both the simulation results, based on three typical stylized features of financial data, and the empirical results, from analyzing high-frequency financial data from the Frankfurt Stock Exchange, confirm that SOWDA significantly (based on the RMSE comparison) improves the performance of classical econometric models after denoising the data with the discrete wavelet transform (DWT) and maximal overlap discrete wavelet transform (MODWT) methods.

Keywords: Data denoising, DWT, High-frequency data, MODWT, Wavelet

Corresponding author: Edward W. Sun, KEDGE Business School, 680 Cours de la Libération, Talence Cedex, France. edward.sun@bem.edu.

1 Introduction

The financial service sector is typically expert at churning out enormous amounts of data, gathered at the tick-by-tick level from customer interactions and transactions. Such ultra-high-frequency data have a complex structure of irregularities and roughness that has been described as exhibiting multifractal phenomena - that is, different fragments of the data have different fractal properties (see Mandelbrot (1982)). The heterogeneity characterized by multifractal phenomena is caused by a large number of instantaneous changes in the markets and trading noises (see Sun et al. (2007) and references therein). Mining high-frequency data thus turns out to be fundamental for financial informatics and stimulates interest in intelligently extracting the information conveyed in such data. For example, McCulloch and Tsay (2004) discuss the non-linear behavior of high-frequency data and propose a price change duration model for analyzing price changes, Kelly and Steigerwald (2004) apply the stochastic volatility model to replicate features of high-frequency data in their market microstructure analysis, and Sun et al. (2009) present a high-frequency Value-at-Risk measure based on Lévy processes. The wavelet method is one of the methods for computing the multifractal spectrum and has proven to be a reliable tool in econometric analysis (see, for example, Fan and Gençay (2010), Fan and Wang (2007), and Hong and Kao (2004)). In particular, it is suitable for time series analysis, as in smoothing, denoising, and jump detection (see, for example, Gençay et al. (2010), In et al. (2011), Donoho and Johnstone (1998), and Sun and Meinl (2012), among others). Ramsey (2002) highlights some research areas where wavelet analysis might be applied in economics, and Crowley (2007) provides a survey of how wavelet methods have been used in the economics and finance literature.
The advantage of the wavelet method is that it performs a multiresolution analysis - that is, it allows us to analyze the data at different scales (each one associated with a particular frequency passband) at the same time. In this way, wavelets can identify single events truncated in one frequency range as well as coherent structures across different scales. Several studies have applied wavelet methods in mining financial data. For example, Ramsey and Lampart (1998) and Kim and In (2008) apply wavelets to analyze relationships and dependencies among key macroeconomic and financial variables. Gençay et al. (2003) propose a method based on a wavelet multiscaling approach for decomposing time series data. Gençay et al. (2005) introduce a method to estimate the systematic risk of an asset based on a wavelet multiscaling approach that decomposes the underlying time series on a scale-by-scale basis. Lada and Wilson (2006) develop a wavelet-based spectral method for steady-state simulation analysis. Esteban-Bravo and Vidal-Sanz (2007) propose a wavelet-based method for solving boundary value problems in growth models. Jensen (2007) implements compactly supported wavelets to develop an estimator of the long memory process. Laukaitis (2008) applies wavelet transforms for high-frequency data denoising in a study of credit card intraday cash flow and the intensity of transactions. Gençay and Gradojevic (2011) introduce a wavelet approach to estimate the parameters of a linear regression

model. Sun et al. (2011) propose a wavelet method for analyzing the currency market with high-frequency data. Aguiar-Conraria et al. (2012) apply wavelet tools in macroeconomic data analysis. Haven et al. (2012) show the efficiency of the wavelet method in denoising option price data. A classic assumption in data mining is that the data are generated by certain systematic patterns plus random noise. Denoising high-frequency data thus provides a fundamental tool to extract the systematic patterns conveyed in the data. As Sun and Meinl (2012) point out, a specific problem arises when the trend component exhibits occasional jumps that contrast with the slowly evolving long-term trend. These occasional jumps are often caused by, for example, unexpected large transactions or extreme prices and should not be attributed to the normal short-term variations (since jumps are often considered as noises), but indeed to the long-run trend. Traditional linear denoising methods (e.g., the moving average) usually fail to capture this information accurately, as these linear methods tend to blur out jumps, while non-linear filters are not appropriate either, since they do not smooth out high-frequency fluctuations sufficiently: the trends extracted by these methods are not smooth enough (i.e., they usually retain kinks) to present the long-run dynamic information (see Sun and Meinl (2012) and references therein). Several studies have sought to overcome the above-mentioned problem. For example, Connor and Rossiter (2005) estimate the wavelet variance by using non-decimated wavelet transforms. Studies applying wavelets to data denoising and coefficient construction can be found in Gençay et al. (2002), Keinert (2004), Mallat and Hwang (1992), and Percival and Walden (2006), among others. Among these methods, both the DWT and the MODWT require choosing the wavelet function, the level of decomposition, and the thresholding rule.
A common approach to choosing the wavelet function is to use the shortest wavelet filter that can provide reasonable results (see Percival and Walden (2006)). For the level of decomposition, the usual choice follows "the higher, the better" in general. The thresholding rule, which is a function identifying the wavelet coefficients to be deleted, has also been investigated in academic research (see, for example, Gençay et al. (2002)). Meinl and Sun (2012) propose the local linear scaling approximation (LLSA) method for denoising high-frequency data and show its robustness in empirical application under statistical goodness-of-fit tests. However, LLSA focuses on the linear characteristics around jumps for the reconstructed wavelet coefficients, and only takes advantage of them when the wavelet function is pre-determined. The remaining challenge is how to determine the combination of the wavelet function, level of decomposition, and thresholding rule so as to reach an optimal smoothness that generally improves the performance of classic models after denoising the data. The algorithm proposed in this paper (named SOWDA) can optimally determine the wavelet function, level of decomposition, and thresholding rule by using smoothness as a regularization variable. The goal of our method is to denoise the data and obtain a trend that: (1) contains as much information as possible, (2) exhibits a degree of smoothness that classic models can utilize, and (3) preserves as few artifacts (i.e., undesired structures, like an oscillating nature,

generated through the denoising process) as possible. In our method, we define measures for smoothness. Intuitively, these measures describe the characteristics of the data denoised with the wavelet transform that optimally provide output for further analysis with classical models. We show that the resulting difference sequence between the denoised data and the original signal must converge in probability at a predetermined confidence level. This requires that: (1) the structural changes (e.g., jumps) of the denoised data and the original signal are synchronous, (2) there are no outliers in the denoised data, and (3) the local extrema in the denoised data are bounded. We show the analytical properties of SOWDA that confirm the proposed method can lead to an optimal solution satisfying these requirements. Therefore, SOWDA can be used to improve the performance of econometric models that are parametrically built on the i.i.d. white noise assumption by denoising the unexpected outliers. In addition, we propose a new performance evaluation method based on the jump detection test suggested by Xue et al. (2014). Through this procedure, we verify the robustness of SOWDA. We investigate the performance of SOWDA with numerical simulations that consider some typical patterns often observed in high-frequency financial data, e.g., excessive volatility and regime switching. In comparison, our method performs better than the alternative methods, and the numerical results confirm the analytical properties of SOWDA, showing that the proposed algorithm maintains the original wavelet transform's computational complexity and that its approximation errors are bounded. In order to confirm the computational reliability and consistency shown in the simulations, we further perform an empirical investigation by applying our algorithm to high-frequency data (5-minute data) of the DAX 30 stocks.
In the empirical study, we work on both in-sample model fitting and out-of-sample (one-step-ahead and two-step-ahead) forecasting. The results we obtain from such a large-sample investigation coincide with the previous simulation results. When using the denoised data generated by our algorithm for forecasting with the high-frequency data, we find that the performance (i.e., forecasting accuracy) of the classic models, e.g., AR, ARMA, and ARMA-GARCH, improves significantly (based on the RMSE comparison), confirming the efficiency of the proposed algorithm. We organize the paper as follows. We describe the methodology in detail and summarize it with an algorithm chart in Section 2. In Section 3, we show the implementation of SOWDA and its analytical properties with respect to jumps in high-frequency data, and propose a new performance measure procedure based on the jump detection test. Section 4 investigates the performance of SOWDA by conducting simulations. The simulation results confirm the superior performance of our method. In Section 5 we execute an empirical study by applying our method to analyze the high-frequency data collected from the Frankfurt Stock Exchange (i.e., the DAX 30 stocks), and illustrate both in-sample modeling and out-of-sample forecasting results. We summarize our conclusions in Section 6.

2 The Methodology

2.1 Wavelets and wavelet transforms for denoising

Wavelets are bases of L^2(R) first developed to analyze geophysical signals (see Morlet (1983)). In contrast to the Fourier transformation, they enable a localized time-frequency analysis with wavelet functions that usually either have compact support or decay exponentially fast to zero. In addition, wavelets provide a multiresolution of a signal that allows for analyzing the signal simultaneously on different (usually dyadic) scales. To achieve these desirable properties, the wavelet transformation is subject to the admissibility and orthonormality conditions:

\int \frac{|\hat{\psi}(\omega)|^2}{\omega} \, d\omega < \infty, \qquad \int \psi^2(u) \, du = 1 \qquad \text{and} \qquad \int \psi(u) \, du = 0.

The latter condition signifies that ψ oscillates around zero. An analogous discrete formulation is developed by introducing discrete wavelet filter banks that correspond to a linear moving weighted average high-pass filter (see Mallat (1989)). For this, equivalent discrete oscillation and orthonormality conditions must hold:

\sum_{l=0}^{L-1} h_l = 0, \qquad \sum_{l=0}^{L-1} h_l^2 = 1 \qquad \text{and} \qquad \sum_{l=0}^{L-1} h_l h_{l+2n} = 0 \quad \text{for all } n \in \mathbb{N},

where h = (h_0, h_1, ..., h_{L-1}) is a finite-length discrete wavelet filter. We note that these conditions are only necessary, but not sufficient, to construct reasonable high-pass wavelet filters; additional regularity conditions are required. Daubechies (1992) provides a thorough treatment of this topic. In our paper we focus on the application of wavelets and only consider the wavelet functions most commonly used in financial data analysis, e.g., the Daubechies (D4) and least asymmetric (LA) wavelets, which differ in their constructional regularity conditions and are the ones in their class with the smallest support (see Percival and Walden (2006) and Sun and Meinl (2012)).
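The filter conditions above can be checked numerically. Below is a small pure-Python sketch (ours, not from the paper) that builds the D4 wavelet filter from its scaling coefficients via the quadrature-mirror relation h_l = (-1)^l g_{L-1-l} and verifies the oscillation, unit-energy, and even-shift orthogonality conditions:

```python
# Numerical check of the discrete wavelet filter conditions for D4.
import math

s = math.sqrt(3.0)
# D4 scaling (low-pass) filter g, and wavelet (high-pass) filter h_l = (-1)^l g_{L-1-l}
g = [(1 + s) / (4 * math.sqrt(2)), (3 + s) / (4 * math.sqrt(2)),
     (3 - s) / (4 * math.sqrt(2)), (1 - s) / (4 * math.sqrt(2))]
h = [(-1) ** l * g[len(g) - 1 - l] for l in range(len(g))]

sum_h = sum(h)                                            # oscillation: should be 0
energy = sum(c * c for c in h)                            # orthonormality: should be 1
shift2 = sum(h[l] * h[l + 2] for l in range(len(h) - 2))  # shift-2 orthogonality: should be 0

print(sum_h, energy, shift2)
```

Running it prints values that are zero, one, and zero up to floating-point error.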
A discrete wavelet transform (DWT) is an orthogonal transform of a vector (discrete signal or time series data) X of length N (which must be a multiple of 2^J) into J wavelet coefficient vectors W_j ∈ R^{N/2^j}, 1 ≤ j ≤ J, and one scaling coefficient vector V_J ∈ R^{N/2^J}:

[W_1, \ldots, W_J, V_J] = \mathcal{W} X. \qquad (1)

With the transformation matrix \mathcal{W} determined by the wavelet filter banks h, it is more convenient to use the pyramid algorithm developed by Mallat (1989), with a computational complexity of O(N). Applying the inverse transform on these vectors yields

[W_1, \ldots, W_J, V_J] \, \mathcal{W} = S_J + \sum_{j=1}^{J} D_j = X, \qquad (2)

that is, an additive decomposition of the original signal. When

S_{j-1} = S_j + D_j \qquad (3)

holds, the S_j (approximations, i.e., moving weighted averages, of X at different dyadic scales) and the D_j (the detail vectors, i.e., the details we lose at each approximation level) result in a multiscale decomposition. For each scale j we can separate the signal into high and low frequencies by a wavelet filter with the bandwidth determined by j. The maximal overlap discrete wavelet transform (MODWT) is an extension of the traditional DWT; it differs from the DWT in that all vectors are in R^N, and it has a higher computational complexity of O(N log_2 N) (see, for example, Percival and Walden (2006) and references therein). For all discrete wavelet transforms in this paper we utilize circular filtering extensions at the borders; that is, the data points X_0, X_{-1}, ... and X_{N+1}, X_{N+2}, ..., required for the moving weighted average, are substituted by X_N, X_{N-1}, ... and X_1, X_2, ..., respectively. Percival and Walden (2006) provide a discussion on boundary extensions and distortions, as well as methods to deal with them. Wavelet denoising might result in different outcomes due to different choices of input variables. As we have pointed out, wavelets are oscillating functions and transform the signal orthogonally in the Hilbert space. Consequently, reconstruction based on the wavelet transform poses no difficulty. However, many wavelet functions meet the requirements for the transform, and in practice different wavelets are used for different reasons. For example, the Haar wavelet has very small support, whereas wavelets of higher orders such as the Daubechies (D4) and least asymmetric (LA) wavelets have bigger support. Bigger support can ensure a smoother shape of S_j, 1 ≤ j ≤ J, with each wavelet, and the scaling coefficient can carry more information due to the increased filter width.
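To make the additive decomposition X = S_1 + D_1 of Equations (2)-(3) concrete, here is a minimal single-level Haar DWT in pure Python (an illustrative sketch under simplifying assumptions: Haar filter, even-length input; not the paper's implementation):

```python
# Single-level Haar DWT: decompose, then reconstruct the smooth and detail parts.
import math

def haar_dwt_level1(x):
    """One pyramid step: return (approximation V, detail W) coefficients."""
    r = math.sqrt(2.0)
    v = [(x[2 * i] + x[2 * i + 1]) / r for i in range(len(x) // 2)]
    w = [(x[2 * i] - x[2 * i + 1]) / r for i in range(len(x) // 2)]
    return v, w

def haar_idwt_level1(v, w):
    """Inverse transform; with w zeroed this returns the smooth part S_1."""
    r = math.sqrt(2.0)
    x = []
    for vi, wi in zip(v, w):
        x.extend([(vi + wi) / r, (vi - wi) / r])
    return x

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
v, w = haar_dwt_level1(x)
s1 = haar_idwt_level1(v, [0.0] * len(w))   # smooth approximation S_1
d1 = haar_idwt_level1([0.0] * len(v), w)   # detail D_1
recon = [a + b for a, b in zip(s1, d1)]    # S_1 + D_1 == X
```

Zeroing the detail vector before inverting yields the smooth approximation S_1, zeroing the approximation yields D_1, and the two add back to X exactly.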
The optimal band-pass filter, which dictates the capacity to isolate features in specific frequency intervals, is determined by the length of the wavelet function in approximation. In addition, the wavelet function must be able to mimic the features contained in the signal of interest in order to optimally represent the conveyed information. Therefore, the choice of the wavelet basis function turns out to be important when analyzing a given signal. After choosing the wavelet, we then decompose the signal into several levels. In general, we use the pyramid algorithm to accomplish this (see Gençay et al. (2002)). This algorithm decomposes the signal into detail and approximation coefficients in its first iteration. Each following iteration applies the same procedure to the approximation coefficients from the previous iteration. (For the DWT, we downsample the coefficients, i.e., in each iteration we halve the number of detail coefficients.) We can do this at most log_2(N) times, where N is the number of observations. The quality of the denoising process varies with the number of iterations. The resulting outcome also critically depends on the thresholding rule, which sets to zero all wavelet coefficients less than a fixed constant in magnitude. The coefficients obtained in each iteration are subject to the thresholding

rule before we reconstruct the denoised signal. This rule identifies the coefficients that represent noise. We thus have three factors that influence the quality of denoising based on the wavelet transform: the wavelet function (or mother wavelet), the number of maximal iterations (or level of decomposition), and the thresholding rule. However, there is no straightforward method to determine these three factors simultaneously. In this paper we propose a new method that aims to determine the choice of wavelet function, number of maximal iterations, and thresholding rule (we refer to this choice as a combination of denoising factors) simultaneously and optimally, based on a smoothness-oriented criterion.

2.2 Smoothness-oriented wavelet denoising algorithm (SOWDA)

In this section we introduce the method that helps us decide the combination of denoising factors. Assume that the observed data X can be decomposed as X_t = S_t + N_t, where S_t is the true signal, Ŝ_t is its estimate, and N_t is the additive noise sampled at time t. In order to evaluate the denoising performance, i.e., to see how close Ŝ_t is to S_t, we define the smoothness properties as follows.

Definition 1. Let x_n be a random variable with x_n = Ŝ_t − S_t. If there exists a constant c such that for every ε > 0

\lim_{n \to \infty} \Pr\left( |x_n - c| > \varepsilon \right) = 0,

then we call the estimate smooth. This definition intuitively states that the sequence x_n of differences between Ŝ_t and S_t becomes close to a controllable constant c; in other words, the difference sequence between Ŝ_t and S_t converges in probability to c. This requires that (i) the structural changes (e.g., jumps) of Ŝ_t and S_t are synchronous, (ii) there are no outliers in x_n, and (iii) the local extrema in x_n are bounded, which leads to the following measures.

Definition 2.
Let (Y_1, Y_2)^T be a vector of continuous random variables with marginal distribution functions F_1, F_2. The coefficient η_H is

\eta_H(u) = \lim_{u \to 1} P\left( Y_2 > F_2^{-1}(u) \mid Y_1 > F_1^{-1}(u) \right), \qquad (4)

and the coefficient η_L is

\eta_L(u) = \lim_{u \to 0} P\left( Y_2 < F_2^{-1}(u) \mid Y_1 < F_1^{-1}(u) \right). \qquad (5)

When for every ε > 0 there exists a u_0 such that for u > u_0 we have |η_H(u) η_L(u)^{-1} − 1| < ε, we say Y_1 and Y_2 are synchronous, that is, η_H(u) ≈ η_L(u).
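For finite samples, the limits in Equations (4)-(5) can be replaced by a fixed quantile level u to obtain plug-in estimates of η_H and η_L. The following pure-Python sketch is our illustration; the estimator form and the levels u = 0.95 and u = 0.05 are assumptions, not specified in the paper:

```python
# Empirical tail-synchronicity coefficients from paired samples.
def empirical_quantile(xs, u):
    ys = sorted(xs)
    k = min(len(ys) - 1, max(0, int(u * len(ys))))
    return ys[k]

def eta_high(y1, y2, u=0.95):
    """Estimate P(Y2 > F2^{-1}(u) | Y1 > F1^{-1}(u)) from paired samples."""
    q1, q2 = empirical_quantile(y1, u), empirical_quantile(y2, u)
    cond = [b > q2 for a, b in zip(y1, y2) if a > q1]
    return sum(cond) / len(cond) if cond else 0.0

def eta_low(y1, y2, u=0.05):
    """Estimate P(Y2 < F2^{-1}(u) | Y1 < F1^{-1}(u)) from paired samples."""
    q1, q2 = empirical_quantile(y1, u), empirical_quantile(y2, u)
    cond = [b < q2 for a, b in zip(y1, y2) if a < q1]
    return sum(cond) / len(cond) if cond else 0.0

# A perfectly synchronous pair: both coefficients equal 1
y = list(range(100))
print(eta_high(y, y), eta_low(y, y))
```

For a series paired with itself, both estimates equal 1, reflecting full upper- and lower-tail synchronicity.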

When η_H > 0, there exists upper tail synchronicity, and the positive extreme values in Y_1 and Y_2 can be observed simultaneously. When η_L > 0, there exists lower tail synchronicity, and the negative extreme values can be observed simultaneously. We further require the smoothness measure to be able to detect artifacts and jumps. We suggest two different measures here: one considers artifacts (τ_1, based on an outlier test) and the other considers jumps (τ_2, based on local extrema). In other words, we use τ_1 to detect the global extrema and τ_2 the local extrema. Both of them are able to detect boundary problems, that is, inefficient approximation at the beginning and end of the signal. We suggest applying Grubbs' test for identifying artifacts, which is an iterative test for outliers based on an approximately normally distributed sample (see Grubbs (1969)). Let µ = (1/T) \sum_{t=1}^{T} X_t be the sample mean of the vector X_t and s^2 = (1/(T-1)) \sum_{t=1}^{T} (X_t - µ)^2 its sample variance. The test statistic is then given by

G = \frac{\max_i |X_i - \mu|}{s}.

Here, G can be assumed to be t-distributed, and a test for outliers with significance level α (e.g., α = 0.05) can easily be performed by rejecting the null hypothesis of no outliers if

G > z_\alpha = \frac{T-1}{\sqrt{T}} \sqrt{ \frac{ t^2_{\alpha/(2T),\,T-2} }{ T - 2 + t^2_{\alpha/(2T),\,T-2} } },

where t_{\alpha/(2T),\,T-2} denotes the critical value of the t-distribution with T − 2 degrees of freedom at level α/(2T). When an outlier (i.e., a global extremum) is detected, it is removed from the data and the test then proceeds. As a measure of the amount of artifacts (or jumps of high magnitude), we count the number of iterations needed until the test confirms there is no outlier; that is, we apply the test until C(x) = 0 and take the number of detected outliers as a measure of structure.

Definition 3. Let C(x) be a function determining whether there is an outlier in the vector X:

C(x) = 1 if G > z_\alpha, and C(x) = 0 otherwise.

We define τ_1 as

\tau_1 = \sum_{i=1}^{n} \mathbf{1}_{C(x)=1}, \qquad (6)

where n is the sample size. In order to control all structural changes to be bounded, our proposed method investigates the local extrema (maxima or minima, respectively) at a certain magnitude. In order to avoid redundant computation (since τ_1 controls the outlier detection), we only run the test procedure on the output data after the wavelet transform. The local extrema referred to here are the largest and smallest values that a function takes at a point within a given neighborhood.
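The iterative Grubbs-type count τ_1 of Definition 3 can be sketched as follows. This is our toy version: the critical value z_alpha is passed in explicitly, whereas in the paper it comes from the t-distribution at level α, which in code would require a statistics library such as scipy:

```python
# Iterative Grubbs-style outlier count (a sketch of tau_1, Definition 3).
import math

def grubbs_statistic(xs):
    """Return (G, index of the most extreme observation)."""
    mu = sum(xs) / len(xs)
    s = math.sqrt(sum((x - mu) ** 2 for x in xs) / (len(xs) - 1))
    g, idx = max((abs(x - mu), i) for i, x in enumerate(xs))
    return g / s, idx

def tau_1(xs, z_alpha):
    """Remove the most extreme point while G > z_alpha; count removals."""
    xs = list(xs)
    count = 0
    while len(xs) > 2:
        g, idx = grubbs_statistic(xs)
        if g <= z_alpha:
            break
        xs.pop(idx)
        count += 1
    return count

data = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 9.0]   # one obvious artifact
print(tau_1(data, z_alpha=2.0))
```

On this example the single large observation is flagged and removed, after which the statistic falls below the critical value.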

Definition 4. If there exists Λ ∈ R such that

\limsup_{n \to \infty} x_n = \Lambda,

and there exists a subsequence x_{k_n} of x_n for which x_{k_n} < Λ for all n, then Λ is the local maximum. If there exists λ ∈ R such that

\liminf_{n \to \infty} x_n = \lambda,

and there exists a subsequence x_{k_n} of x_n for which x_{k_n} > λ for all n, then λ is the local minimum.

Definition 5. Let D(x) be a function that detects local maxima,

D(x) = 1 if x_i ≥ Λ, and D(x) = 0 otherwise,

and let D*(x) detect local minima,

D*(x) = 1 if x_i ≤ λ, and D*(x) = 0 otherwise.

We define τ_2 as

\tau_2 = \sum_{i=1}^{n} \mathbf{1}_{D(x)=1} + \sum_{i=1}^{n} \mathbf{1}_{D^*(x)=1}, \qquad (7)

where n is the sample size.

The algorithm in this paper applies a linear score function T(·) of τ_1 and τ_2 (i.e., a linear combination of the measures for global and local extrema) to compute the overall score:

T(\tau_1, \tau_2) = \alpha \tau_1 + \beta \tau_2, \qquad (8)

where α and β are the weights assigned to the score function, e.g., with α + β = 1, which can be determined based on the data characteristics. The combination of denoising factors with the lowest value of T(τ_1, τ_2) is preferred and identified as the optimal set for the wavelet-based denoising process.

2.3 Summary of the SOWDA algorithm

Given a set of wavelet functions (f*), maximal levels of decomposition (l*), and thresholding rules (s*) as denoising factors, and an input data vector X_t, SOWDA determines the output of the wavelet transform. We define the score function T(τ_1, τ_2) as the criterion for determining the combination of denoising factors. Minimizing T(τ_1, τ_2) leads to the optimal combination of denoising factors (f*, l*, and s*). The pseudo code for SOWDA is summarized in the following Algorithm Chart.
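The local-extrema count τ_2 of Definition 5 and the linear score of Equation (8) are straightforward to compute once the bounds Λ and λ are fixed. The sketch below is illustrative; the bounds and the weights α = β = 0.5 are our assumptions, not values prescribed by the paper:

```python
# Sketch of tau_2 (Definition 5) and the score T(tau_1, tau_2) of Eq. (8).
def tau_2(xs, Lam, lam):
    """Count interior local maxima above Lam and local minima below lam."""
    count = 0
    for i in range(1, len(xs) - 1):
        if xs[i] > xs[i - 1] and xs[i] > xs[i + 1] and xs[i] >= Lam:
            count += 1
        if xs[i] < xs[i - 1] and xs[i] < xs[i + 1] and xs[i] <= lam:
            count += 1
    return count

def score(t1, t2, alpha=0.5, beta=0.5):
    """T(tau_1, tau_2) = alpha * tau_1 + beta * tau_2, with alpha + beta = 1."""
    return alpha * t1 + beta * t2

xs = [0.0, 3.0, 0.0, -3.0, 0.0, 0.5, 0.0]
t2 = tau_2(xs, Lam=2.0, lam=-2.0)   # the small bump at 0.5 is not counted
print(t2, score(1, t2))
```

The combination of denoising factors yielding the smallest score would then be selected, as the chart below makes explicit.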

Algorithm 1 Smoothness-oriented wavelet denoising algorithm (SOWDA)
Require: X: input data vector; F: a set of wavelet functions; L: a set of maximal levels of decomposition; S: a set of thresholding rules; W: wavelet transform {DWT, MODWT}; C: F × L × S × W.
Ensure: optimal f*, l*, s*; X̂ = W(f*, l*, s*).
1: for c ∈ C do
2:   Conduct the wavelet transform.
3:   Apply the thresholding rule to the wavelet coefficients.
4:   Conduct the inverse wavelet transform.
5:   Compute the evaluation function and return T(c).
6: end for
7: (W*, f*, l*, s*) = c* = argmin_c T(c).
8: Apply W* with (f*, l*, s*) to extract the trend X̂ of X.

3 Implementation of SOWDA

3.1 Jumps in high-frequency data

A substantial number of significant discontinuities have been observed in financial data; these discontinuities are commonly called jumps. Several empirical and theoretical studies have discussed the existence of jumps and their substantial impact on asset pricing, portfolio and risk management, and hedging (see Lee and Mykland (2008)). Jumps come to markets irregularly, and their arrivals and amplitudes depend on market information, which leads to the roughness of financial time series data, particularly high-frequency data (see Sun et al. (2007)). We need to detect jumps with robust tools, such as the wavelet method (see Sun and Meinl (2012) and Xue et al. (2014)), and then clarify which information is dynamically related to jumps in order to discover market phenomena and improve modeling, forecasting, pricing, and hedging. In finance, consider a one-dimensional asset return process on a fixed complete probability space (Ω, F_t, P), where {F_t : t ∈ [0, T]} is a right-continuous information filtration for market participants and P is a data-generating measure.
Letting S(t) be the continuously compounded return of the asset at time t under P, S(t) can be expressed as the Itô drift-diffusion process satisfying the stochastic differential equation

dS(t) = \mu(t) \, dt + \sigma(t) \, dB(t),

where B(t), µ(t), and σ(t) are the F_t-adapted standard Brownian motion, the drift, and the spot volatility, respectively. When there are jumps,

dS(t) = \mu(t) \, dt + \sigma(t) \, dB(t) + \lambda(t) \, dJ(t),

where J(t) is a counting process that is independent of B(t), and λ(t) is the jump size, which is independent and identically distributed.
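For intuition, the jump-diffusion dynamics above can be simulated with a simple Euler discretization. This is an illustrative sketch with constant coefficients µ, σ and a constant Poisson intensity (the paper allows time-varying, F_t-adapted coefficients); all parameter values are arbitrary:

```python
# Euler discretization of dS = mu dt + sigma dB + jump dJ (constant coefficients).
import random

def simulate_jump_diffusion(n, dt, mu, sigma, lam, jump_sd, seed=7):
    """Return a path of length n + 1; lam is the Poisson arrival intensity."""
    rng = random.Random(seed)
    s, path = 0.0, [0.0]
    for _ in range(n):
        ds = mu * dt + sigma * rng.gauss(0.0, 1.0) * dt ** 0.5
        if rng.random() < lam * dt:          # a jump arrives in this step
            ds += rng.gauss(0.0, jump_sd)    # i.i.d. normal jump size
        s += ds
        path.append(s)
    return path

path = simulate_jump_diffusion(n=500, dt=1 / 500, mu=0.05, sigma=0.2,
                               lam=10.0, jump_sd=0.5)
```

Paths generated this way exhibit the occasional large discontinuities, superimposed on a slowly drifting diffusion, that the denoising method is meant to preserve.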

Lee and Mykland (2008) impose two necessary assumptions on the price process. For any ɛ > 0 and 0 = t_0 < t_1 < ... < t_n = T,

\sup_i \sup_{t_i \le u \le t_{i+1}} |\mu(u) - \mu(t_i)| = O_p\left( \Delta t^{\frac{1}{2} - \epsilon} \right),

and

\sup_i \sup_{t_i \le u \le t_{i+1}} |\sigma(u) - \sigma(t_i)| = O_p\left( \Delta t^{\frac{1}{2} - \epsilon} \right).

The notation O_p is used for a random vector {X_n} and a non-negative random variable {x_n}: X_n = O_p(x_n) if for any ɛ > 0 there exists a finite constant C_ɛ such that P(|X_n| > C_ɛ x_n) < ɛ. These two assumptions state that the drift and diffusion coefficients do not change dramatically over a finer time interval and that the maximum changes in mean and spot volatility in a given interval are bounded (see Lee and Mykland (2008)). Several jump-detection methods based on the wavelet transform have been proposed (see, for example, Sun and Meinl (2012) and Xue et al. (2014)).

3.2 Property of SOWDA

The SOWDA algorithm we propose in this paper aims exactly at removing the jumps without jeopardizing the information possessed by the original data. Let (F, L) be fixed; then we can directly obtain S_J^SOWDA = S_J^DWT and S_J^SOWDA = S_J^MODWT. For any other choice of (F, L, S), more details will be added to the estimator S_J^SOWDA, according to T(τ_1, τ_2). After the reconstruction (see Equation (3)), these additional details correspond to the estimators S_J^DWT and S_J^MODWT. In the refinement process, several wavelet coefficients are set to zero by S, and no exact description of S_J^SOWDA is available, as this procedure causes the information carried by the wavelet coefficients at different levels to be intermixed. However, noting that either the DWT or MODWT approximation of any level is bounded by the signal itself, we can set up an ε-tube around the initially estimated trend by

\varepsilon := \max_t X_t - \min_t X_t. \qquad (9)
We then estimate an upper bound for the error by a given constant c with c = N ε, which satisfies

\sum_{t=1}^{N} \left| \vartheta_t(X) - S^{SOWDA}_{J,t} \right| < c, \qquad (10)

where X is the signal of length N and ϑ(X) is its denoised approximation. As we have

\min_t X_t \le S^{SOWDA}_{J,t} \le \max_t X_t \qquad \text{and} \qquad \min_t X_t \le S^{MODWT}_{J,t} \le \max_t X_t

for all 1 ≤ j ≤ J, it follows that

\left| \vartheta_t(X) - S^{DWT}_{J,t} \right| \le \varepsilon \qquad \text{and} \qquad \left| \vartheta_t(X) - S^{MODWT}_{J,t} \right| \le \varepsilon.

This must also hold for SOWDA, such that

\min_t X_t \le S^{SOWDA}_{J,t} \le \max_t X_t \qquad \text{and} \qquad \left| \vartheta_t(X) - S^{SOWDA}_{J,t} \right| \le \varepsilon.

Assume that, for fixed F and L, the wavelet coefficients ω_{j,k} ∈ W_j are ordered in time, that is, ω_{j,k+1} follows ω_{j,k}, for all 1 ≤ j ≤ J and 1 ≤ k ≤ K − 1. With t_K := \max_{t,j} \{ t \in \Omega_{S_{j,K}} \} for all t > t_K and the same ε as in Equation (9), we then obtain

\sum_{t=1}^{N} \left| \mu(\vartheta_t(X)) - \mu(S^{SOWDA}_{J,t}) \right| = \sum_{t=1}^{t_K} \left| \mu(\vartheta_t(X)) - \mu(S^{SOWDA}_{J,t}) \right| \le t_K \varepsilon,

and

\sum_{t=1}^{N} \left| \sigma(\vartheta_t(X)) - \sigma(S^{SOWDA}_{J,t}) \right| = \sum_{t=1}^{t_K} \left| \sigma(\vartheta_t(X)) - \sigma(S^{SOWDA}_{J,t}) \right| \le t_K \varepsilon.

3.3 Performance measure with jump detection

Wavelet transforms can decompose the data into high-frequency and low-frequency components. Since jumps are unexpected and instantaneous, Xue et al. (2014) indicate that jump dynamics should be contained in the high-frequency components of prices. They propose a non-parametric jump detection method, which estimates the high-frequency components of instantaneous volatility. Let the logarithm of an asset's market price be P_t = log S_t, where S_t is the asset price at time t, and let P̂_{1,t} be the wavelet coefficients of P_t at the first scale of the MODWT. A test statistic J_w, which detects jump occurrences at time t_i for i = 3, ..., T, is given by

J_w(i) = \frac{ \left| \hat{P}_{1,t_i} \right| }{ \hat{\sigma}_{t_{i-1}} }, \qquad \text{where} \qquad \hat{\sigma}^2_{t_{i-1}} = \frac{1}{i-2} \sum_{k=2}^{i-1} \left| \hat{P}_{1,t_k} \right| \left| \hat{P}_{1,t_{k-1}} \right|.

When the test statistic exceeds the threshold, the null hypothesis of no jump in this interval is rejected. In this paper, we set the threshold level at 1% of the null distribution and define a jump as detected when the test statistic exceeds this threshold. We have seen that SOWDA is capable of identifying jumps and reconstructing them to preserve information, i.e., of classifying spurious jumps and actual jumps. We propose a performance evaluation procedure for SOWDA based on the jump detection test (the XGF test, in short) introduced by Xue et al. (2014). The idea behind this procedure is straightforward.
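A toy version of the J_w statistic can be sketched as follows. Two stand-ins are ours, not Xue et al.'s: the first-scale wavelet coefficients are computed as Haar MODWT details (P_t − P_{t−1})/2, and σ̂ is a trailing root-mean-square of past coefficients rather than the estimator defined above; the window length and cutoff are arbitrary:

```python
# Sketch of an XGF-style jump statistic on a log-price series.
import math

def jump_statistics(p, window=20):
    """Return J_w(i) = |W_1,i| / sigma_hat_{i-1} for i >= window."""
    w = [(p[i] - p[i - 1]) / 2.0 for i in range(1, len(p))]  # Haar MODWT detail
    stats = []
    for i in range(window, len(w)):
        recent = w[i - window:i]
        sig = math.sqrt(sum(c * c for c in recent) / len(recent))
        stats.append(abs(w[i]) / sig if sig > 0 else 0.0)
    return stats

p = [0.001 * i for i in range(100)]   # a smooth log-price path
p[60] += 0.5                          # inject one large jump
js = jump_statistics(p)
detected = [i for i, stat in enumerate(js) if stat > 4.0]
```

On this synthetic path the statistic is near one everywhere except at the injected jump, where it spikes far above the cutoff.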
The identified jumps must be marked at the same location in the original data and the denoised data. Our procedure is given as follows.

Step 1. Apply SOWDA to the original time series, separating the original data into a trend series and a noise series.

Step 2. As Xue et al. (2014) assume that J_w(i) follows a normal distribution, we set the threshold level at 1% for testing the null hypothesis that there is no jump.

Step 3. Run the XGF test over a non-overlapping moving window of length n on the original, the trend, and the noise series. We mark 1 when the XGF test rejects the null hypothesis; otherwise 0 is marked. We then have three test series containing only 0 and 1 elements, each of length N/n, where N is the total data length and n ≪ N.

Step 4. Compute the difference series (e.g., the test series of the original minus the test series of the trend) of the three series obtained in Step 3, and take their absolute values.

Step 5. Run a t-test to check whether the mean of an absolute difference series obtained in Step 4 is different from 1. We reject the null hypothesis (i.e., mean equal to 1) with statistical significance when the jumps occur simultaneously in the two series investigated.

Following Xue et al. (2014), we conduct this procedure separately for positive jumps and negative jumps. If SOWDA efficiently decomposes the original time series, then we reject the null hypothesis for the difference series of the original data and the trend data, but cannot reject the null hypothesis for the difference series of the original data and the noise data.

4 Simulation Study

We conduct a simulation study to investigate the performance of the proposed algorithm. The purpose of this simulation study is two-fold. First, we show that for any arbitrary signal the proposed method results in a better performance than the non-optimized method (i.e., arbitrarily determining the combination of wavelet, level of decomposition, and thresholding rule).
Second, we shall illustrate the properties of our algorithm by, particularly, showing the consistency of our algorithm - that is, the error generated by our method is bounded and less than those of the non-optimized method. 4.1 The data In this study we perform the Monte Carlo simulations where errors (jumps) are generated from three different patterns to describe (1) moderate volatility, (2) excessive volatility, and (3) excessive volatility with mean level shifts. We create time series data of length This trend is based on a sine function, whose amplitude and frequency are drawn from a uniform distribution. For generating the pattern (1) signals, following the simulation introduced by Sun and Meinl (2012), we add jumps to this trend. Jump occurrences are uniformly distributed (coinciding 13

with a Poisson arrival rate observed in many systems), with the jump heights drawn from a normal distribution with mean 0 and variance 1. The signal is constant between the jumps. White Gaussian noise is added to the signal afterwards. For the pattern (2) signals, we repeat the method used for simulating pattern (1), but replace the Gaussian noise with skewed contaminated normal noise, which has heavy tails to capture the excessive volatility (see Chun et al. (2012)). For the pattern (3) signals, we repeat the method used for the pattern (2) signals, but shift the trend up and down once in order to generate a signal characterized by excessive volatility with mean level shifts. The amplitude of the shift is four times the previous trend. Figure 1 illustrates the Q-Q plots of these three different signals.

4.2 The methodology

In our simulation study we choose the Haar, Daubechies (DB), Symlet (LA), and Coiflet (Coif) wavelet functions (see Percival and Walden (2006)). We apply the pyramid algorithm in our empirical study. With each iteration of the pyramid algorithm, we increase the scaling level; that is, given a signal of length $N = 2^J$, the $j$-th iteration computes detail coefficients associated with changes on a scale of length $\lambda_j = 2^{j-1}$ (see Gençay et al. (2002)).

In this simulation we consider several thresholding rules. Donoho and Johnstone (1994) suggest the universal thresholding rule:

$$ \vartheta_U = \hat{\sigma}_\epsilon \sqrt{2 \log n}. $$

The idea behind this selection rule is that, for a sequence of $n$ independently and identically distributed (i.i.d.) $N(0, \sigma_\epsilon^2)$ random variables, the probability that the largest absolute value is smaller than $\hat{\sigma}_\epsilon \sqrt{2 \log n}$ converges to one for large $n$. Donoho and Johnstone (1998) suggest the minimax thresholding rule. Donoho (1994) proposes a method based on minimizing Stein's unbiased estimate of risk (i.e., SURE). Suppose there is a sequence of $k$ i.i.d. random variables $z_i \sim N(\mu_i, 1)$. Let $\hat{\mu}$ be an estimator of $\mu = (\mu_1, \ldots, \mu_k)$. Given a weakly differentiable function $g$, the risk of the estimator $\hat{\mu} = z + g(z)$ admits the unbiased estimate

$$ E\,\| \hat{\mu} - \mu \|^2 = k + E\left[ \| g(z) \|^2 + 2\,\nabla \cdot g(z) \right], $$

where $\nabla \cdot g(z) = \sum_i \partial g_i(z) / \partial z_i$. With soft thresholding we obtain

$$ \mathrm{SURE}(z, \vartheta) = k - 2\,\#\{ i : |z_i| \le \vartheta \} + \sum_{i=1}^{k} \min(|z_i|, \vartheta)^2. $$

Here $\#S$ equals the cardinality of a given set $S$. The SURE threshold is the one minimizing the estimated risk:

$$ \vartheta_S = \arg\min_{\vartheta \ge 0} \mathrm{SURE}(z, \vartheta). $$

Donoho and Johnstone (1995) suggest the heuristic thresholding rule, which applies the SURE thresholding rule to some levels of decomposition and universal thresholding to others. The decision for which rule is used on which level is made heuristically. Birgé and Massart (1998) suggest a thresholding rule based on the Birgé-Massart strategy using a penalized projection estimator (PPE). For each level $i$, $q_i$ is calculated as

$$ q_i = \frac{m}{(j + 2 - i)^\alpha}, $$

where $j$ is the maximal level of decomposition, $m$ is a constant proposed to equal the length of the data (i.e., the number of observations), and $\alpha$ is a controlling constant. On each level $i$ the $q_i$ largest coefficients are kept; the larger the $\alpha$ value, the fewer coefficients remain. A typical choice for $\alpha$ is 1.5 for compression and 3 for denoising.

Based on the above-mentioned thresholding rules, wavelet coefficients are thresholded term by term on the basis of their individual magnitudes; information on other coefficients has no influence on the treatment of a particular coefficient. Cai and Silverman (2001) propose a block thresholding method, a shrinkage method that captures information on neighboring coefficients. When applying the block thresholding rule, the coefficients are considered in overlapping blocks, and the treatment of coefficients in the middle of each block depends on the data in the whole block.

The candidates for the denoising factors of SOWDA in our simulation are: F ∈ {Haar, DB(2), DB(4), DB(8), LA(2), LA(4), LA(8), Coif(4), Coif(6), Coif(8)}; L ∈ {i : i = 1, 2, 3}; S ∈ {Block, Universal, SURE, heuristic SURE, Minimax, Birgé-Massart}. The linear score function $T(\cdot)$ of $\tau_1$ and $\tau_2$ has the form $T(\tau_1, \tau_2) = 0.5\,\tau_1 + 0.5\,\tau_2$. When we compute $\tau_1$, we set $\alpha = 0.05$ for the Grubbs t-statistic. When we compute $\tau_2$, the one-sigma rule is applied to detect the local extrema - that is, an observation is considered a local extremum (of a given sequence) if it lies at a distance of more than one standard deviation from its mathematical expectation.
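To make the thresholding rules above concrete, the universal and SURE rules can be sketched as follows; the noise-scale estimate via the median absolute deviation is a common auxiliary choice assumed here, and all names are illustrative:

```python
import numpy as np

def soft_threshold(w, t):
    """Soft thresholding: shrink coefficients toward zero by t."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def universal_threshold(detail_coeffs, n):
    """Universal rule of Donoho and Johnstone (1994): t = sigma_hat * sqrt(2 log n).

    sigma_hat is estimated from the finest-scale detail coefficients via the
    median absolute deviation (a common convention, assumed here).
    """
    sigma_hat = np.median(np.abs(detail_coeffs)) / 0.6745
    return sigma_hat * np.sqrt(2.0 * np.log(n))

def sure_threshold(z):
    """SURE rule: pick t minimizing SURE(z, t) = k - 2*#{|z_i| <= t} + sum(min(|z_i|, t)^2).

    Assumes z has (approximately) unit noise variance; since SURE is piecewise
    quadratic in t, candidate thresholds are 0 and the sorted |z_i|.
    """
    z = np.asarray(z, dtype=float)
    k = len(z)
    best_t, best_risk = 0.0, np.inf
    for t in np.concatenate(([0.0], np.sort(np.abs(z)))):
        risk = k - 2 * np.sum(np.abs(z) <= t) + np.sum(np.minimum(np.abs(z), t) ** 2)
        if risk < best_risk:
            best_t, best_risk = t, risk
    return best_t
```

In a term-by-term scheme, each vector of detail coefficients would be passed through `soft_threshold` with the level's chosen threshold.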
In this simulation the alternative methods we compare with SOWDA are five single wavelet functions, i.e., Haar, DB(4), DB(8), LA(8), and Coif(6), that work with both DWT and MODWT. We run the simulation for the three different data patterns described in Section 4.1. We use our algorithm to identify the best denoising method that optimally combines the wavelet function, level of decomposition, and thresholding rule. For each pattern, we conduct the simulation based on a moving window design illustrated in Figure 2. We investigate our

algorithm for both in-sample approximation and out-of-sample forecasting. For the out-of-sample forecasting, we work with one-step and two-step forecasting. Since the true trend (for both in-sample and out-of-sample) of the simulated stylized data is known, we use SOWDA and the alternative methods to denoise the simulated data and compare the approximated trend and forecasted trend with their true counterparts. Obviously, the smaller the difference from the true trend, the better the performance of the underlying algorithm.

4.3 Simulation results

As mentioned in Section 4.1, the data length is 8,192 ($2^{13}$) for each pattern. For the moving window design, we set the in-sample size to 1,000 and the out-of-sample size to 10 for both one-step-ahead and two-step-ahead forecasting. The number of window moves is then 720, and we generate 100 data series for each pattern. Therefore, for each pattern we test our algorithm 72,000 times for in-sample approximation, one-step forecasting (validation), and two-step forecasting; in our simulation, we have 216,000 runs in total. For each run we compute the root mean squared error (RMSE). We report the mean value of RMSE and its corresponding variance (in parentheses) for each method in Table 1. The smaller the RMSE mean value, the better the denoising performance. For DWT, we find that the RMSE mean values of SOWDA are smaller than those of the alternatives for both in-sample and out-of-sample performance. We note that the proposed SOWDA performs better than the non-optimized alternative methods. In order to illustrate the results reported in Table 1, we show some results (the best three) in Figures 3 and 4. We see that the mean values of RMSE of SOWDA are smaller than those of the alternative methods, and the variance of RMSE for SOWDA turns out to be smaller than that obtained by using the alternative methods.
Additionally, we identify that when increasing the number of simulation runs, the variance of RMSE decreases. The speed of the decrease in variance (i.e., the speed of error convergence to its limit) for SOWDA is relatively faster than that of the alternative methods. This result confirms the consistency property of SOWDA that we have analytically shown in Section 3. The results we obtain in this simulation indicate that SOWDA shows better performance than the alternative methods.

5 Empirical Study

In this empirical study we investigate the performance of our proposed algorithm (SOWDA) by analyzing high-frequency data of German DAX 30 component stocks. We want to see if SOWDA can be used as a denoising method to improve the performance of some classic econometric models (i.e., AR, ARMA, and ARMA-GARCH) for in-sample estimation and out-of-sample forecasting.

5.1 The data

The analysis is performed on high-frequency German DAX 30 component stock prices from January to December of the sample year. We aggregate the tick-by-tick data to homogeneous (i.e., equally spaced) time series data at the 5-minute level by linear interpolation - that is, the inhomogeneous series is given by $x(t_i)$ at times $t_i$, while the target homogeneous series $x$ is defined at times $\tau_j := t_0 + j\,\Delta t$, $j \in \mathbb{N}$, with $\Delta t > 0$ fixed. Every regular $\tau_j$ is bracketed by two times of the irregularly spaced series, i.e.,

$$ t_{I_j} \le \tau_j < t_{I_j + 1}, \qquad I_j := \max\{ i \mid t_i \le \tau_j \}, $$

and the data point at $\tau_j$ is interpolated between $t_{I_j}$ and $t_{I_j + 1}$ by

$$ x(\tau_j) = x_{I_j} + \frac{\tau_j - t_{I_j}}{t_{I_j + 1} - t_{I_j}} \left( x_{I_j + 1} - x_{I_j} \right). $$

In our sample, the 5-minute data contain 26,686 data points for each DAX stock.

5.2 Methodology and results

In this empirical study we have two experiments: in-sample modeling and out-of-sample forecasting. For the in-sample training experiment, we first apply the smoothness-oriented wavelet denoising algorithm (SOWDA) proposed in this paper (see Algorithm chart 1 in Section 2.2.1) to denoise the data. We then use both the original data and the denoised data to estimate the AR(2), ARMA(2,1), and ARMA(2,1)-GARCH(1,1) models with the maximum likelihood approach. We compare the performance of model fitting by using the root mean squared error (RMSE) for each model as the goodness-of-fit measure. For the out-of-sample forecasting, we use the estimated model to conduct one-step-ahead and two-step-ahead forecasting following the method suggested by Sun and Meinl (2012). We then evaluate the forecasting performance by computing the RMSE of the forecasted values against the observed values in the original data. We use the same SOWDA setting and moving window design as in the simulation.
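The bracketing linear interpolation described in Section 5.1 can be sketched as follows; the function name is illustrative, and NumPy's `interp` implements exactly this rule for grid points inside the sample:

```python
import numpy as np

def to_homogeneous(times, values, dt, t0=None):
    """Map an irregularly spaced series x(t_i) onto the grid tau_j = t0 + j*dt.

    For each tau_j inside the sample, the bracketing ticks t_{I_j} <= tau_j < t_{I_j+1}
    define x(tau_j) = x_{I_j} + (tau_j - t_{I_j}) / (t_{I_j+1} - t_{I_j}) * (x_{I_j+1} - x_{I_j}).
    """
    t = np.asarray(times, dtype=float)
    x = np.asarray(values, dtype=float)
    if t0 is None:
        t0 = t[0]
    n = int(np.floor((t[-1] - t0) / dt)) + 1   # grid points that stay within the sample
    grid = t0 + dt * np.arange(n)
    return grid, np.interp(grid, t, x)         # piecewise-linear between bracketing ticks
```

For 5-minute bars from timestamps measured in seconds, `dt` would be 300.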
For the moving window design, we set the in-sample size to one month and the out-of-sample size to one day for both one-step-ahead and two-step-ahead forecasting. Table 2 reports the in-sample results for AR(2), ARMA(2,1), and ARMA(2,1)-GARCH(1,1). We see that the mean, median, and variance of the RMSE are all reduced after applying SOWDA denoising, showing a significant improvement of the denoised data (for both DWT and MODWT SOWDA) over the original data for model fitting. Table 3 reports the results of out-of-sample forecasting for AR(2), ARMA(2,1), and ARMA(2,1)-GARCH(1,1) based on SOWDA DWT denoising. Table 4 shows the forecasting results based on

SOWDA MODWT denoising. For the out-of-sample forecasting results, we also identify a generally significant improvement provided by the SOWDA-denoised data over the original data. The empirical results coincide with our previous simulation results regarding the robustness of SOWDA's denoising performance.

We next conduct the performance evaluation proposed in Section 3.3. If SOWDA efficiently decomposes the trend and noise from the original data, then all the jumps possessed by the original data, the trend, and the noise should coincide - that is, the null hypothesis that the mean of the absolute difference series equals one should be rejected by the t-test. In our assessment, we consider not only the overall jumps, but also the negative and positive jumps individually. We report our results in Table 5. From Table 5, we note that the null hypothesis that the mean of the absolute difference series equals one is rejected with significant t-test statistics, illustrating that the jumps occur simultaneously in the original data, the trend, and the noise obtained with SOWDA denoising. Our results imply that SOWDA performs efficiently in decomposing the original data, based on the jump detection test.

6 Conclusion

When conducting econometric analysis of data with classic parametric models built on the i.i.d. white noise assumption, unexpectedly extreme observations, e.g., outliers, will distort the performance of such parametric models. To overcome the disturbance of outliers, one can usually either find a more robust way to build the model or remove the outliers. The former might complicate the situation when the outliers' pattern is unknown, while the latter could remove information if such outliers are closely related to the underlying true dynamics. In order to optimally detect and remove the real noise from the underlying dynamics, wavelet methods have been suggested for denoising data not only in economics, but also in other areas, such as electronic signal processing. Applying the wavelet denoising method requires that the wavelet function, level of decomposition, and thresholding rule be determined; we call this the trinity of the wavelet denoising operation. Identifying the optimal combination of wavelet function, level of decomposition, and thresholding rule challenges the efficiency of the classical methods of denoising data based on the wavelet transform (e.g., DWT and MODWT). An inefficient decomposition of the systematic pattern (the trend) and noise of the target data will tremendously reduce the efficiency and effectiveness of any decision support system. When working with high-frequency financial data, their irregularities and roughness reinforce the necessity of more efficient tools for data mining.

In this paper we propose a new denoising method for high-frequency financial data, named the smoothness-oriented wavelet denoising algorithm (SOWDA), which optimally determines the combination of denoising factors (i.e., wavelet function, level of decomposition, and thresholding

rule) based on a smoothness-oriented score function that is designed for detecting global and local extrema. The method can be applied with the classic DWT or MODWT approach. When applied to high-frequency financial data, this algorithm is able to preserve a smooth trend and effectively separate the noise, since all information can be optimally retained in the wavelet multiresolution decomposition after specifying the level of smoothness in reconstructing the wavelet coefficients. In this paper we analytically show SOWDA's properties and propose a new performance evaluation procedure based on the jump detection test.

In order to show the efficient performance of SOWDA, we first conduct an experiment based on simulations. We consider three different stylized data patterns that are often observed in high-frequency financial data, featuring, for example, heavy tails and regime switching. In all the simulation settings that we investigated, SOWDA illustrates its robustness independent of the input data. We then empirically show the potential application of SOWDA by fitting and forecasting real high-frequency financial data (aggregated at the 5-minute frequency) from German DAX 30 component stock prices with three classic econometric models. The results confirm our conclusion that SOWDA significantly (based on the RMSE comparison) improves the efficiency of data denoising based on both the DWT and MODWT transforms for these classic econometric models.

From the results herein, we conclude that SOWDA is a robust algorithm that enriches the class of intelligent denoising methods in data mining. The proposed algorithm is expected to provide a significant improvement to the accuracy of econometric models in modeling and forecasting high-frequency financial data. We also believe SOWDA can be applied to other data with unknown noise patterns.
Since data quality is fundamental for intelligent decision making, performing econometric models efficiently and effectively on quality data will help us build more reliable decision-making systems.

References

Aguiar-Conraria, L., Martins, M., Soares, M. The yield curve and the macro-economy across time and frequencies. Journal of Economic Dynamics and Control 36.

Birgé, L., Massart, P., 1998. Minimum contrast estimators on sieves: exponential bounds and rates of convergence. Bernoulli 4(3).

Cai, T., Silverman, B., 2001. Incorporating information on neighboring coefficients into wavelet estimation. Sankhya: The Indian Journal of Statistics 63.

Chun, S., Shapiro, A., Uryasev, S., 2012. Conditional value-at-risk and average value-at-risk: estimation and asymptotics. Operations Research 60(4).

Connor, J., Rossiter, R. Wavelet transforms and commodity prices. Studies in Nonlinear Dynamics & Econometrics 9.

Crowley, P., 2007. A guide to wavelets for economists. Journal of Economic Surveys 21(2).

Daubechies, I. Ten lectures on wavelets. Volume 61 of CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA.

Donoho, D., 1994. Asymptotic minimax risk for sup-norm loss: solution via optimal recovery. Probability Theory and Related Fields 99.

Donoho, D., Johnstone, I., 1994. Ideal spatial adaptation by wavelet shrinkage. Biometrika 81.

Donoho, D., Johnstone, I., 1995. Adapting to unknown smoothness via wavelet shrinkage. Journal of the American Statistical Association 90(432).

Donoho, D., Johnstone, I., 1998. Minimax estimation via wavelet shrinkage. Annals of Statistics 26(3).

Esteban-Bravo, M., Vidal-Sanz, J. Computing continuous-time growth models with boundary conditions via wavelets. Journal of Economic Dynamics and Control 31.

Fan, J., Wang, Y., 2007. Multi-scale jump and volatility analysis for high-frequency financial data. Journal of the American Statistical Association 102.

Fan, Y., Gençay, R., 2010. Unit root tests with wavelets. Econometric Theory 26.

Gençay, R., Gradojevic, N. Errors-in-variables estimation with wavelets. Journal of Statistical Computation and Simulation 81(11).

Gençay, R., Gradojevic, N., Selcuk, F., Whitcher, B., 2010. Asymmetry of information flow between volatilities across time scales. Quantitative Finance 10.

Gençay, R., Selçuk, F., Whitcher, B. Multiscale systematic risk. Journal of International Money and Finance 24.

Gençay, R., Selçuk, F., Whitcher, B., 2002. An introduction to wavelets and other filtering methods in finance and economics. Academic Press.

Gençay, R., Selçuk, F., Whitcher, B. Systematic risk and timescales. Quantitative Finance 3(2).

Grubbs, F. Procedures for detecting outlying observations in samples. Technometrics 11.

Haven, E., Liu, X., Shen, L. De-noising option prices with the wavelet method. European Journal of Operational Research 222(1).

Hong, Y., Kao, C., 2004. Wavelet-based testing for serial correlation of unknown form in panel models. Econometrica 72.

In, F., Kim, S., Gençay, R., 2011. Investment horizon effect on asset allocation between value and growth strategies. Economic Modelling 28.

Jensen, M. An alternative maximum likelihood estimator of long-memory processes using compactly supported wavelets. Journal of Economic Dynamics and Control 24.

Keinert, F. Wavelets and Multiwavelets. Chapman & Hall/CRC.

Kelly, D., Steigerwald, D., 2004. Private information and high-frequency stochastic volatility. Studies in Nonlinear Dynamics & Econometrics 8.

Kim, S., In, F. The relationship between financial variables and real economic activity: evidence from spectral and wavelet analysis. Studies in Nonlinear Dynamics and Econometrics 7.

Lada, E., Wilson, J. A wavelet-based spectral procedure for steady-state simulation analysis. European Journal of Operational Research 174(3).

Laukaitis, A. Functional data analysis for cash flow and transactions intensity continuous-time prediction using Hilbert-valued autoregressive processes. European Journal of Operational Research 185(3).

Lee, S., Mykland, P. Jumps in financial markets: a new nonparametric test and jump dynamics. Review of Financial Studies 21.

Mallat, S. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 11.

Mallat, S., Hwang, W. Singularity detection and processing with wavelets. IEEE Transactions on Information Theory 38.

Mandelbrot, B., 1982. The Fractal Geometry of Nature. W. H. Freeman & Co.

McCulloch, R., Tsay, R., 2004. Nonlinearity in high-frequency financial data and hierarchical models. Studies in Nonlinear Dynamics & Econometrics 5.

Morlet, J. Sampling theory and wave propagation. Issues in Acoustic Signal/Image Processing and Recognition 1.

Percival, D., Walden, A., 2006. Wavelet Methods for Time Series Analysis. Cambridge University Press.

Ramsey, J., 2002. Wavelets in economics and finance: past and future. Studies in Nonlinear Dynamics and Econometrics 6.

Ramsey, J., Lampart, C. The decomposition of economic relationships by time scale using wavelets: expenditure and income. Studies in Nonlinear Dynamics and Econometrics 3.

Sun, E., Meinl, T., 2012. A new wavelet-based denoising algorithm for high-frequency financial data mining. European Journal of Operational Research 217.

Sun, E., Rezania, O., Rachev, S., Fabozzi, F. Analysis of the intraday effects of economic releases on the currency market. Journal of International Money and Finance 30(4).

Sun, W., Rachev, S., Fabozzi, F., 2007. Fractals or i.i.d.: evidence of long-range dependence and heavy tailedness from modeling German equity market returns. Journal of Economics and Business 59.

Sun, W., Rachev, S., Fabozzi, F., 2009. A new approach for using Lévy processes for determining high-frequency value-at-risk predictions. European Financial Management 15.

Xue, Y., Gençay, R., Fagan, S., 2014. Jump detection with wavelets for high-frequency financial time series. Quantitative Finance 14.

Figure 1: Normal Q-Q plots for the three different simulated data patterns (panels: data pattern 1, data pattern 2, data pattern 3).

Figure 2: Moving window design for the numerical studies. E is the length of the data used for training (approximation), V is the length for one-step-ahead forecasting (validation), and F is the length for two-step-ahead forecasting. Given T as the total length of the data, the number of window moves is floor((T - E)/V).
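The window arithmetic in the caption of Figure 2 can be sketched as follows (illustrative names): each training window of length E starts V observations after the previous one, and the per-window RMSE against a known true trend is the performance measure used in the simulation study.

```python
import numpy as np

def window_starts(T, E, V):
    """Start indices of the moving training windows; count = floor((T - E) / V)."""
    return [m * V for m in range((T - E) // V)]

def windowed_rmse(estimate, truth, E, V):
    """RMSE between a denoised estimate and the known true trend on each window."""
    est = np.asarray(estimate, dtype=float)
    tru = np.asarray(truth, dtype=float)
    return [float(np.sqrt(np.mean((est[s:s + E] - tru[s:s + E]) ** 2)))
            for s in window_starts(len(tru), E, V)]
```

Averaging the list returned by `windowed_rmse` over many simulated series gives the mean RMSE per method, as reported in Table 1.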


More information

Four Essays on the Empirical Properties of Stock Market Volatility

Four Essays on the Empirical Properties of Stock Market Volatility Four Essays on the Empirical Properties of Stock Market Volatility Thesis Presented to the Faculty of Economics and Social Sciences of the University of Fribourg (Switzerland) in fulfillment of the requirements

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

Contents. List of Figures. List of Tables. List of Examples. Preface to Volume IV

Contents. List of Figures. List of Tables. List of Examples. Preface to Volume IV Contents List of Figures List of Tables List of Examples Foreword Preface to Volume IV xiii xvi xxi xxv xxix IV.1 Value at Risk and Other Risk Metrics 1 IV.1.1 Introduction 1 IV.1.2 An Overview of Market

More information

PHASE ESTIMATION ALGORITHM FOR FREQUENCY HOPPED BINARY PSK AND DPSK WAVEFORMS WITH SMALL NUMBER OF REFERENCE SYMBOLS

PHASE ESTIMATION ALGORITHM FOR FREQUENCY HOPPED BINARY PSK AND DPSK WAVEFORMS WITH SMALL NUMBER OF REFERENCE SYMBOLS PHASE ESTIMATION ALGORITHM FOR FREQUENCY HOPPED BINARY PSK AND DPSK WAVEFORMS WITH SMALL NUM OF REFERENCE SYMBOLS Benjamin R. Wiederholt The MITRE Corporation Bedford, MA and Mario A. Blanco The MITRE

More information

TEMPORAL CAUSAL RELATIONSHIP BETWEEN STOCK MARKET CAPITALIZATION, TRADE OPENNESS AND REAL GDP: EVIDENCE FROM THAILAND

TEMPORAL CAUSAL RELATIONSHIP BETWEEN STOCK MARKET CAPITALIZATION, TRADE OPENNESS AND REAL GDP: EVIDENCE FROM THAILAND I J A B E R, Vol. 13, No. 4, (2015): 1525-1534 TEMPORAL CAUSAL RELATIONSHIP BETWEEN STOCK MARKET CAPITALIZATION, TRADE OPENNESS AND REAL GDP: EVIDENCE FROM THAILAND Komain Jiranyakul * Abstract: This study

More information

Financial TIme Series Analysis: Part II

Financial TIme Series Analysis: Part II Department of Mathematics and Statistics, University of Vaasa, Finland January 29 February 13, 2015 Feb 14, 2015 1 Univariate linear stochastic models: further topics Unobserved component model Signal

More information

CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen

CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 3: DATA TRANSFORMATION AND DIMENSIONALITY REDUCTION Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major

More information

Geostatistics Exploratory Analysis

Geostatistics Exploratory Analysis Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras cfelgueiras@isegi.unl.pt

More information

The CUSUM algorithm a small review. Pierre Granjon

The CUSUM algorithm a small review. Pierre Granjon The CUSUM algorithm a small review Pierre Granjon June, 1 Contents 1 The CUSUM algorithm 1.1 Algorithm............................... 1.1.1 The problem......................... 1.1. The different steps......................

More information

Univariate and Multivariate Methods PEARSON. Addison Wesley

Univariate and Multivariate Methods PEARSON. Addison Wesley Time Series Analysis Univariate and Multivariate Methods SECOND EDITION William W. S. Wei Department of Statistics The Fox School of Business and Management Temple University PEARSON Addison Wesley Boston

More information

When to Refinance Mortgage Loans in a Stochastic Interest Rate Environment

When to Refinance Mortgage Loans in a Stochastic Interest Rate Environment When to Refinance Mortgage Loans in a Stochastic Interest Rate Environment Siwei Gan, Jin Zheng, Xiaoxia Feng, and Dejun Xie Abstract Refinancing refers to the replacement of an existing debt obligation

More information

Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance. Chapter 6: Behavioural models Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

More information

On Correlating Performance Metrics

On Correlating Performance Metrics On Correlating Performance Metrics Yiping Ding and Chris Thornley BMC Software, Inc. Kenneth Newman BMC Software, Inc. University of Massachusetts, Boston Performance metrics and their measurements are

More information

Part II Redundant Dictionaries and Pursuit Algorithms

Part II Redundant Dictionaries and Pursuit Algorithms Aisenstadt Chair Course CRM September 2009 Part II Redundant Dictionaries and Pursuit Algorithms Stéphane Mallat Centre de Mathématiques Appliquées Ecole Polytechnique Sparsity in Redundant Dictionaries

More information

Using simulation to calculate the NPV of a project

Using simulation to calculate the NPV of a project Using simulation to calculate the NPV of a project Marius Holtan Onward Inc. 5/31/2002 Monte Carlo simulation is fast becoming the technology of choice for evaluating and analyzing assets, be it pure financial

More information

Volatility modeling in financial markets

Volatility modeling in financial markets Volatility modeling in financial markets Master Thesis Sergiy Ladokhin Supervisors: Dr. Sandjai Bhulai, VU University Amsterdam Brian Doelkahar, Fortis Bank Nederland VU University Amsterdam Faculty of

More information

Bootstrapping Big Data

Bootstrapping Big Data Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu

More information

Time Series Analysis

Time Series Analysis Time Series Analysis Forecasting with ARIMA models Andrés M. Alonso Carolina García-Martos Universidad Carlos III de Madrid Universidad Politécnica de Madrid June July, 2012 Alonso and García-Martos (UC3M-UPM)

More information

A Confidence Interval Triggering Method for Stock Trading Via Feedback Control

A Confidence Interval Triggering Method for Stock Trading Via Feedback Control A Confidence Interval Triggering Method for Stock Trading Via Feedback Control S. Iwarere and B. Ross Barmish Abstract This paper builds upon the robust control paradigm for stock trading established in

More information

Charles University, Faculty of Mathematics and Physics, Prague, Czech Republic.

Charles University, Faculty of Mathematics and Physics, Prague, Czech Republic. WDS'09 Proceedings of Contributed Papers, Part I, 148 153, 2009. ISBN 978-80-7378-101-9 MATFYZPRESS Volatility Modelling L. Jarešová Charles University, Faculty of Mathematics and Physics, Prague, Czech

More information

Blind Deconvolution of Barcodes via Dictionary Analysis and Wiener Filter of Barcode Subsections

Blind Deconvolution of Barcodes via Dictionary Analysis and Wiener Filter of Barcode Subsections Blind Deconvolution of Barcodes via Dictionary Analysis and Wiener Filter of Barcode Subsections Maximilian Hung, Bohyun B. Kim, Xiling Zhang August 17, 2013 Abstract While current systems already provide

More information

SPARE PARTS INVENTORY SYSTEMS UNDER AN INCREASING FAILURE RATE DEMAND INTERVAL DISTRIBUTION

SPARE PARTS INVENTORY SYSTEMS UNDER AN INCREASING FAILURE RATE DEMAND INTERVAL DISTRIBUTION SPARE PARS INVENORY SYSEMS UNDER AN INCREASING FAILURE RAE DEMAND INERVAL DISRIBUION Safa Saidane 1, M. Zied Babai 2, M. Salah Aguir 3, Ouajdi Korbaa 4 1 National School of Computer Sciences (unisia),

More information

The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.

The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Cointegration The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Economic theory, however, often implies equilibrium

More information

Generating Random Numbers Variance Reduction Quasi-Monte Carlo. Simulation Methods. Leonid Kogan. MIT, Sloan. 15.450, Fall 2010

Generating Random Numbers Variance Reduction Quasi-Monte Carlo. Simulation Methods. Leonid Kogan. MIT, Sloan. 15.450, Fall 2010 Simulation Methods Leonid Kogan MIT, Sloan 15.450, Fall 2010 c Leonid Kogan ( MIT, Sloan ) Simulation Methods 15.450, Fall 2010 1 / 35 Outline 1 Generating Random Numbers 2 Variance Reduction 3 Quasi-Monte

More information

Evolutionary denoising based on an estimation of Hölder exponents with oscillations.

Evolutionary denoising based on an estimation of Hölder exponents with oscillations. Evolutionary denoising based on an estimation of Hölder exponents with oscillations. Pierrick Legrand,, Evelyne Lutton and Gustavo Olague CICESE, Research Center, Applied Physics Division Centro de Investigación

More information

7 Time series analysis

7 Time series analysis 7 Time series analysis In Chapters 16, 17, 33 36 in Zuur, Ieno and Smith (2007), various time series techniques are discussed. Applying these methods in Brodgar is straightforward, and most choices are

More information

Detection of changes in variance using binary segmentation and optimal partitioning

Detection of changes in variance using binary segmentation and optimal partitioning Detection of changes in variance using binary segmentation and optimal partitioning Christian Rohrbeck Abstract This work explores the performance of binary segmentation and optimal partitioning in the

More information

The information content of lagged equity and bond yields

The information content of lagged equity and bond yields Economics Letters 68 (2000) 179 184 www.elsevier.com/ locate/ econbase The information content of lagged equity and bond yields Richard D.F. Harris *, Rene Sanchez-Valle School of Business and Economics,

More information

Summary Nonstationary Time Series Multitude of Representations Possibilities from Applied Computational Harmonic Analysis Tests of Stationarity

Summary Nonstationary Time Series Multitude of Representations Possibilities from Applied Computational Harmonic Analysis Tests of Stationarity Nonstationary Time Series, Priestley s Evolutionary Spectra and Wavelets Guy Nason, School of Mathematics, University of Bristol Summary Nonstationary Time Series Multitude of Representations Possibilities

More information

Aachen Summer Simulation Seminar 2014

Aachen Summer Simulation Seminar 2014 Aachen Summer Simulation Seminar 2014 Lecture 07 Input Modelling + Experimentation + Output Analysis Peer-Olaf Siebers pos@cs.nott.ac.uk Motivation 1. Input modelling Improve the understanding about how

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

MARKETS, INFORMATION AND THEIR FRACTAL ANALYSIS. Mária Bohdalová and Michal Greguš Comenius University, Faculty of Management Slovak republic

MARKETS, INFORMATION AND THEIR FRACTAL ANALYSIS. Mária Bohdalová and Michal Greguš Comenius University, Faculty of Management Slovak republic MARKETS, INFORMATION AND THEIR FRACTAL ANALYSIS Mária Bohdalová and Michal Greguš Comenius University, Faculty of Management Slovak republic Abstract: We will summarize the impact of the conflict between

More information

Stephane Crepey. Financial Modeling. A Backward Stochastic Differential Equations Perspective. 4y Springer

Stephane Crepey. Financial Modeling. A Backward Stochastic Differential Equations Perspective. 4y Springer Stephane Crepey Financial Modeling A Backward Stochastic Differential Equations Perspective 4y Springer Part I An Introductory Course in Stochastic Processes 1 Some Classes of Discrete-Time Stochastic

More information

Statistics Graduate Courses

Statistics Graduate Courses Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

More information

Monte Carlo testing with Big Data

Monte Carlo testing with Big Data Monte Carlo testing with Big Data Patrick Rubin-Delanchy University of Bristol & Heilbronn Institute for Mathematical Research Joint work with: Axel Gandy (Imperial College London) with contributions from:

More information

Quantitative Methods for Finance

Quantitative Methods for Finance Quantitative Methods for Finance Module 1: The Time Value of Money 1 Learning how to interpret interest rates as required rates of return, discount rates, or opportunity costs. 2 Learning how to explain

More information

Measuring downside risk of stock returns with time-dependent volatility (Downside-Risikomessung für Aktien mit zeitabhängigen Volatilitäten)

Measuring downside risk of stock returns with time-dependent volatility (Downside-Risikomessung für Aktien mit zeitabhängigen Volatilitäten) Topic 1: Measuring downside risk of stock returns with time-dependent volatility (Downside-Risikomessung für Aktien mit zeitabhängigen Volatilitäten) One of the principal objectives of financial risk management

More information

A comparison between different volatility models. Daniel Amsköld

A comparison between different volatility models. Daniel Amsköld A comparison between different volatility models Daniel Amsköld 211 6 14 I II Abstract The main purpose of this master thesis is to evaluate and compare different volatility models. The evaluation is based

More information

Models for Product Demand Forecasting with the Use of Judgmental Adjustments to Statistical Forecasts

Models for Product Demand Forecasting with the Use of Judgmental Adjustments to Statistical Forecasts Page 1 of 20 ISF 2008 Models for Product Demand Forecasting with the Use of Judgmental Adjustments to Statistical Forecasts Andrey Davydenko, Professor Robert Fildes a.davydenko@lancaster.ac.uk Lancaster

More information

Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

More information

Wavelet Analysis on Stochastic Time Series A visual introduction with an examination of long term financial time series

Wavelet Analysis on Stochastic Time Series A visual introduction with an examination of long term financial time series Wavelet Analysis on Stochastic Time Series A visual introduction with an examination of long term financial time series Tobias Setz supervised by PD Diethelm Würtz ETH Zürich Computational Science and

More information

A VARIANT OF THE EMD METHOD FOR MULTI-SCALE DATA

A VARIANT OF THE EMD METHOD FOR MULTI-SCALE DATA Advances in Adaptive Data Analysis Vol. 1, No. 4 (2009) 483 516 c World Scientific Publishing Company A VARIANT OF THE EMD METHOD FOR MULTI-SCALE DATA THOMAS Y. HOU and MIKE P. YAN Applied and Computational

More information

3. Regression & Exponential Smoothing

3. Regression & Exponential Smoothing 3. Regression & Exponential Smoothing 3.1 Forecasting a Single Time Series Two main approaches are traditionally used to model a single time series z 1, z 2,..., z n 1. Models the observation z t as a

More information

Measuring Line Edge Roughness: Fluctuations in Uncertainty

Measuring Line Edge Roughness: Fluctuations in Uncertainty Tutor6.doc: Version 5/6/08 T h e L i t h o g r a p h y E x p e r t (August 008) Measuring Line Edge Roughness: Fluctuations in Uncertainty Line edge roughness () is the deviation of a feature edge (as

More information

The Combination Forecasting Model of Auto Sales Based on Seasonal Index and RBF Neural Network

The Combination Forecasting Model of Auto Sales Based on Seasonal Index and RBF Neural Network , pp.67-76 http://dx.doi.org/10.14257/ijdta.2016.9.1.06 The Combination Forecasting Model of Auto Sales Based on Seasonal Index and RBF Neural Network Lihua Yang and Baolin Li* School of Economics and

More information

From the help desk: Bootstrapped standard errors

From the help desk: Bootstrapped standard errors The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

More information

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

More information

Local outlier detection in data forensics: data mining approach to flag unusual schools

Local outlier detection in data forensics: data mining approach to flag unusual schools Local outlier detection in data forensics: data mining approach to flag unusual schools Mayuko Simon Data Recognition Corporation Paper presented at the 2012 Conference on Statistical Detection of Potential

More information

Time series analysis of data from stress ECG

Time series analysis of data from stress ECG Communications to SIMAI Congress, ISSN 827-905, Vol. 3 (2009) DOI: 0.685/CSC09XXX Time series analysis of data from stress ECG Camillo Cammarota Dipartimento di Matematica La Sapienza Università di Roma,

More information

1 Teaching notes on GMM 1.

1 Teaching notes on GMM 1. Bent E. Sørensen January 23, 2007 1 Teaching notes on GMM 1. Generalized Method of Moment (GMM) estimation is one of two developments in econometrics in the 80ies that revolutionized empirical work in

More information

Monte Carlo Simulation

Monte Carlo Simulation 1 Monte Carlo Simulation Stefan Weber Leibniz Universität Hannover email: sweber@stochastik.uni-hannover.de web: www.stochastik.uni-hannover.de/ sweber Monte Carlo Simulation 2 Quantifying and Hedging

More information

Monte Carlo Methods in Finance

Monte Carlo Methods in Finance Author: Yiyang Yang Advisor: Pr. Xiaolin Li, Pr. Zari Rachev Department of Applied Mathematics and Statistics State University of New York at Stony Brook October 2, 2012 Outline Introduction 1 Introduction

More information

Predictive Indicators for Effective Trading Strategies By John Ehlers

Predictive Indicators for Effective Trading Strategies By John Ehlers Predictive Indicators for Effective Trading Strategies By John Ehlers INTRODUCTION Technical traders understand that indicators need to smooth market data to be useful, and that smoothing introduces lag

More information

A THEORETICAL COMPARISON OF DATA MASKING TECHNIQUES FOR NUMERICAL MICRODATA

A THEORETICAL COMPARISON OF DATA MASKING TECHNIQUES FOR NUMERICAL MICRODATA A THEORETICAL COMPARISON OF DATA MASKING TECHNIQUES FOR NUMERICAL MICRODATA Krish Muralidhar University of Kentucky Rathindra Sarathy Oklahoma State University Agency Internal User Unmasked Result Subjects

More information

Analysis of a Production/Inventory System with Multiple Retailers

Analysis of a Production/Inventory System with Multiple Retailers Analysis of a Production/Inventory System with Multiple Retailers Ann M. Noblesse 1, Robert N. Boute 1,2, Marc R. Lambrecht 1, Benny Van Houdt 3 1 Research Center for Operations Management, University

More information

Analysis of Financial Time Series

Analysis of Financial Time Series Analysis of Financial Time Series Analysis of Financial Time Series Financial Econometrics RUEY S. TSAY University of Chicago A Wiley-Interscience Publication JOHN WILEY & SONS, INC. This book is printed

More information

Tutorial 5: Hypothesis Testing

Tutorial 5: Hypothesis Testing Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................

More information

ANALYZER BASICS WHAT IS AN FFT SPECTRUM ANALYZER? 2-1

ANALYZER BASICS WHAT IS AN FFT SPECTRUM ANALYZER? 2-1 WHAT IS AN FFT SPECTRUM ANALYZER? ANALYZER BASICS The SR760 FFT Spectrum Analyzer takes a time varying input signal, like you would see on an oscilloscope trace, and computes its frequency spectrum. Fourier's

More information

Forecasting methods applied to engineering management

Forecasting methods applied to engineering management Forecasting methods applied to engineering management Áron Szász-Gábor Abstract. This paper presents arguments for the usefulness of a simple forecasting application package for sustaining operational

More information

Big Data, Socio- Psychological Theory, Algorithmic Text Analysis, and Predicting the Michigan Consumer Sentiment Index

Big Data, Socio- Psychological Theory, Algorithmic Text Analysis, and Predicting the Michigan Consumer Sentiment Index Big Data, Socio- Psychological Theory, Algorithmic Text Analysis, and Predicting the Michigan Consumer Sentiment Index Rickard Nyman *, Paul Ormerod Centre for the Study of Decision Making Under Uncertainty,

More information

The relation between news events and stock price jump: an analysis based on neural network

The relation between news events and stock price jump: an analysis based on neural network 20th International Congress on Modelling and Simulation, Adelaide, Australia, 1 6 December 2013 www.mssanz.org.au/modsim2013 The relation between news events and stock price jump: an analysis based on

More information

A Primer on Forecasting Business Performance

A Primer on Forecasting Business Performance A Primer on Forecasting Business Performance There are two common approaches to forecasting: qualitative and quantitative. Qualitative forecasting methods are important when historical data is not available.

More information

How To Analyze The Time Varying And Asymmetric Dependence Of International Crude Oil Spot And Futures Price, Price, And Price Of Futures And Spot Price

How To Analyze The Time Varying And Asymmetric Dependence Of International Crude Oil Spot And Futures Price, Price, And Price Of Futures And Spot Price Send Orders for Reprints to reprints@benthamscience.ae The Open Petroleum Engineering Journal, 2015, 8, 463-467 463 Open Access Asymmetric Dependence Analysis of International Crude Oil Spot and Futures

More information

APPENDIX N. Data Validation Using Data Descriptors

APPENDIX N. Data Validation Using Data Descriptors APPENDIX N Data Validation Using Data Descriptors Data validation is often defined by six data descriptors: 1) reports to decision maker 2) documentation 3) data sources 4) analytical method and detection

More information

Hedging Illiquid FX Options: An Empirical Analysis of Alternative Hedging Strategies

Hedging Illiquid FX Options: An Empirical Analysis of Alternative Hedging Strategies Hedging Illiquid FX Options: An Empirical Analysis of Alternative Hedging Strategies Drazen Pesjak Supervised by A.A. Tsvetkov 1, D. Posthuma 2 and S.A. Borovkova 3 MSc. Thesis Finance HONOURS TRACK Quantitative

More information

Predictability of Non-Linear Trading Rules in the US Stock Market Chong & Lam 2010

Predictability of Non-Linear Trading Rules in the US Stock Market Chong & Lam 2010 Department of Mathematics QF505 Topics in quantitative finance Group Project Report Predictability of on-linear Trading Rules in the US Stock Market Chong & Lam 010 ame: Liu Min Qi Yichen Zhang Fengtian

More information

TCOM 370 NOTES 99-4 BANDWIDTH, FREQUENCY RESPONSE, AND CAPACITY OF COMMUNICATION LINKS

TCOM 370 NOTES 99-4 BANDWIDTH, FREQUENCY RESPONSE, AND CAPACITY OF COMMUNICATION LINKS TCOM 370 NOTES 99-4 BANDWIDTH, FREQUENCY RESPONSE, AND CAPACITY OF COMMUNICATION LINKS 1. Bandwidth: The bandwidth of a communication link, or in general any system, was loosely defined as the width of

More information

The Variability of P-Values. Summary

The Variability of P-Values. Summary The Variability of P-Values Dennis D. Boos Department of Statistics North Carolina State University Raleigh, NC 27695-8203 boos@stat.ncsu.edu August 15, 2009 NC State Statistics Departement Tech Report

More information