Real-time Volatility Estimation Under Zero Intelligence
Jim Gatheral
Celebrating Derivatives, ING, Amsterdam
June 1, 2006
The opinions expressed in this presentation are those of the author alone, and do not necessarily reflect the views of Merrill Lynch, its subsidiaries or affiliates.
Outline of this talk
- A non-intelligent model of the continuous double auction and its time series properties
- Uses of volatility forecasts
- Market microstructure bias
- A survey of estimation and forecasting algorithms
- Experimental results
I gratefully acknowledge insightful comments from Roel Oomen. All errors are mine.
Motivation
We often need to estimate in-sample volatility. Typically we need a good volatility estimate to reduce errors in estimating the parameters of
- Market impact models
- Limit order fill models
- Volatility forecasting models: without good in-sample volatility estimates, how can we even assess the quality of volatility forecasts?
From the theoretical perspective, we would like to understand
- the nature of market microstructure noise
- how to find an efficient estimator of integrated volatility
Uses of volatility forecasts
- Option valuation
- Risk estimation
- Order fill probability
For the first two of these, we need to estimate the width of the distribution of relatively long-timescale returns. For the order fill probability, we need a volatility to estimate the first passage time density.
Given that the underlying stochastic process is not Brownian motion, there is no a priori reason why the volatility numbers required for these quite different computations should be the same. In what follows, we will focus on measuring the width of the distribution of returns.
Our approach
One way to measure the performance of various volatility estimators is to simulate data and see directly how they perform. This has been done before:
- Zhang, Mykland and Aït-Sahalia perform a numerical simulation of the Heston model with small additional iid Gaussian microstructure noise.
- Oomen computes results in closed form in the context of a pure jump model, again with additional iid Gaussian microstructure noise.
In each of these cases, there is a strong assumption on the form of the microstructure noise. In reality, we don't even believe in the artificial split between true price and noise processes. So we proceed by simulating an artificial zero-intelligence market, described and analyzed in detail by Smith, Farmer et al.
A non-intelligent (or zero-intelligence) model Consider a model where market orders arrive randomly at rate µ, limit orders (per price level) arrive at rate α and a proportion δ per unit time of existing limit orders is canceled. The behavior induced by such a model is rather complex and has been shown to mimic the behavior of real markets in some respects such as the distribution of spreads (Bollerslev et al. (1997)) and the relationship between volatility and volume (Farmer et al. (2005)). Such a model is termed non-intelligent because order flow is random. In a real market, intelligent agents respond to orders with counter orders of their own.
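To make the dynamics concrete, here is a minimal Python sketch of a Smith-Farmer-style zero-intelligence book. It is not the simulator used for the results that follow: the placement band, unit order sizes, and the initial seeding of the book are simplifying assumptions; only the relative event rates implied by µ, α and δ are respected.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_zi_market(mu=0.1, alpha=1.0, delta=0.2, n_events=50_000, band=20):
    """Minimal zero-intelligence order book (illustrative sketch only).

    Prices live on an integer grid.  Limit orders arrive at rate `alpha` per
    price level inside a band next to the opposite best quote, market orders
    arrive at rate `mu` per side, and each resting order is cancelled at
    rate `delta`.  Returns the sequence of trade prices.
    """
    book = {}                    # price -> signed depth (+ resting buys, - resting sells)
    book[-1] = 1                 # seed one unit on each side so quotes are defined
    book[+1] = -1
    trades = []

    def best_bid():
        bids = [p for p, q in book.items() if q > 0]
        return max(bids) if bids else None

    def best_ask():
        asks = [p for p, q in book.items() if q < 0]
        return min(asks) if asks else None

    for _ in range(n_events):
        bb, ba = best_bid(), best_ask()
        if bb is None or ba is None:
            break
        n_open = sum(abs(q) for q in book.values())
        # relative rates: 2 market-order streams, 2*band limit-order levels,
        # one cancellation stream per resting order
        rates = np.array([mu, mu, alpha * band, alpha * band, delta * n_open])
        event = rng.choice(5, p=rates / rates.sum())
        if event == 0:                            # market buy hits the best ask
            trades.append(ba)
            book[ba] += 1
            if book[ba] == 0:
                del book[ba]
        elif event == 1:                          # market sell hits the best bid
            trades.append(bb)
            book[bb] -= 1
            if book[bb] == 0:
                del book[bb]
        elif event == 2:                          # new limit buy strictly below the ask
            p = ba - 1 - rng.integers(band)
            book[p] = book.get(p, 0) + 1
        elif event == 3:                          # new limit sell strictly above the bid
            p = bb + 1 + rng.integers(band)
            book[p] = book.get(p, 0) - 1
        else:                                     # cancel a randomly chosen resting order
            prices, sizes = zip(*book.items())
            probs = np.abs(sizes) / n_open
            p = prices[rng.choice(len(prices), p=probs)]
            book[p] -= np.sign(book[p])
            if book[p] == 0:
                del book[p]
    return np.array(trades, dtype=float)

trade_prices = simulate_zi_market()
```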
A sample path
[Figure: trade price of the simulated zero-intelligence market over 50,000 ticks.]
Parameters and sample properties
Smith-Farmer zero-intelligence parameters are µ = 0.1, α = 1, δ = 0.2.
Noise-to-signal ratio (see later for definition) γ = 2.94.
First-order autocorrelation coefficient ρ1 = −0.43.
Parameters are chosen so that these last two are like real data.
Choice of sampling scheme
Oomen has pointed out that it is important to sample in transaction time rather than in business time or calendar time:
- In transaction time, empirically observed returns are MA(1).
- In calendar time, with varying trade intensity, empirically observed returns are ARIMA.
For our artificially generated dataset, where trade intensity is constant, it is most natural to sample each event, i.e. in transaction time. Our simulated process turns out to be approximately MA(1), just like real stock returns.
Autocorrelation plot
[Figure: ACF of tick returns out to lag 40. The process is approximately MA(1).]
How should we compute historical (realized) volatility?
Given a set of tick data, how can we measure variance? The naïve answer would be to compute the statistic
$$\frac{1}{T}\sum_{i=1}^{T}\left(\ln\frac{S_i}{S_{i-1}}\right)^2$$
where the $S_i$ are successive prices in the dataset. This estimator is known as Realized Variance (or RV).
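A direct transcription of this statistic (a sketch; `prices` is assumed to be a numpy array of successive trade prices):

```python
import numpy as np

def realized_variance(prices):
    """Naive realized variance: mean squared log return over the tick sample."""
    log_returns = np.diff(np.log(prices))
    return np.mean(log_returns ** 2)
```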
Microstructure bias
In the limit of very high sampling frequency, RV picks up only market microstructure noise. To see this, suppose that the observed price $Y_t$ is given by
$$Y_t = X_t + \epsilon_t$$
where $X_t$ is the value of the underlying (log-)price process of interest and $\epsilon_t$ is a random market microstructure-related noise term. Suppose there are $n+1$ ticks (so there are $n$ price changes) in the time interval $T$. Then as $n \to \infty$,
$$[Y,Y] := \sum_{i}\,(Y_i - Y_{i-1})^2 \;\approx\; [X,X] + 2\,n\,\mathrm{Var}[\epsilon] \;=\; n\,\sigma^2 + 2\,n\,\mathrm{Var}[\epsilon]$$
where we assume that the per-tick variance $\sigma^2$ is constant.
Oomen's noise-to-signal ratio
Following Oomen, we define the noise-to-signal ratio
$$\gamma = \frac{\mathrm{Var}[\epsilon]}{\sigma^2}$$
and note in passing that $\mathrm{Var}[\epsilon]$ may be efficiently estimated using
$$\mathrm{Var}[\epsilon] \approx -\frac{1}{n}\sum_{i}\,(Y_{i+1}-Y_i)(Y_i-Y_{i-1})$$
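In code, under the iid-noise assumption above, Var[ε] is minus the first-order autocovariance of tick returns, and σ² can then be backed out from the relation RV ≈ σ² + 2 Var[ε] of the previous slide. A sketch (per-tick quantities; function names are mine):

```python
import numpy as np

def noise_variance(prices):
    """Var[eps] estimated as minus the first-order autocovariance of tick returns."""
    r = np.diff(np.log(prices))
    return -np.mean(r[1:] * r[:-1])

def noise_to_signal(prices):
    """Oomen's gamma = Var[eps] / sigma^2, with the per-tick sigma^2 backed out
    from mean squared return = sigma^2 + 2 Var[eps]."""
    r = np.diff(np.log(prices))
    var_eps = -np.mean(r[1:] * r[:-1])
    sigma2 = np.mean(r ** 2) - 2.0 * var_eps
    return var_eps / sigma2
```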
The conventional solution
The conventional solution is to sample at most every five minutes or so. Suppose there are 78,000 trades per day (CSCO, for example). Then there are roughly 1,000 trades every 5 minutes, and sampling only every 5 minutes corresponds to throwing out 99.9% of the points! To quote Zhang, Mykland and Aït-Sahalia, "It is difficult to accept that throwing away data, especially in such quantities, can be an optimal solution."
From a more practical perspective, if we believe that volatility is time-varying, it makes sense to try to measure it from recent data over the last few minutes rather than from a whole day of trading.
The multiple grid estimator
Denote by $Y^{(j,k)}$ ($1 \le j \le k$) the subsample of $Y$ obtained by sampling every $k$ ticks from the $j$th tick on. There are clearly $k$ such non-overlapping subsamples in the dataset, the first one beginning with the first tick, the second with the second tick, and so on. Then, for each $Y^{(j,k)}$, we have
$$[Y^{(j,k)},Y^{(j,k)}] \approx [X^{(j,k)},X^{(j,k)}] + 2\,n_{jk}\,\mathrm{Var}[\epsilon]$$
Now define the average subsampled realized variance
$$[Y,Y]^{(k)} := \frac{1}{k}\sum_{j=1}^{k}\,[Y^{(j,k)},Y^{(j,k)}]$$
and suppose further that the per-tick variance $\sigma^2$ is constant. Then
$$[Y,Y]^{(k)} \approx \bar n_k\,k\,\sigma^2 + 2\,\bar n_k\,\mathrm{Var}[\epsilon]$$
where $\bar n_k$ is the average number of ticks in each subsample. We obtain the estimator
$$v^{\mathrm{Realized}}_k = \frac{[Y,Y]^{(k)}}{\bar n_k\,k}$$
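A sketch of the subsampled (multiple grid) realized variance; `log_prices` is assumed to be a numpy array of tick-by-tick log prices:

```python
import numpy as np

def subsampled_rv(log_prices, k):
    """Average realized variance over the k non-overlapping subsamples
    Y^(j,k), j = 1..k, together with the average subsample length n_bar_k."""
    totals, lengths = [], []
    for j in range(k):
        sub = log_prices[j::k]
        r = np.diff(sub)
        totals.append(np.sum(r ** 2))
        lengths.append(len(r))
    return np.mean(totals), np.mean(lengths)

def v_realized(log_prices, k):
    """Per-tick variance estimate [Y,Y]^(k) / (n_bar_k * k) from the slide."""
    yy_k, n_bar_k = subsampled_rv(log_prices, k)
    return yy_k / (n_bar_k * k)
```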
The ZMA estimator
With our earlier assumption of iid noise, we can eliminate the bias for each $k$ by forming
$$[Y,Y]^{(k)} - \frac{\bar n_k}{n}\,[Y,Y] \;\approx\; \bar n_k\,k\,\sigma^2 - \bar n_k\,\sigma^2$$
Thus we obtain the Zhang-Mykland-Aït-Sahalia (ZMA) bias-corrected estimator of $\sigma^2$:
$$v^{\mathrm{ZMA}}_k := \frac{1}{\bar n_k\,(k-1)}\left\{[Y,Y]^{(k)} - \frac{\bar n_k}{n}\,[Y,Y]\right\}$$
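A sketch of the ZMA correction, reusing `subsampled_rv` from the previous sketch:

```python
import numpy as np

def v_zma(log_prices, k):
    """ZMA bias-corrected estimate of the per-tick variance, transcribing the
    formula above.  Reuses subsampled_rv() from the multiple-grid sketch."""
    yy_k, n_bar_k = subsampled_rv(log_prices, k)
    r = np.diff(log_prices)
    yy_all = np.sum(r ** 2)              # [Y,Y] on the full sample
    n = len(r)
    return (yy_k - (n_bar_k / n) * yy_all) / (n_bar_k * (k - 1))
```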
The Zhou estimator
Define
$$[Y,Y]^Z := \sum_{i}(Y_i-Y_{i-1})^2 + \sum_{i}(Y_i-Y_{i-1})(Y_{i-1}-Y_{i-2}) + \sum_{i}(Y_i-Y_{i-1})(Y_{i+1}-Y_i)$$
It's easy to check that under the assumption of serially uncorrelated noise independent of returns,
$$E\left[[Y,Y]^Z\right] = \langle X \rangle_T$$
It follows that
$$[Y^{(j,k)},Y^{(j,k)}]^Z \approx (n_{jk}-2)\,\sigma^2$$
As suggested by Zhou himself, we may compute his estimator from subsamples of the data, obtaining
$$v^{\mathrm{Zhou}}_k := \frac{1}{\bar n_k - 2}\,[Y,Y]^Z_k$$
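A sketch of the subsampled Zhou estimator; for simplicity the two cross-term sums in the definition are treated as twice the sum of adjacent-return products:

```python
import numpy as np

def zhou_bracket(y):
    """[Y,Y]^Z for one (sub)sample of log prices: squared returns plus the two
    adjacent-return cross terms (taken here as 2x the adjacent-return sum)."""
    r = np.diff(y)
    return np.sum(r ** 2) + 2.0 * np.sum(r[1:] * r[:-1])

def v_zhou(log_prices, k):
    """Zhou estimator computed on each of the k subsamples, averaged, and
    normalised by (n_bar_k - 2) as on the slide."""
    vals, lengths = [], []
    for j in range(k):
        sub = log_prices[j::k]
        vals.append(zhou_bracket(sub))
        lengths.append(len(sub) - 1)
    n_bar_k = np.mean(lengths)
    return np.mean(vals) / (n_bar_k - 2)
```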
GARCH
For each subsample $Y^{(1,k)}$, we use MLE to estimate the parameters of the GARCH process
$$\sigma^2_i = a_0 + a_1\,r^2_{i-1} + b_1\,\sigma^2_{i-1}$$
We take as our variance estimate the last fitted value of $\sigma^2$, namely $\sigma^2_{n_k}$.
Note that we graph a kernel-smoothed version of the GARCH estimator.
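A bare-bones Gaussian quasi-MLE sketch of this GARCH(1,1) fit. In practice a dedicated package would be used; the initialisation of σ² at the sample variance and the stationarity constraint a₁ + b₁ < 1 are my assumptions, not part of the slide:

```python
import numpy as np
from scipy.optimize import minimize

def garch_last_variance(returns):
    """Fit sigma2_i = a0 + a1*r_{i-1}^2 + b1*sigma2_{i-1} by Gaussian
    quasi-maximum likelihood and return the last fitted sigma2."""
    r = np.asarray(returns, dtype=float)

    def filt(params):
        a0, a1, b1 = params
        s2 = np.empty_like(r)
        s2[0] = np.var(r)                           # assumed initialisation
        for i in range(1, len(r)):
            s2[i] = a0 + a1 * r[i - 1] ** 2 + b1 * s2[i - 1]
        return s2

    def neg_loglik(params):
        a0, a1, b1 = params
        if a0 <= 0 or a1 < 0 or b1 < 0 or a1 + b1 >= 1:
            return np.inf                           # reject infeasible parameters
        s2 = filt(params)
        return 0.5 * np.sum(np.log(s2) + r ** 2 / s2)

    x0 = np.array([0.1 * np.var(r), 0.05, 0.9])
    res = minimize(neg_loglik, x0, method="Nelder-Mead")
    return filt(res.x)[-1]
```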
The drift-adjusted high-low-open estimator
$$\hat\sigma^2_n = \frac{1}{2}\,E\!\left[\log^2\!\left(\frac{\mathrm{high}}{\mathrm{open}}\right) + \log^2\!\left(\frac{\mathrm{low}}{\mathrm{open}}\right)\right] - \frac{1}{2}\left(E\!\left[\log\!\left(\frac{\mathrm{high}}{\mathrm{open}}\right) + \log\!\left(\frac{\mathrm{low}}{\mathrm{open}}\right)\right]\right)^2$$
Once again, this estimator may be computed for each subsample and averaged.
Discreteness adjustment for the open-high-low estimator
Suppose we have $k$ ticks altogether and we sample every tick. Then if $x$ is the log of the true maximum and $x_k$ is the log of the discrete maximum, we should have
$$x = x_k + \delta_k$$
where $\delta_k$ is a non-negative random variable. From Magdon-Ismail and Atiya (2003) (who in turn quote Rogers and Satchell (1991)), to leading order in the time $\delta t$ between ticks,
$$\alpha\,\sigma := E[\delta_k], \qquad \beta\,\sigma^2 := E\big[\delta_k^2\big]$$
where $\alpha$ and $\beta$ are explicit constants given in those references and $\sigma$ is the volatility per tick.
Final formula adjusted for discreteness
$$\sigma\,\sqrt{k - 2\alpha^2 + \beta} \;=\; \sqrt{A - B + \frac{\alpha^2 C^2}{4\,(k - 2\alpha^2 + \beta)}} \;+\; \frac{\alpha\,C}{2\sqrt{k - 2\alpha^2 + \beta}}$$
where, writing $\tilde x_k$ for the log of the discrete minimum,
$$A := \tfrac{1}{2}\,E\big[x_k^2 + \tilde x_k^2\big], \qquad B := \tfrac{1}{2}\,\big(E[x_k + \tilde x_k]\big)^2, \qquad C := E[x_k - \tilde x_k]$$
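A sketch transcribing the estimator and its discreteness adjustment as displayed above; `opens`, `highs`, `lows` hold one value per sampling interval, and α, β are taken as given constants from the references:

```python
import numpy as np

def ohl_variance(opens, highs, lows):
    """Sample analogue of the drift-adjusted high-low-open estimator above
    (variance per sampling interval)."""
    u = np.log(highs / opens)            # x_k
    d = np.log(lows / opens)             # x~_k
    return 0.5 * np.mean(u ** 2 + d ** 2) - 0.5 * np.mean(u + d) ** 2

def ohl_sigma_adjusted(opens, highs, lows, k, alpha, beta):
    """Per-tick volatility from the discreteness-adjusted formula above, with
    k ticks per interval and (alpha, beta) the leading-order constants."""
    u = np.log(highs / opens)
    d = np.log(lows / opens)
    A = 0.5 * np.mean(u ** 2 + d ** 2)
    B = 0.5 * np.mean(u + d) ** 2
    C = np.mean(u - d)
    D = k - 2.0 * alpha ** 2 + beta
    return (np.sqrt(A - B + alpha ** 2 * C ** 2 / (4.0 * D))
            + alpha * C / (2.0 * np.sqrt(D))) / np.sqrt(D)
```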
Effect of discreteness adjustment on the open-high-low estimator (artificial normal random walk data)
[Figure: unadjusted vs. discreteness-adjusted variance estimates as a function of returns per sampling interval (0–1,000); the true variance per tick is 1.]
Measuring the true variance of ZI data
We generate 10,000 paths of the process, each with 1,000 timesteps. For each time slice, we form the distribution of returns across paths and compute its variance.
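A sketch of this cross-sectional measurement; `simulate_path` stands for any path generator returning a (log-)price series, e.g. the zero-intelligence sketch earlier (the simulated ZI price is treated directly as a log price):

```python
import numpy as np

def true_variance_per_tick(simulate_path, n_paths=10_000, n_ticks=1_000):
    """Simulate many independent paths and, at each time slice, compute the
    cross-sectional variance of the return from time 0; dividing by the
    elapsed number of ticks gives the 'true' variance per tick."""
    paths = np.empty((n_paths, n_ticks + 1))
    for i in range(n_paths):
        paths[i] = simulate_path(n_ticks + 1)       # (log-)prices, length n_ticks+1
    returns_from_start = paths - paths[:, [0]]
    var_t = returns_from_start.var(axis=0)          # variance across paths at each t
    ticks = np.arange(1, n_ticks + 1)
    return var_t[1:] / ticks                        # variance per tick at each horizon
```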
Return distributions
[Figure: histograms of returns after 10, 100, 500 and 1,000 trades.]
QQ plots
[Figure: normal QQ plots (sample vs. theoretical quantiles) of returns after 10, 100, 500 and 1,000 trades.]
True variance vs cumulative number of trades
[Figure: variance per tick as a function of time in ticks (0–1,000); the variance per tick settles at approximately 0.0867.]
Performance of various estimators on ZI data
[Figure: variance estimates from RV, Zhou, ZMA, GARCH and high-low-open as a function of returns per sampling interval (0–1,000); the true variance per tick is 0.0867.]
Are the computations correct: a check
All estimators except high-low-open give an estimated variance per tick of one at the highest sampling frequency. The high-low-open estimator approaches one at about 100 sampling points per interval.
[Figure: variance estimates from Realized, Zhou, ZMA, GARCH and high-low-open on artificial cumulated N(0,1) data, as a function of returns per sampling interval; the true variance is 1.]
Detail of plot of Zhou and ZMA performance
[Figure: variance estimates for Zhou and ZMA at low sampling frequencies (0–20 returns per interval).]
Interim conclusion
Both RV and GARCH are very biased at high frequency; GARCH is seen to be very similar to RV. From its definition, GARCH is like exponentially smoothed RV. The high-low-open estimator is seen to underperform. The only estimators worth considering are Zhou and ZMA.
On this path, the Zhou estimator definitely performed better. It seems from the plot that
- the Zhou estimator may have lower bias but greater variance;
- the ZMA estimator may have greater bias but lower variance.
So we take an experimental look at the Mean Squared Error (MSE) for these estimators:
$$\mathrm{MSE} = \mathrm{Bias}^2 + \mathrm{Var}(v)$$
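An experimental MSE calculation of this kind might look as follows (a sketch; `paths` is a list of simulated log-price arrays, `estimator` is, e.g., `v_zma` or `v_zhou` from the earlier sketches, and the true per-tick variance is supplied externally):

```python
import numpy as np

def mse_of_estimator(estimator, paths, true_var, k):
    """Experimental MSE = bias^2 + variance of a per-tick variance estimator
    at sampling frequency k, evaluated over many simulated paths."""
    estimates = np.array([estimator(p, k) for p in paths])
    bias = estimates.mean() - true_var
    return bias ** 2 + estimates.var()

# e.g. scan k to find the frequency that minimises the MSE for a given estimator:
# best_k = min(range(2, 21), key=lambda k: mse_of_estimator(v_zma, paths, 0.0867, k))
```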
Mean Squared Error (MSE) vs sampling frequency: 100 ticks
As we guessed, ZMA is more biased but Zhou has higher variance. For a sample size of 100 trades, the winner is ZMA, with an optimal sampling frequency of 7 (the optimal sampling frequency for Zhou is 4). An illiquid stock may trade around 100 times in 10 minutes.
[Figure: MSE vs. returns per sampling interval (2–10) for ZMA and Zhou.]
Mean Squared Error (MSE) vs sampling frequency: 1,000 ticks
For a sample size of 1,000 trades, the winner is again ZMA, with an optimal sampling frequency of 7 (the optimal sampling frequency for Zhou is 4). A liquid stock can have 1,000 trades in 5 minutes.
[Figure: MSE vs. returns per sampling interval (2–10) for ZMA and Zhou.]
Mean Squared Error (MSE) vs sampling frequency: 10,000 ticks
For a sample size of 10,000 trades, the winner is Zhou, with an optimal sampling frequency of 4 (the optimal sampling frequency for ZMA is 10). A reasonably liquid stock can have 10,000 trades in one day.
[Figure: MSE vs. returns per sampling interval (5–20) for ZMA and Zhou.]
Mean Squared Error (MSE) vs sampling frequency: 50,000 ticks
For a sample size of 50,000 trades, the winner is Zhou, with an optimal sampling frequency of 4 (the optimal sampling frequency for ZMA is 16). A liquid stock may have 50,000 trades in one day.
[Figure: MSE vs. returns per sampling interval (5–20) for ZMA and Zhou.]
Plot of Zhou and ZMA performance with error bars
[Figure: variance estimate ± √MSE vs. returns per sampling interval (0–20) for Zhou and ZMA.]
Now we analyze real data
Oomen's data cleaning rule
We eliminate trades with non-blank condition codes; this has the side benefit of eliminating all trades after the close. We further adopt Oomen's cleaning procedure, in which outliers are removed if there is an instantaneous price reversal, as follows. Let $P_k$ denote the logarithmic price at which the $k$th transaction is executed. For $P_k$ to be removed by the filter, the following conditions need to be satisfied:
$$|P_k - P_{k-1}| > c \qquad \text{and} \qquad \frac{P_{k+1}-P_k}{P_{k-1}-P_k} \in [1-w,\,1+w] \quad \text{for } 0<w<1$$
that is, the $k$th price jumps by more than $c$ and the jump is (approximately) immediately reversed. Again following Oomen, we set $w = 0.25$ and $c = 8\sigma$, where $\sigma$ is some reasonable estimate of the standard deviation of tick-by-tick returns (the sample standard deviation in this case).
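A sketch of this filter as reconstructed above (log prices in, log prices out; the loop-based form is for clarity rather than speed):

```python
import numpy as np

def oomen_clean(log_prices, w=0.25, c_mult=8.0):
    """Remove isolated outliers that are (approximately) instantaneously
    reversed, following the rule above."""
    p = np.asarray(log_prices, dtype=float)
    r = np.diff(p)
    c = c_mult * np.std(r)                 # c = 8 sigma, sigma from the sample itself
    keep = np.ones(len(p), dtype=bool)
    for k in range(1, len(p) - 1):
        if abs(p[k] - p[k - 1]) > c:
            reversal = (p[k + 1] - p[k]) / (p[k - 1] - p[k])
            if 1.0 - w <= reversal <= 1.0 + w:
                keep[k] = False            # jump larger than c and fully reversed
    return p[keep]
```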
Data
Average daily number of trades over the 69-day period from 03-Jan-2006 to 07-Apr-2006:
IBM: 9,227; CSCO: 72,471; PFE: 25,515; COCO: 3,610.
After cleaning, the number of trades in these stocks on 06-Apr-2006 was:
IBM: 6,729; CSCO: 81,483; PFE: 18,943; COCO: 2,975.
So Thursday 06-Apr-2006 was a typical day.
Comparison of estimates for PFE (06-Apr-2006)
[Figure: annualized volatility estimates vs. returns per sampling interval (0–100); ZMA(7) = 15.27%, Zhou(4) = 13.18%.]
Comparison of estimates for CSCO (06-Apr-2006)
[Figure: annualized volatility estimates vs. returns per sampling interval (0–100); ZMA(7) = 25.69%, Zhou(4) = 22.36%.]
Comparison of estimates for IBM (06-Apr-2006)
[Figure: annualized volatility estimates vs. returns per sampling interval (0–100); ZMA(7) = 12.52%, Zhou(4) = 12.20%.]
Comparison of estimates for COCO (06-Apr-2006)
[Figure: annualized volatility estimates vs. returns per sampling interval (0–100); ZMA(7) = 27.69%, Zhou(4) = 25.43%.]
Summary of results
The following table compares estimated and implied volatilities (from Bloomberg) as of 06-Apr-2006 for the four stocks studied:

         σ_ZMA     σ_Zhou    σ_Implied
PFE      15.27%    13.18%    19.90%
CSCO     25.69%    22.36%    30.50%
IBM      12.52%    12.20%    16.41%
COCO     27.69%    25.43%    41.92%

The bias might in large part be explained by accounting for overnight moves. Note also that the ZMA estimate is always higher than the Zhou estimate.
PFE autocorrelation function
[Figure: ACF of PFE tick returns out to lag 40.]
CSCO autocorrelation function
[Figure: ACF of CSCO tick returns out to lag 50.]
IBM autocorrelation function
[Figure: ACF of IBM tick returns out to lag 30.]
COCO autocorrelation function
[Figure: ACF of COCO tick returns out to lag 35.]
Noise-to-signal ratios (γ) and first-order autocorrelations

          γ       γ*      ρ1      ρ1*
PFE      4.17    1.73    -0.40   -0.40
CSCO     2.63    2.22    -0.38   -0.34
IBM      0.85    0.18    -0.29   -0.12
COCO     0.30    0.35    -0.13   -0.17
ZI data  2.94            -0.43

The starred quantities come from restricting the analysis to data from the primary exchange. Note the massive reduction in noise on the NYSE. The combined impacts of Archipelago and Regulation NMS should change that!
Samples of 50 successive trades
Note the autocorrelation of order flow in real markets; there is no such correlation in the zero-intelligence market. However, the plots look qualitatively similar.
[Figure: 50 successive trade prices for CSCO, PFE, IBM and the ZI data.]
Recommendations
Under zero intelligence:
- when the sample consists of a large number of ticks, the Zhou estimator performs better and the optimal sampling frequency is 4;
- when the sample consists of a small number of ticks (1,000 or less), the ZMA estimator performs best.
However, closer inspection of the results of applying Zhou and ZMA to the real data suggests that better results are obtained from Zhou in all cases: see, for example, the smaller difference between Zhou and RV estimated at lower sampling frequency in all cases. The probable implication is that the zero-intelligence model underestimates the microstructure bias in the estimators (though zero intelligence still does better than assuming iid, serially uncorrelated noise). Specifically, in the real world, order flow signs are highly autocorrelated; they are uncorrelated in the zero-intelligence world.
We thus recommend using Zhou with a sampling frequency of 4 in all cases. Finally, if for practical reasons the high-low-open estimator must be used, it should at least be discreteness-adjusted.
What about mid-quotes?
The foregoing recommendations certainly apply if you only have trade data. What if you also have quote data? According to practitioners, using mid-quotes eliminates bid-ask bounce.
If we use mid-quotes, do we get the same volatility estimate? When should we sample the mid-quotes? Every quote change? Every second? According to Bouchaud, Gefen, Potters and Wyart (2004) and also Bandi and Russell (2006), it's best to sample the mid-quote just before each trade.
We now repeat our previous artificial market experiment, logging mid-quotes rather than trade prices.
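For real quote data, the "mid-quote just before each trade" sampling can be done with an as-of merge; a sketch with hypothetical column names (`timestamp`, `bid`, `ask`):

```python
import pandas as pd

def midquotes_before_trades(trades, quotes):
    """Attach to each trade the prevailing mid-quote sampled strictly before
    the trade timestamp (merge_asof takes the last earlier quote)."""
    quotes = quotes.assign(mid=(quotes["bid"] + quotes["ask"]) / 2.0)
    merged = pd.merge_asof(
        trades.sort_values("timestamp"),
        quotes.sort_values("timestamp")[["timestamp", "mid"]],
        on="timestamp",
        direction="backward",
        allow_exact_matches=False,   # strictly before the trade
    )
    return merged["mid"].to_numpy()
```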
Mid-quote based MSE vs sampling frequency: 10,000 ticks
For a sample size of 10,000 trades, minimal MSEs are almost identical for Zhou and ZMA. The optimal sampling frequency is every tick for Zhou (and every 3 ticks for ZMA). We note that minimal MSEs are similar to those from trade data.
[Figure: MSE vs. returns per sampling interval (2–10) for ZMA and Zhou on mid-quote data.]
Performance of estimators on ZI mid-quote data
[Figure: variance estimates from RV, Zhou, ZMA, high-low-open and GARCH vs. returns per sampling interval (0–100); the true variance is 0.0867.]
Mid-quote based estimates for PFE (06-Apr-2006)
[Figure: annualized volatility estimates vs. returns per sampling interval (0–100); ZMA(2) = 13.48%, Zhou(1) = 13.48%.]
Mid-quote based estimates for CSCO (06-Apr-2006)
[Figure: annualized volatility estimates vs. returns per sampling interval (0–100); ZMA(2) = 20.91%, Zhou(1) = 20.91%.]
Mid-quote based estimates for IBM (06-Apr-2006)
[Figure: annualized volatility estimates vs. returns per sampling interval (0–100); ZMA(2) = 12.65%, Zhou(1) = 12.65%.]
Mid-quote based estimates for COCO (06-Apr-2006)
[Figure: annualized volatility estimates vs. returns per sampling interval (0–100); ZMA(2) = 24.47%, Zhou(1) = 24.47%.]
Summary of results
The following table compares estimated and implied volatilities (from Bloomberg) as of 06-Apr-2006 for the four stocks studied, and includes the mid-quote based estimates:

         σ_ZMA     σ_Zhou    σ_Implied   σ_ZMA Mid   σ_Zhou Mid
PFE      15.27%    13.18%    19.90%      13.48%      13.48%
CSCO     25.69%    22.36%    30.50%      20.91%      20.91%
IBM      12.52%    12.20%    16.41%      12.65%      12.65%
COCO     27.69%    25.43%    41.92%      24.47%      24.47%

The mid-quote based estimates are all consistent with our prior trade-based estimates and with our previous guess that the Zhou estimator is closer to the truth.
Noise-to-signal ratios (γ) and first-order autocorrelations, with mid-quote based results added

          γ       γ*      γ Mid    ρ1      ρ1*     ρ1 Mid
PFE      4.17    1.73     0.02    -0.40   -0.40   -0.02
CSCO     2.63    2.22     0.01    -0.38   -0.34   -0.01
IBM      0.85    0.18     0.02    -0.29   -0.12   -0.02
COCO     0.30    0.35    -0.03    -0.13   -0.17    0.04
ZI data  2.94                     -0.43

We note that both the noise and the first-order autocorrelations of the mid-quotes are effectively zero.
Final conclusion
We have shown that in the zero-intelligence simulation, the various estimators of realized variance are also good estimators of the variance of the distribution of final returns. We also showed that the Zhou and ZMA estimators give reasonable and consistent estimates of realized variance when applied to real data.
We further showed that time series of real mid-quote changes exhibit effectively no noise or autocorrelation; the various estimators generate realized variance estimates consistent with those generated from time series of transaction price changes.
References
Federico M. Bandi and Jeffrey R. Russell (2006). Comment on JBES 2005 invited address by Peter R. Hansen and Asger Lunde. Journal of Business & Economic Statistics 24(2), 167-173.
T. Bollerslev, I. Domowitz and J. Wang (1997). Order flow and the bid-ask spread: an empirical probability model of screen-based trading. Journal of Economic Dynamics and Control 21(8-9), 1471-1491.
Jean-Philippe Bouchaud, Yuval Gefen, Marc Potters and Matthieu Wyart (2004). Fluctuations and response in financial markets: the subtle nature of random price changes. Quantitative Finance 4, 176-190.
J. Doyne Farmer, Paolo Patelli and Ilija Zovko (2005). The predictive power of zero intelligence in financial markets. Proceedings of the National Academy of Sciences of the USA 102(11), 2254-2259.
Peter R. Hansen and Asger Lunde (2005). Realized variance and market microstructure noise. Available at SSRN: http://ssrn.com/abstract=506542.
Roel C. A. Oomen (2005). Properties of bias-corrected realized variance under alternative sampling schemes. Journal of Financial Econometrics 3(4), 555-577.
More references
Roel C. A. Oomen (2006). Comment on JBES 2005 invited address by Peter R. Hansen and Asger Lunde. Journal of Business & Economic Statistics 24(2), 195-202.
Eric Smith, J. Doyne Farmer, Laszlo Gillemot and Supriya Krishnamurthy (2003). Statistical theory of the continuous double auction. Quantitative Finance 3, 481-514.
Lan Zhang, Per A. Mykland and Yacine Aït-Sahalia (2005). A tale of two time scales: determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association 100(472), 1394-1411.
B. Zhou (1996). High-frequency data and volatility in foreign-exchange rates. Journal of Business and Economic Statistics 14(1), 45-52.