Cornell University April 25, 2009
A little about myself
BA and MA in mathematics
PhD in statistics in 1977
taught in the statistics department at North Carolina for 10 years
have been in Operations Research and Information (formerly Industrial) Engineering at Cornell since 1987
A little about myself
started teaching Statistics and Finance to undergraduates in 2001
textbook published in 2004
started teaching Financial Engineering to master's students in 2008
working on a revised and expanded textbook now
programming exclusively in R
Undergraduate Textbook
A little about my research
Have done research in: asymptotic theory of splines, semiparametric modeling, measurement error in regression, smoothing (nonparametric regression and density estimation), transformation and weighting, stochastic approximation, biostatistics, environmental engineering, modeling of term structure, executive compensation and accounting fraud
Three types of regression
Linear regression: Y_i = β_0 + β_1 X_{i,1} + ... + β_p X_{i,p} + ε_i, i = 1, ..., n
Nonlinear regression: Y_i = m(X_{i,1}, ..., X_{i,p}; β_1, ..., β_q) + ε_i, i = 1, ..., n, where m is a known function depending on unknown parameters
Nonparametric regression: Y_i = m(X_{i,1}, ..., X_{i,p}) + ε_i, i = 1, ..., n, where m is an unknown smooth function
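A minimal sketch (not from the talk) of how the three types of regression can be fit in R, using simulated data with a single predictor; the model and starting values are made up for illustration.

# simulated data: exponential mean function plus noise
set.seed(1)
x <- runif(100, 0, 10)
y <- 2 + 3 * exp(-0.5 * x) + rnorm(100, sd = 0.2)

fit_linear <- lm(y ~ x)                                  # linear regression
fit_nonlin <- nls(y ~ b0 + b1 * exp(-b2 * x),            # nonlinear regression: m known
                  start = list(b0 = 1, b1 = 3, b2 = 0.5))#   up to (b0, b1, b2)
fit_nonpar <- smooth.spline(x, y)                        # nonparametric regression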
Usual assumptions on the noise
Usually ε_1, ..., ε_n are assumed to be: mutually independent (or at least uncorrelated), homoscedastic (constant variance), and normally distributed.
Much research over the last 50+ years has looked into ways of (1) checking these assumptions and (2) developing statistical methods that require fewer assumptions.
Transform-both-sides
Ideal model (no errors): Y_i = f(X_i, β)
Statistical model (first attempt): Y_i = f(X_i, β) + ε_i, where ε_1, ..., ε_n are iid Gaussian
TBS model: h{Y_i} = h{f(X_i, β)} + ε_i, where ε_1, ..., ε_n are iid Gaussian and h is an appropriate transformation
Estimation of Default Probabilities
Data: ratings 1 = Aaa (best), ..., 16 = B3 (worst), and default frequency (estimate of default probability)
Some statistical models
nonlinear model: Pr(default | rating) = exp{β_0 + β_1 rating}
linear/transformation model (in recent textbook): log{Pr(default | rating)} = β_0 + β_1 rating
Problem: cannot take logs of default frequencies that are 0
(Sub-optimal) solution in textbook: throw out these observations
A better statistical model
Transform-both-sides (TBS), see Carroll and Ruppert (1984, 1988), using a power transformation:
{Pr(default | rating) + κ}^λ = {exp(β_0 + β_1 rating) + κ}^λ
λ chosen by residual plots (or maximum likelihood); λ = 1/2 works well for this example
log transformations are also commonly used
κ > 0 will shift the data away from 0
The Box-Cox family
The most common transformation family is due to Box and Cox (1964):
h(y, λ) = (y^λ − 1)/λ if λ ≠ 0, and h(y, λ) = log(y) if λ = 0
Its derivative has a simple form: h_y(y, λ) = (d/dy) h(y, λ) = y^{λ−1} for all λ
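A small sketch (not the talk's code) of the Box-Cox transformation and its derivative as R functions.

boxcox_h <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
boxcox_h_deriv <- function(y, lambda) {
  y^(lambda - 1)                 # same form for all lambda
}

boxcox_h(2, 0.5)                 # square-root-type transformation
boxcox_h_deriv(2, 0.5)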
TBS fit compared to others (figure: log(default probability) versus rating, showing the textbook, nonlinear, and TBS fits along with the data)
Nonlinear regression residuals (figure: absolute residuals versus fitted values, and a normal QQ plot of the residuals)
TBS residuals (figure: absolute residuals versus fitted values, and a normal QQ plot of the residuals)
Estimated default probabilities

method      Pr{default | Aaa}   as % of TEXTBOOK estimate
TEXTBOOK    0.005%              100%
nonlinear   0.002%              40%
TBS         0.0008%             16%
A Similar Problem: Challenger (figure: number of distressed O-rings versus launch temperature, roughly 55-80°F)
Challenger data: extrapolation to 31° by logistic regression (figure: number of distressed O-rings versus temperature, with fits using all data versus only the data with failures)
Variance stabilizing transformation: how it works (figure: two populations, Pop 1 and Pop 2, shown before and after a log(x) transformation)
Strength of Box-Cox family
Take a < b. Then h_y(b, λ)/h_y(a, λ) = (b/a)^{λ−1}, which is increasing in λ and equals 1 when λ = 1.
λ = 1 is the dividing point between concave and convex transformations:
h(y, λ) becomes a more strongly concave transformation as λ decreases from 1, and a more strongly convex transformation as λ increases from 1.
Strength of Box-Cox family, cont. Example with b/a = 2 (figure: the derivative ratio 2^{λ−1} plotted against λ from −1 to 2, with the concave region below λ = 1 and the convex region above it)
Maximum likelihood
L(β, λ, σ) = −n log(σ) − (1/(2σ²)) Σ_{i=1}^{n} [h(Y_i + κ, λ) − h{f(X_i, β) + κ, λ}]² + (λ − 1) Σ_{i=1}^{n} log(Y_i), where the last term comes from the Jacobian
Can maximize over σ analytically: σ̂² = n^{−1} Σ_{i=1}^{n} [h(Y_i + κ, λ) − h{f(X_i, β) + κ, λ}]²
Then maximize over (β, λ), with optim for example
κ is fixed in advance
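A hedged sketch (not the talk's code) of TBS estimation by profile maximum likelihood for the default-probability model exp(β_0 + β_1 rating); the data names `rating` and `freq` and the starting values are placeholders, and the Jacobian term is written with log(Y_i + κ) so that zero frequencies are allowed.

boxcox_h <- function(y, lambda) if (lambda == 0) log(y) else (y^lambda - 1) / lambda

neg_profile_loglik <- function(theta, x, y, kappa) {
  beta   <- theta[1:2]
  lambda <- theta[3]
  fitted <- exp(beta[1] + beta[2] * x)
  resid  <- boxcox_h(y + kappa, lambda) - boxcox_h(fitted + kappa, lambda)
  sigma2 <- mean(resid^2)                         # sigma maximized out analytically
  0.5 * length(y) * log(sigma2) - (lambda - 1) * sum(log(y + kappa))
}

# fit <- optim(c(0, -1, 0.5), neg_profile_loglik, x = rating, y = freq, kappa = 1e-4)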
Reference for TBS
Transformation and Weighting in Regression by Carroll and Ruppert (1988). Lots of examples, but none in finance.
1-Year Treasury Constant Maturity Rate, daily data (figure: rate in percent versus year, 1970s through 2000s)
Source: Board of Governors of the Federal Reserve System, http://research.stlouisfed.org/fred2/
ΔR_t versus year (figure: change in rate plotted against year)
ΔR_t versus R_{t−1} (figure: change in rate plotted against the lagged rate)
ΔR_t² versus R_{t−1} (figure: squared change in rate plotted against the lagged rate)
Drift function
Discretized diffusion model: ΔR_t = µ(R_{t−1}) + σ(R_{t−1}) ε_t
µ(x) is the drift function
σ(x) is the volatility function (as before)
Estimating Volatility
Parametric model (common in practice): Var{ΔR_t} = β_0 R_{t−1}^{β_1}
Nonparametric model: Var{ΔR_t} = σ²(R_{t−1}), where σ(·) is a smooth function that will be modeled as a spline
In these models there is no dependence on t.
Spline Software
The penalized spline fits shown here were obtained using the function spm in R's SemiPar package (author: Matt Wand).
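A minimal sketch of the nonparametric volatility fit with spm, assuming `rate` holds the daily 1-year Treasury rates from the earlier slides; the variable names follow the talk's plot labels but the code itself is not from the talk.

library(SemiPar)

diff_rate <- diff(rate)                       # daily changes, Delta R_t
lag_rate  <- rate[-length(rate)]              # lagged rate, R_{t-1}
sq_diff   <- diff_rate^2                      # proxy for the conditional variance

vol_fit <- spm(sq_diff ~ f(lag_rate))         # penalized spline fit
summary(vol_fit)
plot(vol_fit)                                 # nonparametric volatility curve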
Comparing parametric and nonparametric volatility fits (figure: (ΔR_t)² versus lagged rate with both fitted curves, shown at two vertical scales)
Comparing parametric and nonparametric volatility fits: zooming in near 0 (figure: the two fitted curves for lagged rates between 0 and 6)
Spline fitting: estimation of the drift function (figure: ΔR_t versus lagged rate with the fitted spline, plus its first and second derivatives)
Residuals for the diffusion model
residual_t := ΔR_t − µ(R_{t−1}),  with E(residual_t) = 0
std residual_t := residual_t / σ(R_{t−1}),  with E(std residual_t²) = 1
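A short sketch of these residuals in R, assuming `mu_hat` and `sigma_hat` hold the spline estimates of the drift and volatility evaluated at lag_rate; the name std_drift_resid matches the variable appearing in the GARCH fits below.

drift_resid     <- diff_rate - mu_hat          # ordinary residuals, mean should be near 0
std_drift_resid <- drift_resid / sigma_hat     # standardized residuals, E(resid^2) near 1

mean(drift_resid)
mean(std_drift_resid^2)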
Question Are the drift and volatility functions constant in time?
Residual plots: ordinary residuals (figure: residuals versus year)
Residual plots: standardized residuals (figure: autocorrelation function and normal QQ plot)
Residual plots: squared standardized residuals (figure: squared standardized residuals versus year)
Residual plots: squared standardized residuals (figure: autocorrelation function)
GARCH(p, q)
The GARCH(p, q) model is a_t = ε_t σ_t, where
σ_t² = α_0 + Σ_{i=1}^{q} α_i a_{t−i}² + Σ_{i=1}^{p} β_i σ_{t−i}²,
ε_t is an iid (strong) white noise process, and a_t is weak white noise: uncorrelated, but with volatility clustering.
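An illustrative sketch (not from the talk) of the GARCH(1,1) recursion, simulating a series to show volatility clustering; the parameter values are made up for the example.

set.seed(42)
n      <- 1000
alpha0 <- 0.01; alpha1 <- 0.10; beta1 <- 0.85
a      <- numeric(n)
sigma2 <- numeric(n)
sigma2[1] <- alpha0 / (1 - alpha1 - beta1)      # unconditional variance
a[1]      <- sqrt(sigma2[1]) * rnorm(1)
for (t in 2:n) {
  sigma2[t] <- alpha0 + alpha1 * a[t - 1]^2 + beta1 * sigma2[t - 1]
  a[t]      <- sqrt(sigma2[t]) * rnorm(1)       # a_t = sigma_t * eps_t
}
plot(a, type = "l")                             # bursts of high volatility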
GARCH(1,1) fit using garch in tseries

Call: garch(x = std_drift_resid^2, order = c(1, 1))
Model: GARCH(1,1)

Coefficient(s):
     Estimate  Std. Error  t value  Pr(>|t|)
a0   0.27291   0.00148         184   <2e-16 ***
a1   0.44690   0.00252         177   <2e-16 ***
b1   0.80490   0.00075        1073   <2e-16 ***

Box-Ljung test
data: Squared.Residuals
X-squared = 0.13, df = 1, p-value = 0.7186
GARCH: estimated conditional standard deviations (figure: conditional standard deviations plotted over the sample)
GARCH: squared residuals with lowess smooth (figure: squared residuals plotted over the sample with a lowess smooth)
GARCH residuals (figure: autocorrelation function of the GARCH residuals)
AR(1)/GARCH(1,1)

Call: garchFit(formula = ~ arma(1, 0) + garch(1, 1), data = std_drift_resid)
Mean and Variance Equation: data ~ arma(1, 0) + garch(1, 1) [data = std_drift_resid]
Conditional Distribution: norm
Std. Errors: based on Hessian

Error Analysis:
        Estimate   Std. Error   t value  Pr(>|t|)
mu      0.001099   0.007476       0.147     0.883
ar1     0.138691   0.010468      13.248   < 2e-16 ***
omega   0.008443   0.001163       7.257  3.96e-13 ***
alpha1  0.073603   0.005483      13.424   < 2e-16 ***
beta1   0.923098   0.005457     169.158   < 2e-16 ***
AR(1)/GARCH(1,1) residuals (figure: autocorrelation function of the residuals)
AR(1)/GARCH(1,1) residuals - QQ plot (figure: normal QQ plot of the residuals)
AR(1)/GARCH(1,1) with t errors

Call: garchFit(formula = ~ arma(1, 0) + garch(1, 1), data = std_drift_resid, cond.dist = "std")
Mean and Variance Equation: data ~ arma(1, 0) + garch(1, 1) [data = std_drift_resid]
Conditional Distribution: std

Error Analysis:
        Estimate    Std. Error   t value  Pr(>|t|)
mu      0.0001087   0.0059401      0.018   0.98540
ar1     0.0969621   0.0090996     10.656   < 2e-16 ***
omega   0.0016722   0.0005955      2.808   0.00498 **
alpha1  0.0664390   0.0065895     10.083   < 2e-16 ***
beta1   0.9413495   0.0052248    180.169   < 2e-16 ***
shape   3.9169920   0.1600835     24.468   < 2e-16 ***
AR(1)/GARCH(1,1) residuals - QQ plot (figure: QQ plot of the residuals using t(3.91) quantiles)
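A hedged sketch of how such a t-quantile QQ plot could be drawn, assuming `garch_resid` holds the standardized residuals from the AR(1)/GARCH(1,1) fit and using the estimated shape of about 3.91.

shape <- 3.91
n     <- length(garch_resid)
probs <- (seq_len(n) - 0.5) / n

qq_x <- qt(probs, df = shape)                 # theoretical t quantiles
qq_y <- sort(garch_resid)                     # sample quantiles
plot(qq_x, qq_y, xlab = "Theoretical quantiles", ylab = "Sample quantiles",
     main = "QQ plot using t(3.91)")

# reference line through the quartiles
qy <- quantile(qq_y, c(0.25, 0.75)); qx <- qt(c(0.25, 0.75), df = shape)
b  <- diff(qy) / diff(qx)
abline(a = qy[1] - b * qx[1], b = b)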
Final model for the interest rate changes
ΔR_t = µ(R_{t−1}) + σ(R_{t−1}) a_t
1 The model was fit in two steps: first estimate µ(·) and σ(·) with spm in SemiPar; then model a_t as AR(1)/GARCH(1,1) with garchFit in fGarch
2 Could the two steps be combined?
3 Would combining them change the results?
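A hedged sketch of the two-step fit; diff_rate, lag_rate, sq_diff, and std_drift_resid are as in the earlier sketches, and the garchFit call mirrors the output on the previous slides.

library(SemiPar)
library(fGarch)

# Step 1: penalized-spline estimates of the drift and volatility functions
drift_fit <- spm(diff_rate ~ f(lag_rate))
vol_fit   <- spm(sq_diff ~ f(lag_rate))

# Step 2: AR(1)/GARCH(1,1) with t errors for the standardized drift residuals
ar_garch_fit <- garchFit(~ arma(1, 0) + garch(1, 1),
                         data = std_drift_resid, cond.dist = "std")
summary(ar_garch_fit)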
Reference for spline modeling
Semiparametric Regression by Ruppert, Wand, and Carroll (2003). Lots of examples, but most from biostatistics and epidemiology.
Bayesian statistics
Bayesian analysis allows the use of prior information.
Hierarchical priors can: specify knowledge that a group of parameters are similar to each other, and estimate their common distribution.
WinBUGS can be run from inside R using the R2WinBUGS package; there is a similar BRugs package that runs OpenBUGS (BRugs is no longer on CRAN).
midcapD.ts in the fEcofin package: 500 daily returns on 20 midcap stocks and the market
Goal
The goal is to use the first 100 days to estimate the mean return of each stock over the next 400 days.
Four possible estimators:
sample means
Bayes estimation (shrinkage)
mean of means (total shrinkage)
CAPM: (expected stock return) = beta × (expected market return)
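A hedged sketch (not the talk's code) of three of the four estimators, assuming `ret` is the 500 x 20 matrix of daily stock returns and `mkt` the market returns from midcapD.ts; the Bayes (shrinkage) estimator comes from the WinBUGS fit shown later.

train <- 1:100

est_sample_means  <- colMeans(ret[train, ])              # per-stock sample means
est_mean_of_means <- rep(mean(ret[train, ]), 20)         # total shrinkage to one value

betas    <- apply(ret[train, ], 2,
                  function(r) coef(lm(r ~ mkt[train]))[2])
est_capm <- betas * mean(mkt[train])                     # "CAPM 1": first-100-day market mean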
Who won?

Estimate        Sum of squared errors
sample means    1.9
Bayes           0.17
mean of means   0.12
CAPM 1          0.66
CAPM 2          0.45

Squared estimation errors are summed over the 20 stocks.
CAPM 1: use mean of first 100 market returns
CAPM 2: use mean of last 400 market returns
Why does shrinkage help? (figure: estimates versus target means, for the sample means and for the Bayes estimates)
Likelihood and prior
r_{i,t} = tth return on the ith stock (IN = Independent Normal)
Likelihood: r_{i,t} = µ_i + ε_{i,t}, with ε_{i,t} ~ IN(0, σ_ε²)
Hierarchical prior: µ_i ~ IN(α, σ_µ²)
Diffuse (non-informative) priors on α, σ_ε², σ_µ²
Auto- and cross-sectional correlations are ignored (treated as 0).
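A hedged sketch of what this hierarchical model might look like in WinBUGS, run from R with R2WinBUGS. The specific diffuse priors, file name, and data layout (`r` as a 20 x 100 matrix of training-period returns) are assumptions, not the talk's code; the saved parameter names and MCMC settings are chosen to match the output on the next slide.

library(R2WinBUGS)

model_code <- "
model {
  for (i in 1:N) {
    for (t in 1:T) { r[i, t] ~ dnorm(mu[i], tau_eps) }
    mu[i] ~ dnorm(alpha, tau_mu)
  }
  alpha   ~ dnorm(0, 1.0E-6)        # diffuse priors (an assumption)
  tau_eps ~ dgamma(0.001, 0.001)
  tau_mu  ~ dgamma(0.001, 0.001)
  sigma_eps <- 1 / sqrt(tau_eps)
  sigma_mu  <- 1 / sqrt(tau_mu)
}"
writeLines(model_code, "midcap.bug")

means.sim <- bugs(data = list(r = r, N = 20, T = 100),
                  inits = NULL,
                  parameters.to.save = c("mu", "alpha", "sigma_mu", "sigma_eps"),
                  model.file = "midcap.bug",
                  n.chains = 3, n.iter = 5100, n.burnin = 100)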
Data-driven shrinkage
Hierarchical prior: µ_i ~ IN(α, σ_µ²)
the µ_i are shrunk towards α
α should be (approximately) the mean of the means
σ_µ²/σ_ε² controls the amount of shrinkage: large σ_µ²/σ_ε² means less shrinkage
the shrinkage is data-driven because σ_µ² and σ_ε² are estimated
WinBUGS output

> print(means.sim, digits = 3)
Inference for Bugs model at "midcap.bug", fit using WinBUGS,
 3 chains, each with 5100 iterations (first 100 discarded)
 n.sims = 15000 iterations saved

              mean     sd      2.5%       25%      50%      75%    97.5%  Rhat  n.eff
mu[1]      1.1e-01  0.169     -0.22  -1.0e-03  1.1e-01  2.1e-01  4.5e-01     1   4000
mu[2]      1.2e-01  0.170     -0.20   1.5e-02  1.2e-01  2.3e-01  4.7e-01     1   6500
mu[3]      7.7e-02  0.168     -0.27  -2.7e-02  7.9e-02  1.8e-01  4.1e-01     1   3300
mu[4]      4.5e-02  0.176     -0.33  -6.1e-02  5.2e-02  1.6e-01  3.8e-01     1   1300
...
mu[18]     8.3e-02  0.170     -0.27  -2.4e-02  8.7e-02  1.9e-01  4.1e-01     1   3000
mu[19]     5.1e-02  0.171     -0.32  -5.1e-02  5.7e-02  1.6e-01  3.7e-01     1   1700
mu[20]     4.8e-02  0.175     -0.33  -5.8e-02  5.5e-02  1.6e-01  3.7e-01     1   1800
sigma_mu   1.5e-01  0.065      0.06   9.9e-02  1.3e-01  1.8e-01  3.1e-01     1    520
sigma_eps  4.3e+00  0.068      4.18   4.3e+00  4.3e+00  4.4e+00  4.4e+00     1  15000
alpha      8.8e-02  0.102     -0.11   1.7e-02  8.8e-02  1.6e-01  2.8e-01     1    710
deviance   1.2e+04  3.989  11510.00   1.2e+04  1.2e+04  1.2e+04  1.2e+04     1   5300

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat = 1).

DIC info (using the rule, pD = Dbar - Dhat)
pD = 4.1 and DIC = 11521.6
DIC is an estimate of predictive error (lower deviance is better).