New approach to interval linear regression




Milan Hladík (1), Michal Černý (2)

(1) Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic
(2) Faculty of Computer Science and Statistics, University of Economics, Prague, Czech Republic

MEC EurOPT 2010, İzmir, Turkey, June 23-26

M. Hladík and M. Černý (CUNI, VŠE), New approach to interval linear regression, June 23-26, 2010

Linear regression

Model: $y_j = x_{j,1} a_1 + x_{j,2} a_2 + \dots + x_{j,n} a_n = X_j a$, $j = 1, \dots, p$, or $y = Xa$, where $X \in \mathbb{R}^{p \times n}$ is an input matrix and $y \in \mathbb{R}^p$ is an output vector.

Problem: Find $a$ giving the best approximation to $y = Xa$.

Methods:
- $L_2$-norm estimate (least squares method);
- $L_1$-norm estimate (least absolute deviations method);
- $L_\infty$-norm estimate;
- ...

Interval linear regression

Model (crisp input, crisp output): $y_j = x_{j,1} \mathbf{a}_1 + x_{j,2} \mathbf{a}_2 + \dots + x_{j,n} \mathbf{a}_n = X_j \mathbf{a}$, $j = 1, \dots, p$, or $y = X\mathbf{a}$, where $\mathbf{a}_i = [a_i - c_i,\, a_i + c_i]$, $i = 1, \dots, n$, are intervals.

Problem: Find an interval vector $\mathbf{a}$, as narrow as possible, such that $y \in X\mathbf{a}$, i.e., $\forall j = 1, \dots, p\ \exists a \in \mathbf{a}: y_j = X_j a$.
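The covering condition in the problem statement can be checked row by row: over the box $[a - c, a + c]$, the product $X_j a$ ranges over $[X_j a - |X_j| c,\ X_j a + |X_j| c]$. The following is a minimal sketch of that check, not code from the talk; the function names and the small data set are made up for illustration.

```python
def row_range(x_row, center, radius):
    """Range of x_row . a when a runs over the box [center - radius, center + radius]."""
    mid = sum(x * a for x, a in zip(x_row, center))
    rad = sum(abs(x) * c for x, c in zip(x_row, radius))
    return mid - rad, mid + rad


def covers(X, y, center, radius, tol=1e-12):
    """True if every observation y_j lies in the range of X_j a over the interval vector."""
    for x_row, y_j in zip(X, y):
        lo, hi = row_range(x_row, center, radius)
        if not (lo - tol <= y_j <= hi + tol):
            return False
    return True


# Hypothetical data: an intercept column and one regressor.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.1, 2.9, 4.2]
center = [1.0, 1.0]

print(covers(X, y, center, [0.0, 0.0]))  # degenerate (crisp) parameters
print(covers(X, y, center, [0.1, 0.1]))  # parameters widened by 0.1 each
```

With zero radii the crisp line misses the observations, while radii of 0.1 are wide enough to cover all three of them.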

Interval linear regression

Applications:
- Ergonomics (Chang et al., 1996);
- Market sales forecasting (Heshmaty & Kandel, 1985);
- System identification (Kaneyoshi et al., 1990);
- Speech learning systems (Liu, 2009).

Methods:
(1) Linear programming formulation (Lee & Tanaka, 1999, Tanaka, 1987, Tanaka & Watada, 1988);
(2) Quadratic programming formulation (Tanaka & Lee, 1998);
(3) Support vector machines (Hao, 2009, Huang & Kao, 2009).

Pros/Cons:
(1) is simple, but some parameters come out crisp;
(2)-(3) are more complex, but bring only a small improvement.

New approach

Preliminary: Find an interval vector of the form $\mathbf{a} = [a^* - \delta c^*,\, a^* + \delta c^*]$, where
- $a^*$ is an initial real-valued estimate for $y = Xa$;
- $c^* \ge 0$ is given (usually $c^* = |a^*|$ or $c^* = 1$);
- $\delta \ge 0$ is the tolerance quotient to be determined.

Theorem: If there is $j \in \{1, \dots, p\}$ such that $|X_j| c^* = 0$ and $y_j \ne X_j a^*$, then there exists no allowable $\delta$. Otherwise let

$$\delta^* := \max_{j:\ |X_j| c^* > 0} \frac{|y_j - X_j a^*|}{|X_j| c^*},$$

where $\max \emptyset = 0$ by definition. Then $\delta^*$ is the minimal tolerance quotient.
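The theorem translates almost directly into a single pass over the observations. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the function name `tolerance_quotient` and the example data are hypothetical, and the exact comparisons with zero would normally be replaced by a small threshold when working in floating point.

```python
def tolerance_quotient(X, y, a_star, c_star):
    """Minimal delta such that every y_j lies in X_j [a* - delta c*, a* + delta c*].

    Returns None when no allowable delta exists, i.e. some row has
    |X_j| c* = 0 while y_j differs from X_j a*.
    """
    delta = 0.0  # the maximum over an empty index set is 0 by definition
    for x_row, y_j in zip(X, y):
        residual = abs(y_j - sum(x * a for x, a in zip(x_row, a_star)))
        spread = sum(abs(x) * c for x, c in zip(x_row, c_star))
        if spread > 0:
            delta = max(delta, residual / spread)
        elif residual > 0:
            return None  # this observation cannot be covered by any delta
    return delta


# Hypothetical data: intercept plus one regressor, relative tolerances c* = |a*|.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.1, 2.9, 4.2]
a_star = [1.0, 1.0]
c_star = [abs(a) for a in a_star]

delta = tolerance_quotient(X, y, a_star, c_star)
print(delta)  # approximately 0.05
```

With $c^* = |a^*|$ the result reads as a relative perturbation: here each entry of $a^*$ has to be allowed to move by about 5% for the interval model to cover all observations.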

New approach

Properties:
- $\delta^*$ is very cheap to calculate.
- The widths of the intervals in $\mathbf{a} = [a^* - \delta^* c^*,\, a^* + \delta^* c^*]$ are proportional to $c^*$ and minimal in some sense.
- $\delta^*$ is easy to interpret.
- $\delta^*$ can be used to measure how well $a^*$ fits the model.

But: $\delta^*$ depends strongly on the initial estimate $a^*$.

Examples: house price model

Example (House price model, Lee & Tanaka, 1999): $y = a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4$.

  j      y   x1   x2       x3      x4
  1    606    1    1    38.09   36.43
  2    710    1    1    62.10   26.50
  3    808    1    1    63.76   44.71
  4    826    1    1    74.52   38.09
  5    865    1    1    75.38   41.10
  6    852    1    2    52.99   26.49
  7    917    1    2    62.93   26.49
  8   1031    1    2    72.04   33.12
  9   1092    1    2    76.12   43.06
 10   1203    1    2    90.26   42.64
 11   1394    1    3    85.70   31.33
 12   1420    1    3    95.27   27.64
 13   1601    1    3   105.98   27.64
 14   1632    1    3    79.25   66.81
 15   1699    1    3   120.5    32.25

x1: absolute term
x2: quality of material
x3: area of the first floor (m^2)
x4: area of the second floor (m^2)
y: sale price (10,000 JPY)

Examples: house price model

Example (cont.)

Solution by a linear programming-based method (Tanaka & Lee, 1998):
$y = [0, 0] + [208, 283] x_2 + [5.85, 5.85] x_3 + [4.79, 4.79] x_4$.

The quadratic programming-based method (Tanaka & Lee, 1998):
y = [ 8.81, 8.81] + [209, 259] x_2 + [6.14, 6.14] x_3 + [4.41, 5.39] x_4.

Our method: take $a^* := (X^T X)^{-1} X^T y = (-239.28, 264.84, 7.47, 6.76)^T$ and $c^* := |a^*|$, and calculate $\delta^* = 0.0482$. That is, every entry of $a^*$ is allowed to vary within a 4.82% tolerance:
$y = [-251, -228] + [252, 278] x_2 + [7.11, 7.83] x_3 + [6.43, 7.08] x_4$.
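The computation on this slide can be retraced with a few lines of NumPy. This is a sketch, not the authors' code: it assumes the data table was transcribed faithfully, uses `numpy.linalg.lstsq` in place of the explicit normal-equation formula, and takes $c^* = |a^*|$. If those assumptions hold, the printed estimate, tolerance quotient, and interval bounds should come out close to the values reported above.

```python
import numpy as np

# Columns: y, x1 (absolute term), x2, x3, x4, as transcribed from the slide.
data = np.array([
    [606, 1, 1, 38.09, 36.43],
    [710, 1, 1, 62.10, 26.50],
    [808, 1, 1, 63.76, 44.71],
    [826, 1, 1, 74.52, 38.09],
    [865, 1, 1, 75.38, 41.10],
    [852, 1, 2, 52.99, 26.49],
    [917, 1, 2, 62.93, 26.49],
    [1031, 1, 2, 72.04, 33.12],
    [1092, 1, 2, 76.12, 43.06],
    [1203, 1, 2, 90.26, 42.64],
    [1394, 1, 3, 85.70, 31.33],
    [1420, 1, 3, 95.27, 27.64],
    [1601, 1, 3, 105.98, 27.64],
    [1632, 1, 3, 79.25, 66.81],
    [1699, 1, 3, 120.5, 32.25],
])
y, X = data[:, 0], data[:, 1:]

# Initial estimate a*: ordinary least squares.
a_star, *_ = np.linalg.lstsq(X, y, rcond=None)

# Relative tolerances: c* = |a*|.
c_star = np.abs(a_star)

# Tolerance quotient from the theorem; every spread |X_j| c* is assumed
# positive here, which holds whenever the fitted coefficients are nonzero.
residual = np.abs(y - X @ a_star)
spread = np.abs(X) @ c_star
delta = float(np.max(residual / spread))

lower = a_star - delta * c_star
upper = a_star + delta * c_star

print("a*    =", np.round(a_star, 2))
print("delta =", round(delta, 4))
for lo, hi in zip(lower, upper):
    print(f"[{lo:.2f}, {hi:.2f}]")
```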

Examples: outliers

Example (Outliers, Ishibuchi & Tanaka, 1990)

[Figure: Basic interval regression model without outliers (y plotted against x).]

[Figure: Basic interval regression model with outliers (y plotted against x).]

Examples: outliers

Example (Outliers, Ishibuchi & Tanaka, 1990)

[Figure: Improved interval regression model; $a^*$ is obtained by least squares (y plotted against x).]

[Figure: Improved interval regression model; $a^*$ is obtained by $L_1$ regression (y plotted against x).]

Examples: risk management

Example (Price development of oil and kerosene)

[Figure: Evolution of prices of kerosene and crude oil (WTI) in dollars per barrel, 1990-2010.]

Examples: risk management

Example (cont.)

An airline company may hedge expected future purchases of kerosene with crude oil futures. A proper hedge ratio must be determined to decrease the market risk.

Assume that the price of kerosene $y$ is a linear function of the price of oil $x$: $y = a_1 + a_2 x$.

A least squares estimate of $a$ is $a^* = (-1.319, 1.225)^T$, and the model reads $y = -1.319 + 1.225x$.

Put $c^* := |a^*|$ and calculate $\delta^* = 0.7204$. Hence the hedge ratio $a_2$ is quite unstable:
- removing the 10 worst outliers decreases $\delta^*$ to 0.5053;
- removing the 100 worst outliers decreases $\delta^*$ to 0.1758.
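The effect of discarding outliers on the tolerance quotient can be illustrated as follows. The slide does not say how the "worst" outliers were chosen or whether $a^*$ was re-estimated afterwards, so this sketch makes two assumptions: an observation's badness is its ratio $|y_j - X_j a^*| / (|X_j| c^*)$, and $a^*$ is kept fixed. The price data are made up; only the coefficients come from the slide.

```python
def ratios(X, y, a_star, c_star):
    """Per-observation ratios |y_j - X_j a*| / (|X_j| c*); assumes every spread is positive."""
    out = []
    for x_row, y_j in zip(X, y):
        residual = abs(y_j - sum(x * a for x, a in zip(x_row, a_star)))
        spread = sum(abs(x) * c for x, c in zip(x_row, c_star))
        out.append(residual / spread)
    return out


def delta_after_trimming(X, y, a_star, c_star, k):
    """Tolerance quotient after dropping the k observations with the largest ratios."""
    r = sorted(ratios(X, y, a_star, c_star))
    kept = r[:max(len(r) - k, 0)]
    return max(kept, default=0.0)


# Coefficients reported on the slide; c* = |a*|.
a_star = [-1.319, 1.225]
c_star = [abs(a) for a in a_star]

# Hypothetical oil prices and deviations; the fourth point is a deliberate outlier.
oil = [20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
noise = [0.5, -0.4, 0.3, 15.0, -0.2, 0.6]
X = [[1.0, x] for x in oil]
y = [a_star[0] + a_star[1] * x + e for x, e in zip(oil, noise)]

for k in range(3):
    print(k, round(delta_after_trimming(X, y, a_star, c_star, k), 4))
```

Because $a^*$ is held fixed, the quotient can only decrease or stay the same as more points are dropped; re-estimating $a^*$ after each removal is a different procedure and need not behave monotonically.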

Conclusion and future work

Conclusion:
- The proposed method is more flexible;
- The widths of the resulting interval parameters are constructed proportionally;
- The tolerance quotient is easily interpreted by a user;
- It can be used as a fitness measure of the model;
- Outliers are easy to detect and handle.

Future work:
- Theoretical properties of the quotient as a fitness measure.
- Extension of the method to crisp input, interval output models.
- Extension of the method to interval input, interval output models.