Linear methods for regression and classification with functional data




Gilbert Saporta
Chaire de Statistique Appliquée & CEDRIC, Conservatoire National des Arts et Métiers
292 rue Saint Martin, case 441, 75141 Paris cedex 03, France
saporta@cnam.fr

G. Damiana Costanzo
Dipartimento di Economia e Statistica, Università della Calabria
Via P. Bucci, Cubo 0C, 87036 Arcavacata di Rende (CS), Italy
dm.costanzo@unical.it

Cristian Preda
Département de Statistique-CERIM, Faculté de Médecine, Université de Lille 2
1, Place de Verdun, 59045 Lille Cedex, France
cristian.preda@univ-lille2.fr

Functional data occur when we observe curves or paths from a stochastic process $X_t$. If for each curve or path we have a single response variable $Y$, we have a regression problem when $Y$ is numerical and a classification problem when $Y$ is categorical. We assume here that all trajectories are observed continuously on a time interval $[0, T]$ and that the variables $Y$ (when numerical) and $X_t$ have zero mean.

1. Regression with a functional predictor

The functional linear model considers a predictor which may be expressed as an integral sum:

$$\hat{Y} = \int_0^T X_t \, \beta(t) \, dt$$

The problem is not new and goes back to Fisher (1924), who used the expression "integral regression". It is well known that this regression model yields an ill-posed problem: the least squares criterion leads to the Wiener-Hopf equation, which in general does not have a unique solution:

$$E(X_t Y) = \int_0^T E(X_t X_s) \, \beta(s) \, ds$$

The problem is even worse when we try to estimate the regression coefficient function $\beta(t)$ with a finite number of observations.

Since the work of Ramsay & Silverman (1997), many techniques have been applied to solve this kind of problem, mostly by using explicit regularization techniques. High dimensionality and multicollinearity also call for some smoothing. In the functional linear approach, the functional data (the predictor) and the functional parameter can be modelled as linear combinations of basis functions from a given functional family. The literature on this subject essentially differs in the choice of the basis and in the way the parameters are estimated. Basis functions should be chosen to reflect the characteristics of the data: for example, Fourier bases are usually used to model periodic data, while B-spline basis functions are chosen because they have the advantage of finite support. We will focus here on linear methods based on an orthogonal decomposition of the predictors.
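As an illustration of the basis-expansion idea, the following Python sketch computes the coefficients of a sampled periodic curve on an orthonormal Fourier basis by numerical integration. The function names, the trapezoidal quadrature, the grid and the test curve are all assumptions made for this example; they do not come from the paper.

```python
import math

def fourier_basis(j, t, T=1.0):
    """j-th function of the orthonormal Fourier basis on [0, T] (hypothetical helper)."""
    if j == 0:
        return 1.0 / math.sqrt(T)
    k = (j + 1) // 2                      # frequency index: j = 1, 2 -> k = 1, etc.
    arg = 2.0 * math.pi * k * t / T
    trig = math.sin(arg) if j % 2 == 1 else math.cos(arg)
    return math.sqrt(2.0 / T) * trig

def trapezoid(values, grid):
    """Trapezoidal approximation of the integral of sampled values over the grid."""
    return sum(0.5 * (values[i] + values[i + 1]) * (grid[i + 1] - grid[i])
               for i in range(len(grid) - 1))

def basis_coefficients(curve, grid, n_basis, T=1.0):
    """Coefficients c_j = integral of x(t) * psi_j(t) dt for an orthonormal basis."""
    return [trapezoid([x * fourier_basis(j, t, T) for x, t in zip(curve, grid)], grid)
            for j in range(n_basis)]

def reconstruct(coefs, t, T=1.0):
    """Evaluate the truncated expansion sum_j c_j * psi_j(t)."""
    return sum(c * fourier_basis(j, t, T) for j, c in enumerate(coefs))

# Illustrative periodic curve sampled on 201 equispaced points of [0, 1].
grid = [i / 200 for i in range(201)]
curve = [1.0 + 2.0 * math.sin(2 * math.pi * t) - 0.5 * math.cos(4 * math.pi * t)
         for t in grid]
coefs = basis_coefficients(curve, grid, n_basis=5)
```

Because the basis is orthonormal, each coefficient is a plain inner product. With a non-orthogonal basis such as B-splines, a least squares fit would be needed instead.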

1.1 Linear regression on principal components (Preda & Saporta, 2005a)

The use of components derived from the Karhunen-Loève expansion is, for functional data, the equivalent of principal components regression (PCR). The principal component analysis (PCA) of the stochastic process $(X_t)$ consists in representing $X_t$ as:

$$X_t = \sum_{i=1}^{\infty} f_i(t) \, \xi_i$$

where the principal components $\xi_i = \int_0^T f_i(t) X_t \, dt$ are obtained through the eigenfunctions of the covariance operator:

$$\int_0^T C(t, s) f_i(s) \, ds = \lambda_i f_i(t).$$

In practice we need to choose an approximation of order $q$:

$$\hat{Y}_{PCR(q)} = \sum_{i=1}^{q} \frac{\operatorname{cov}(Y; \xi_i)}{\lambda_i} \, \xi_i.$$

But the use of principal components for prediction is heuristic because they are computed independently of the response: the components corresponding to the largest eigenvalues are not necessarily the most predictive, but it is difficult to rank an infinite number of components according to $R^2$.

1.2 Functional PLS regression

PLS regression offers a good alternative to the PCR method by replacing the least squares criterion with that of maximal covariance between $(X_t)$ and $Y$:

$$\max_w \operatorname{cov}^2\left(Y, \int_0^T w(t) X_t \, dt\right) \quad \text{with } \|w\| = 1.$$

The first PLS component is given by $t_1 = \int_0^T w(t) X_t \, dt$. PLS regression is iterative, and further PLS components are obtained by maximizing the covariance criterion between the residuals of both $Y$ and $(X_t)$ on the previous components. The PLS approximation is given by:

$$\hat{Y}_{PLS(q)} = c_1 t_1 + \dots + c_q t_q = \int_0^T \hat{\beta}_{PLS(q)}(t) X_t \, dt$$

and for functional data the same property as in finite dimension holds: PLS fits closer than PCR,

$$R^2\left(Y; \hat{Y}_{PLS(q)}\right) \ge R^2\left(Y; \hat{Y}_{PCR(q)}\right)$$

since PCR components are obtained irrespective of the response. In earlier work (Preda & Saporta) we showed the convergence of the PLS approximation to the approximation given by the classical linear regression:

$$\lim_{q \to \infty} E\left(\left|\hat{Y}_{PLS(q)} - \hat{Y}\right|^2\right) = 0.$$

In practice, the number of PLS components used for regression is determined by cross-validation.

2. Clusterwise PLS regression

Clusterwise regression may be used when heterogeneity in the data is present. This corresponds to a mixture of several regression models, that is, there exists a latent categorical variable $G$ with $k$ categories defining the clusters such that:

$$E(Y \mid X = x, G = g) = \alpha_g + \int_0^T x(t) \, \beta_g(t) \, dt$$

$$V(Y \mid X = x, G = g) = \sigma_g^2$$
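For intuition about such a mixture of regressions, the finite-dimensional analogue can be sketched with a single scalar predictor in place of the curve. The Python sketch below alternates a least squares fit within each cluster with a reassignment of every unit to the line that gives it the smallest squared residual. It is a simplified illustration under stated assumptions (scalar predictor, random initial partition, hypothetical function names) and not the PLS-based functional algorithm of Preda & Saporta (2005b).

```python
import random

def ols_line(xs, ys):
    """Least squares intercept and slope for one scalar predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:                       # degenerate group: fit a constant
        return my, 0.0
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def squared_residual(x, y, model):
    intercept, slope = model
    return (y - intercept - slope * x) ** 2

def criterion(xs, ys, labels, models):
    """Sum over units of the squared residual under the unit's own cluster model."""
    return sum(squared_residual(x, y, models[g]) for x, y, g in zip(xs, ys, labels))

def clusterwise_regression(xs, ys, k, n_iter=50, seed=0):
    """Alternate per-cluster OLS with reassignment to the best-fitting line."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in xs]   # random initial partition (assumption)
    models = [(0.0, 0.0)] * k
    for _ in range(n_iter):
        # Step 1: one OLS fit per cluster, with the partition held fixed.
        for g in range(k):
            idx = [i for i, lab in enumerate(labels) if lab == g]
            if len(idx) >= 2:
                models[g] = ols_line([xs[i] for i in idx], [ys[i] for i in idx])
        # Step 2: allocate each unit to the model with the smallest residual.
        new_labels = [min(range(k), key=lambda g: squared_residual(x, y, models[g]))
                      for x, y in zip(xs, ys)]
        if new_labels == labels:              # partition is stable: stop
            break
        labels = new_labels
    return labels, models

# Illustrative data: two interleaved noiseless lines.
xs = [i / 10 for i in range(40)]
ys = [1.0 + 2.0 * x if i % 2 == 0 else 5.0 - 1.0 * x for i, x in enumerate(xs)]
labels, models = clusterwise_regression(xs, ys, k=2)
```

Each of the two steps can only decrease the criterion, so the procedure converges, but possibly to a local minimum. Several random starts are usually advisable.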

The number of clusters $k$ is assumed to be known, but not the clusters themselves. Let us recall the classical case of a finite number of predictors: for $n$ observations, the clusterwise linear regression algorithm finds an optimal partition of the $n$ points, together with the regression models for each cluster (element of the partition), which minimize the criterion:

$$\sum_{g=1}^{k} \sum_{i \in g} \left(y_i - \left(\hat{\alpha}_g + \hat{\beta}_g' x_i\right)\right)^2$$

The minimization is achieved by an alternating least squares algorithm of the k-means family, alternating an OLS fit for each group (supposed known) with an allocation of each unit to the closest regression surface, i.e. the model for which its residual is minimal. Under the hypothesis that residuals within each cluster are independent and normally distributed, this criterion is equivalent to maximization of the likelihood function (Hennig, 2000). For functional regression the previous model is not adequate, and we have proposed to estimate the local models in each cluster by PLS regression in order to overcome this problem. The convergence of this algorithm has been discussed in Preda & Saporta (2005b), and clusterwise PLS functional regression has been applied to predict the behavior of shares of the Paris stock market over a certain lapse of time.

3. Binary classification with a functional predictor

3.1 Fisher's linear discriminant analysis

The previous methods are easily generalized to binary classification, since Fisher's linear discriminant function is equivalent to a multiple regression where the response variable $Y$ is coded with two values $a$ and $b$: most frequently $\pm 1$, but also conveniently $\sqrt{p_1/p_0}$ and $-\sqrt{p_0/p_1}$, with $(p_0, p_1)$ the probability distribution of $Y$.

Costanzo et al. (2006) and Preda et al. (2007) have applied PLS functional classification to predict the quality of cookies from curves representing the resistance (density) of dough observed during the kneading process. For a given flour, the resistance of dough is recorded during the first 480 s of the kneading process. We have 115 curves which can be considered as sample paths of an $L^2$-continuous stochastic process. Each curve is observed at 241 equispaced time points of the time interval [0, 480]. After kneading, the dough is processed to obtain cookies. For each flour we have the quality $Y$ of the cookies, which can be Good, Adjustable or Bad. Our sample contains 50 observations for Y = Good, 25 for Y = Adjustable and 40 for Y = Bad. Due to measuring errors, each curve is smoothed using cubic B-spline functions with 16 knots.

Figure 1: Smoothed kneading curves
Figure 2: Discriminant coefficient function

Some of these flours could be adjusted to become Good. Therefore, we have considered the set of Adjustable flours as the test sample and predict for each one the group membership, Y = {Good, Bad}, using the discriminant coefficient function (Fig. 2) given by the PLS approach on the 90 flours. PLS functional discriminant analysis gave an average error rate of [...]%, which is better than discrimination based on principal components.

3.2 Functional logistic regression

Let $Y$ be a binary random variable and $y_1, \dots, y_n$ the corresponding random sample associated with the sample paths $x_i(t)$, $i = 1, \dots, n$. A natural extension of logistic regression (Ramsay & Silverman, 1997) is to define the functional logistic regression model by:

$$\ln\left(\frac{\pi_i}{1 - \pi_i}\right) = \alpha + \int_0^T x_i(t) \, \beta(t) \, dt; \quad i = 1, \dots, n$$

where $\pi_i = P(Y = 1 \mid X = x_i(t);\ t \in T)$. It may be assumed (Ramsay & Silverman, 1997) that the parameter function and the sample paths $x_i(t)$ are in the same finite-dimensional space:

$$\beta(t) = \sum_{q=1}^{p} b_q \psi_q(t) = b' \psi(t)$$

$$x_i(t) = \sum_{q=1}^{p} c_{iq} \psi_q(t) = c_i' \psi(t)$$

where $\psi_1(t), \dots, \psi_p(t)$ are the elements of a basis of the finite-dimensional space. Such an approximation transforms the functional model into a form similar to the standard multiple logistic regression model, whose design matrix is the matrix containing the coefficients of the expansion of the sample paths in terms of the basis, $C = (c_{iq})$, multiplied by the matrix $\Phi = \left(\phi_{kq} = \int_0^T \psi_k(t) \psi_q(t) \, dt\right)$, whose elements are the inner products of the basis functions:

$$\ln\left(\frac{\pi}{1 - \pi}\right) = \alpha \mathbf{1} + C \Phi b$$

with $b = (b_1, \dots, b_p)'$, $\pi = (\pi_1, \dots, \pi_n)'$ and $\mathbf{1}$ being the $n$-dimensional unity vector. Finally, in order to estimate the parameters, a further approximation by truncating the basis expansion could be considered. Alternatively, regularization or smoothing may be obtained by some roughness penalty approach.

In a similar way as we defined functional PCR earlier, Leng and Müller (2006) use functional logistic regression based on functional principal components with the aim of classifying gene expression curves into known gene groups. With the explicit aim of avoiding multicollinearity and reducing dimensionality, Escabias et al. (2004) and Aguilera et al. (2006) propose an estimation procedure for functional logistic regression based on taking as covariates a reduced set of functional principal components of the predictor sample curves, whose approximation is obtained in a finite space of not necessarily orthonormal functions. Two different forms of functional principal components analysis are then considered, and two different criteria for including the covariates in the model are also considered. Müller and Stadtmüller (2005) consider a functional quasi-likelihood and an approximation of the predictor process with a truncated Karhunen-Loève expansion. The latter also developed asymptotic distribution theory using functional principal scores. Comparisons with functional LDA are in progress, but it is likely that the differences will be small.

3.3 Anticipated prediction

In many real-time applications such as industrial processes, it is of the highest interest to make anticipated predictions. Let $d_t$ denote the approximation of a discriminant score considered on the time interval $[0, t]$, with $t < T$. For functional PLS or logistic regression the score is $d_t = \int_0^t X_s \hat{\beta}(s) \, ds$, but any method leading to an estimate of the posterior probability of belonging to one group gives a score. The objective here is to find $t^* < T$ such that the discriminant function $d_{t^*}$ performs quite as well as $d_T$.

For a binary target $Y$, the ROC curve and the AUC (Area Under Curve) are generally accepted as efficient measures of the discriminating power of a discriminant score. Let $d_t(x)$ be the score value for some unit $x$. Given a threshold $r$, $x$ is classified into Y = 1 if $d_t(x) > r$. The true positive rate, or sensitivity, is $P(d_t > r \mid Y = 1)$ and the false positive rate, or 1 - specificity, is $P(d_t > r \mid Y = 0)$. The ROC curve gives the true positive rate as a function of the false positive rate and is invariant under any monotonic increasing transformation of the score. In the case of an inefficient score, both conditional distributions of $d_t$ given Y = 1 and Y = 0 are identical and the ROC curve is the diagonal line. In the case of perfect discrimination, the ROC curve coincides with the edges of the unit square.

The Area Under the ROC Curve is then a global measure of discrimination. It can easily be proved that $AUC(t) = P(X_1 > X_0)$, where $X_1$ is a random variable distributed as $d_t$ when Y = 1 and $X_0$ is independently distributed as $d_t$ for Y = 0. Taking all pairs of observations, one in each group, AUC(t) is thus estimated by the percentage of concordant pairs (Wilcoxon-Mann-Whitney statistic).
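The concordant-pairs estimate of the AUC can be sketched in a few lines of Python. The function name is hypothetical, and counting a tied pair as one half is a common convention assumed here; the text above only defines the estimate for strict concordance.

```python
def auc_concordant(scores_pos, scores_neg):
    """Estimate AUC = P(X1 > X0) as the fraction of concordant (positive, negative) pairs.

    Tied pairs contribute 1/2 (an assumption of this sketch).
    """
    n_pairs = len(scores_pos) * len(scores_neg)
    if n_pairs == 0:
        raise ValueError("both groups must contain at least one score")
    total = 0.0
    for a in scores_pos:          # scores of units with Y = 1
        for b in scores_neg:      # scores of units with Y = 0
            if a > b:
                total += 1.0
            elif a == b:
                total += 0.5
    return total / n_pairs

# Illustrative scores: 5 of the 6 (positive, negative) pairs are concordant.
example_auc = auc_concordant([0.9, 0.4, 0.7], [0.5, 0.3])
```

Since only the ordering of the scores matters, the estimate is unchanged by any increasing transformation of the score, in line with the invariance of the ROC curve.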
A solution is to define $t^*$ as the first value of $s$ at which AUC(s) is not significantly different from AUC(T). Since AUC(s) and AUC(T) are two dependent random variables, we use a bootstrap test for comparing areas under ROC curves: we resample the data $M$ times, according to a stratified scheme in order to keep invariant the number of observations in each group. Let $AUC_m(s)$ and $AUC_m(T)$ be the resampled values of AUC for $m = 1$ to $M$, and $\delta_m$ their difference. Testing whether AUC(s) = AUC(T) is performed by using a paired t-test, or a Wilcoxon paired test, on the $M$ values $\delta_m$.

The previous methodology has been applied to the kneading data: the sample of 90 flours is randomly divided into a learning sample of size 60 and a test sample of size 30. In the test sample the two classes have the same number of observations. The functional PLS discriminant analysis gives, with the whole interval [0, 480], an average test error rate of about [...], for an average AUC(T) = 0.746. The anticipated prediction procedure gives, for M = 50 and a test sample of size n = 30 (same number of observations in each class), $t^* = 186$. Thus, one can reduce the recording period of the resistance of dough to less than half of the current one.
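The stratified bootstrap comparison of AUC(s) and AUC(T) described in this section can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names, the 0/1 coding of the labels, the sign convention of the differences and the handling of a zero standard deviation are choices made for the example, not details taken from the paper.

```python
import math
import random
import statistics

def auc(scores_pos, scores_neg):
    """Fraction of concordant (positive, negative) pairs; ties count 1/2 (assumption)."""
    total = sum(1.0 if a > b else 0.5 if a == b else 0.0
                for a in scores_pos for b in scores_neg)
    return total / (len(scores_pos) * len(scores_neg))

def stratified_bootstrap_deltas(scores_s, scores_T, labels, M, seed=0):
    """Return the M differences delta_m = AUC_m(T) - AUC_m(s).

    Each resample draws with replacement inside each class, so the class
    sizes stay fixed; the same units are used for both scores, which keeps
    the two AUC values paired.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    deltas = []
    for _ in range(M):
        bp = [rng.choice(pos) for _ in pos]
        bn = [rng.choice(neg) for _ in neg]
        auc_s = auc([scores_s[i] for i in bp], [scores_s[i] for i in bn])
        auc_T = auc([scores_T[i] for i in bp], [scores_T[i] for i in bn])
        deltas.append(auc_T - auc_s)
    return deltas

def paired_t_statistic(deltas):
    """Student statistic for H0: mean(delta) = 0, i.e. AUC(s) = AUC(T)."""
    mean = statistics.mean(deltas)
    sd = statistics.stdev(deltas)
    if sd == 0.0:                 # all differences identical
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return mean / (sd / math.sqrt(len(deltas)))
```

In use, `scores_s` and `scores_T` would hold the values of $d_s$ and $d_T$ for the same test units. The statistic is compared with a Student quantile with M - 1 degrees of freedom, and $t^*$ is taken as the smallest $s$ for which equality is not rejected.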

4. Conclusion and perspectives

In this paper we addressed the problem of predicting a categorical or numerical variable $Y$ with an infinite set of predictors $X_t$. We advocated linear models, which are easy to use and interpret; multicollinearity between predictors is better handled by PLS than by PCR. A clusterwise generalization is a way to take into account latent heterogeneity as well as some kind of nonlinearity. For binary classification we proposed an anticipated prediction technique based on bootstrap comparisons of ROC curves. Work in progress comprises the extension of clusterwise functional regression to binary classification, a comparison with functional logistic regression, and on-line forecasting: instead of using the same anticipated decision time $t^*$ for all data, we will try to adapt $t^*$ to each new trajectory given its incoming measurements.

References

Aguilera A.M., Escabias M. & Valderrama M.J. (2006) Using principal components for estimating logistic regression with high-dimensional multicollinear data. Computational Statistics & Data Analysis, 50, 1905-1924.
Costanzo D., Preda C. & Saporta G. (2006) Anticipated prediction in discriminant analysis on functional data for binary response. In COMPSTAT2006, 821-828, Physica-Verlag.
Escabias M., Aguilera A.M. & Valderrama M.J. (2004) Principal component estimation of functional logistic regression: discussion of two different approaches. Nonparametric Statistics, 16, 365-384.
Fisher R.A. (1924) The influence of rainfall on the yield of wheat at Rothamsted. Philosophical Transactions of the Royal Society, B, 213, 89-142.
Hennig C. (2000) Identifiability of models for clusterwise linear regression. J. Classification, 17, 273-296.
Leng X. & Müller H.G. (2006) Classification using functional data analysis for temporal gene expression data. Bioinformatics, 22, 68-76.
Müller H.G. & Stadtmüller U. (2005) Generalized functional linear models. The Annals of Statistics, 33, 774-805.
Preda C. & Saporta G. (2005a) PLS regression on a stochastic process. Computational Statistics and Data Analysis, 48, 149-158.
Preda C. & Saporta G. (2005b) Clusterwise PLS regression on a stochastic process. Computational Statistics and Data Analysis, 49, 99-108.
Preda C., Saporta G. & Lévéder C. (2007) PLS classification of functional data. Computational Statistics.
Ramsay & Silverman (1997) Functional data analysis. Springer.