Currency Exchange Rate Forecasting from News Headlines




Desh Peramunetilleke, Raymond K. Wong
School of Computer Science & Engineering, University of New South Wales, Sydney, NSW 2052, Australia
deshp@cse.unsw.edu.au wong@cse.unsw.edu.au

Abstract

We investigate how money market news headlines can be used to forecast intraday currency exchange rate movements. The innovation of the approach is that, unlike analysis based on quantifiable information, the forecasts are produced from text describing the current status of world financial markets, as well as political and general economic news. In contrast to numeric time series data, textual data contains not only the effect (e.g., the dollar rises against the Deutschmark) but also the possible causes of the event (e.g., because of a weak German bond market). Hence improved predictions are expected from this richer input. The output is a categorical forecast about currency exchange rates: the dollar moves up, remains steady or goes down within the next one, two or three hours respectively. On a publicly available commercial data set the system produces results that are significantly better than random prediction. The contribution of this research is the smart modeling of the prediction problem, enabling the use of content-rich text for forecasting purposes.

Keywords: Data mining, foreign exchange, prediction

1 Introduction

The foreign exchange market has changed dramatically over the past twenty-five years. The amounts traded are now huge, with over a trillion US dollars in transactions executed each day in the foreign exchange market alone. In this increasingly challenging and competitive market, investors and traders need tools to select and analyze the right data from the vast amounts of data available to them to help them make good decisions. This paper specifically describes an approach to forecast short-term movements in the foreign exchange (FX) markets from real-time news headlines and quoted exchange rates based on hybrid data mining techniques. The basic idea is to automate human thinking and reasoning. Traders, speculators and private individuals anticipate the direction of financial market movements before making an investment decision.
To reach a decision, any investor will carefully read the most recent economic and financial news, study reports written by market analysts and market strategists, and carefully weigh opinions expressed in various financial journals and news sources. This gives a picture of the current situation. Then, knowing how markets behaved in the past in different situations, people will implicitly match the current situation with those situations in the past that are most similar to the current one. The expectation is then that the market now will behave as it did in the past when circumstances were similar. Our approach is automating this process.

The news headlines, which are taken as input, contain a summary of the most important news items. News headlines use a restricted vocabulary, containing only relevant information (no sports news for instance), and are written by professionals following strict writing guidelines. This makes these news headlines perfect candidates for automated analysis. Furthermore, these news headlines are received in real time in all the trading rooms around the world. Hence the traders who are actually moving the markets base their expectations precisely on those news headlines. The current situation is then expressed in terms of counts of keyword records (short word sequences supplied by a domain expert, see Section 2.1) found in the headlines. The current situation is matched with previous situations and their correlation is determined.

This research elaborates and validates this prediction approach. We show how textual input can be used to forecast intraday currency exchange rate movements. In contrast to numeric time series data, textual data contains not only the effect (e.g., the dollar rises against the Deutschmark) but also the possible causes of the event (e.g., because of a weak German bond market). The output is a categorical forecast about currency exchange rates: the dollar moves up, remains steady or goes down within the next one, two or three hours.

Copyright 2001, Australian Computer Society, Inc. This paper appeared at the Thirteenth Australasian Database Conference (ADC2002), Melbourne, Australia. Conferences in Research and Practice in Information Technology, Vol. 5. Xiaofang Zhou, Ed. Reproduction for academic, not-for-profit purposes permitted provided this text is included.
Much promising research to predict FX movements has already been done. It is well known that purchasing power parity [27] and trade balance [11] are two fundamental factors influencing the long-term movements of exchange rates. For short-term FX prediction, however, the forecasting methods used so far, be they technical analysis [25], statistics or neural nets [12,17], base their predictions on quantifiable information [2,5,6,9,10,13,14,23,24]. As input they usually take huge amounts of quoted exchange rates between various currencies. The innovation of our approach is that we make use of non-numeric and hard-to-quantify data derived from textual information. In contrast to time series data [32], which contains the effect only (e.g., the dollar rises against the Deutschmark), textual information also contains the possible causes of the event (e.g., because of a weak German bond market) [7]. Hence improved predictions are expected from this more powerful input.

Goodhart initially attempted to quantify textual news by looking at full news pages of Reuters [8]. But he did not take our approach of looking at potentially market-moving word pairs, triples and quadruples. The study [36] describes research involving manual processing of news to enhance the knowledge base of foreign exchange trade support systems.

The rest of the paper is structured as follows. Section 2 contains the technical description of our FX forecasting techniques. One of the major issues investigated is how to preprocess data so as to make them amenable to classification techniques. Section 3 describes the experiments conducted using the data set HFDF93, which can be purchased on-line (via www.olsen.ch), the results achieved and a discussion of the findings. Section 4 summarizes this research.

2 Forecasting Techniques

As mentioned before, this section describes the technical details of the suggested forecasting approach.

2.1 Overview

In a typical short-term trading environment, FX traders are mainly interested in three mutually exclusive events or outcomes: whether the bid rate between a particular currency and the US dollar will move up, remain steady or move down in the future. Our system predicts which of these three mutually exclusive events will come true. Suppose the exchange rate moves by x percent during an interval such as an hour. We predict one of up, steady or down, defined as follows:

    up:      x ≥ 0.023%
    steady: -0.023% < x < 0.023%
    down:    x ≤ -0.023%

The percentage change 0.023% of the bid exchange rate is chosen such that each of the outcomes up, down and steady occurs about equally often in the training and testing period.

The major input are news headlines:

    1993-09-24 08:59:10 "NO MONETARY, FISCAL STEPS IN JAPAN PM'S PLAN - MOF"
    1993-09-24 09:00:46 "GERMAN CALL MONEY NEARS 7.0 PCT AFTER REPO"
    1993-09-24 09:01:00 "BOJ SEEN KEEPING KEY CALL RATE STEADY ON THURSDAY"
    1993-09-24 09:01:06 "AVERAGE RATE FALLS TO 9.28 PERCENT AT ITALY REPO"
    1993-09-24 09:04:18 "DUTCH MONEY MARKET RATES LITTLE CHANGED"

Each news headline is associated with a time stamp showing the day, hour and minute it was received through a news service such as Reuters.
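The three-way outcome definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the sample quotes are illustrative, and treating the boundaries as inclusive for up and down is an assumption, since the inequality symbols are not legible in the transcription.

```python
THRESHOLD_PCT = 0.023  # percentage change separating "steady" from "up"/"down"


def percent_change(start_rate: float, end_rate: float) -> float:
    """Percentage change of the bid rate between the start and end of a period."""
    return (end_rate - start_rate) / start_rate * 100.0


def classify(start_rate: float, end_rate: float,
             threshold_pct: float = THRESHOLD_PCT) -> str:
    """Map a period's rate movement to 'up', 'steady' or 'down'.

    Assumption: the boundaries are inclusive for 'up' and 'down'.
    """
    x = percent_change(start_rate, end_rate)
    if x >= threshold_pct:
        return "up"
    if x <= -threshold_pct:
        return "down"
    return "steady"


if __name__ == "__main__":
    # Illustrative quotes: a fall from 1.6535 to 1.6528 is about -0.042%.
    print(classify(1.6535, 1.6528))
```

Because the threshold is a parameter, it can be re-tuned so that the three classes stay roughly balanced on a different data set.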
Although it varies, on average about forty news headlines are received every hour. The input data and its flow over time is illustrated in Figure 1.

The other source of input is a set of keyword records. These keyword records are provided once by a domain expert such as a currency trader and are not changed thereafter. We use over four hundred records, each consisting of a sequence of two to five words:

    US, inflation, weak
    Bund, strong
    Germany, lower, interest, rate
    pound, lower
    US, dollar, up

There is no limitation on the number of keyword records nor on the number of words constituting a record.

The actual currency movements are filtered out from time series of quoted exchange rates:

    1993-09-24 08:59:32 1.6535
    1993-09-24 09:00:00 1.6535
    ...
    1993-09-24 10:00:04 1.6528
    1993-09-24 10:00:10 1.6520

On 24 Sep 1993, the dollar went down versus the Deutschmark in the period 9 to 10 am, as it depreciated by 0.04% ((1.6528-1.6535)/1.6535).

Given the data described, the prediction is done as follows:

1. The number of occurrences of the keyword records in the news of each time period is counted, see Figure 1. The counting of keyword records is case insensitive, stemming algorithms [28] are applied and the system considers not only exact matches. For example, if we have a keyword record "US inflation weak", and a headline contains the phrase "US inflation is expected to weaken", the system counts this as a match.
2. The occurrences of the keywords are then transformed into weights (a real number between zero and one). This way, each keyword gets a weight for each time period, see Figure 1. The computation of the weights from their occurrences is described in Section 2.3.

[Figure 1: Weights are generated from keyword record occurrences. Panels: keyword tuples; occurrences of keyword tuples per period; weighted keyword tuples per period.]

3. From the weights and the closing values of the training data (the last 60 time periods for which the outcome is known), classification rules are generated [34], see Figure 2. The rule generation algorithm is provided in Section 2.4.
4. The rules are applied to the news of the two most recent periods to yield the prediction.

In Figure 1, the news received between 8 pm and 9 pm results in keyword record weightings for the time period t-2. Period t-1 is from 9 pm to 10 pm. The forecast for period t, 10 pm to 11 pm, is computed by evaluating the rules on the weighted keyword records of periods t-1 and t-2. Note that only the last two periods, t-1 and t-2, are used to predict the movement in period t, as this yields the highest prediction accuracy.

Every hour (two and three hours respectively), only the keyword records in the latest news headlines are actually counted. The counts of the previous sixty periods (the training periods) are already known. The outcome (up, steady or down) of the latest training period is determined through reading of the quoted currency exchange rates. Now all three rule sets are regenerated, that is, every hour the rule generation algorithm is invoked so that the rules reflect the most recent market behavior (markets do not always react the same way to the same piece of news). Finally, the newly generated rules are applied to the latest keyword counts (keyword weights respectively) to yield the prediction for the coming hour.

[Figure 2: Rules are generated from weighted keywords and closing values.]
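Step 1 of the procedure, counting keyword records in the headlines of a period, can be sketched as follows. The paper only says that counting is case insensitive and that stemming is applied; the crude suffix stripper below is a stand-in for a real stemmer, and requiring the record's words to appear in order within a headline is an assumption. All function names are hypothetical.

```python
import re

# Assumption: a tiny suffix stripper standing in for the stemming algorithm of [28].
SUFFIXES = ("ing", "ed", "en", "s")


def stem(word: str) -> str:
    """Lower-case a word and repeatedly strip a known suffix (keeping >= 3 letters)."""
    word = word.lower()
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                changed = True
                break
    return word


def stems(text: str) -> list[str]:
    """Tokenize on letters and digits, then stem every token."""
    return [stem(token) for token in re.findall(r"[a-z0-9]+", text.lower())]


def matches(record: tuple[str, ...], headline: str) -> bool:
    """True if the stemmed record words occur in order (gaps allowed) in the headline."""
    remaining = iter(stems(headline))
    return all(s in remaining for s in (stem(w) for w in record))


def count_record(record: tuple[str, ...], headlines: list[str]) -> int:
    """Number of headlines in one time window that match the keyword record."""
    return sum(matches(record, h) for h in headlines)


if __name__ == "__main__":
    window = [
        "US INFLATION IS EXPECTED TO WEAKEN",
        "DUTCH MONEY MARKET RATES LITTLE CHANGED",
    ]
    print(count_record(("US", "inflation", "weak"), window))
```

Counting matching headlines (rather than every repeated occurrence inside one headline) is a simplification; either convention yields a term frequency per record and time window.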
2.2 Rule Semantics

The classifier expressing the correlation between the keywords and one of the outcomes is a rule set. Versus conventional rules [4,26], our rules have the advantage that they are able to handle continuous attributes and do not rely on Boolean tests. They therefore have more expressive power [31] while retaining the strengths of rule classifiers: comprehensible models and relatively fast learning algorithms. For example, suppose that attribute stock_rose has been normalized so that the maximum value is 1 and the minimum value is 0. A rule like

    DOLLAR_UP(T) <- STOCK_ROSE(T-1)

expresses a direct linear relationship between the dollar going up and the weight attached to stock rose. Suppose a second rule,

    DOLLAR_UP(T) <- STERLING_ADD(T-1).

The event DOLLAR_UP is therefore defined to be STOCK_ROSE or STERLING_ADD. The probability of the event STOCK_ROSE or STERLING_ADD is computed by stock + sterling - stock*sterling, where stock denotes the weight derived for keyword stock rose as outlined in Section 2.3, and sterling likewise for sterling add. That is, the rules define the event DOLLAR_UP as STOCK_ROSE or STERLING_ADD and map this event to a real number. This mapping satisfies the three well-known Kolmogoroff axioms [33] and hence the mapping defined by the rules is a probability function in the sense of axiomatic probability theory. The number computed by the rules can therefore be called a probability. Similarly, in information retrieval, weights are computed for individual keywords and mapped to a document relevance number. When this mapping satisfies the Kolmogoroff axioms, it is said to be probabilistic information retrieval, and people talk about the probability of a document being relevant.

The aim of this section is to briefly recall this rule semantics in an informal way. The rule generation

algorithm is provided in Section 2.4. The following is a sample rule set generated by the system (rule bodies for the head DOLLAR_UP(T)):

    STOCK_ROSE(T-1), NOT INTEREST_WORRY(T-1), NOT BUND_STRONG(T-2), NOT INTEREST_HIKE(T-2)
    STERLING_ADD(T-1), BUND_STRONG(T-2)
    YEN_PLUNG(T-1), NOT GOLD_SELL(T-2), STOCK_ROSE(T-1)

Once these rules are generated from the training data, they are applied to the most recently received news headlines, the news of the last two hours. So the likelihood of the dollar going up depends for instance on the weight computed for stock rose in the last hour and on the weight of bund strong two hours ago. Suppose the following weights for the last two time periods, say periods 60 and 61 in our example:

    STOCK_ROSE(61)      1.0
    INTEREST_WORRY(61)  0.2
    BUND_STRONG(60)     0.7
    INTEREST_HIKE(60)   0.0
    STERLING_ADD(61)    0.5
    YEN_PLUNG(61)       0.6
    GOLD_SELL(60)       0.1

Applying the rules to those weights computes the probability of the dollar going up within the next hour. More specifically, the rules compute how likely it is that the dollar moves up from the beginning to the end of period 62, i.e. how likely it is to move up from 10 pm to 11 pm:

    DOLLAR_UP(62) =
        1*(1-0.2)*(1-0.7)*(1-0) + 0.5*0.7 + 0.6*(1-0.1)*1
            // likelihood that first rule true, or second rule true, or third rule true
      - 0
            // since first and second rule are contradictory
      - 1*(1-0.2)*(1-0.7)*(1-0)*0.6*(1-0.1)
            // likelihood that first and third rule are both true; note stock_rose is taken only once
      - 0.5*0.7*0.6*(1-0.1)*1
            // likelihood that second and third rule true
      + 0
            // three rule bodies together are contradictory
      = 0.811

In the same way, a probability for dollar steady and dollar down respectively is computed. If the rules also have an attached confidence expressing the accuracy of the rules, then the rule evaluation is the same except that each term stemming from rule r is additionally multiplied by conf(r). For example, suppose that the three rules above have attached confidence 0.9, 0.8 and 0.7 respectively.
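As an aside on the confidence-free evaluation that produced 0.811: it is an inclusion-exclusion sum over the rule bodies, where a conjunction of bodies is contradictory (probability 0) if it contains a literal and its negation, and a literal shared by several bodies is counted once. The sketch below reproduces that arithmetic under these assumptions. The data layout and function names are hypothetical, and the weight 0.1 for GOLD_SELL(60) is taken from the factor (1-0.1) in the worked computation.

```python
from itertools import combinations

# A rule body maps (keyword, periods back) to True (positive literal) or False (negated).
RULES = [
    {("STOCK_ROSE", 1): True, ("INTEREST_WORRY", 1): False,
     ("BUND_STRONG", 2): False, ("INTEREST_HIKE", 2): False},
    {("STERLING_ADD", 1): True, ("BUND_STRONG", 2): True},
    {("YEN_PLUNG", 1): True, ("GOLD_SELL", 2): False, ("STOCK_ROSE", 1): True},
]

# Weights of the two most recent periods (period 61 is T-1, period 60 is T-2).
WEIGHTS = {
    ("STOCK_ROSE", 1): 1.0, ("INTEREST_WORRY", 1): 0.2,
    ("BUND_STRONG", 2): 0.7, ("INTEREST_HIKE", 2): 0.0,
    ("STERLING_ADD", 1): 0.5, ("YEN_PLUNG", 1): 0.6,
    ("GOLD_SELL", 2): 0.1,  # assumption: implied by (1-0.1) in the worked example
}


def conjunction_probability(bodies, weights):
    """Probability that all given rule bodies hold at once."""
    merged = {}
    for body in bodies:
        for literal, positive in body.items():
            # setdefault returns the polarity already stored, if any
            if merged.setdefault(literal, positive) != positive:
                return 0.0  # literal and its negation: contradictory
    prob = 1.0
    for literal, positive in merged.items():
        w = weights[literal]
        prob *= w if positive else 1.0 - w
    return prob


def evaluate(rules, weights):
    """Probability that at least one rule body holds (inclusion-exclusion)."""
    total = 0.0
    for k in range(1, len(rules) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(rules, k):
            total += sign * conjunction_probability(subset, weights)
    return total


if __name__ == "__main__":
    print(round(evaluate(RULES, WEIGHTS), 3))
```

The sum over all subsets grows exponentially with the number of rules, which is acceptable for the handful of rules per rule set shown here but would need a smarter evaluation for large rule sets.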
The evaluation of DOLLAR_UP(62) now yields

    DOLLAR_UP(62) =
        1*(1-0.2)*(1-0.7)*(1-0)*0.9 + 0.5*0.7*0.8 + 0.6*(1-0.1)*1*0.7
      - 1*(1-0.2)*(1-0.7)*(1-0)*0.6*(1-0.1)*0.9*0.7
      - 0.5*0.7*0.6*(1-0.1)*1*0.8*0.7
      = 0.687

2.3 Computation of Keyword Record Weights

This section describes how the weights are generated from the money market news headlines. The computation of weights is illustrated in Figure 1. The weight generation makes use of two input sources: the news headlines and the keyword records. For each training period a weight is generated for each keyword record from the news headlines received in this period. For every consecutive time period the weights generated may be different. There is a long history of text retrieval using keyword weighting to rank documents [21,22,28,29]. In contrast to these approaches, however, we consider not single keywords but word pairs, triples etc. Furthermore, our aim is not to find out which documents are most relevant with respect to a query, but rather to discover correlation between keyword records and currency movements. In the following subsections we investigate three different methodologies to compute the relevant weights.

2.3.1 Boolean Method

Suppose the time period for which forecasts are made is one hour. It is assumed that if the period t is the time between 9 am and 10 am, then the next time window t+1 refers to the period from 10 am to 11 am and so on. Then the system checks whether in some news headline arriving in period t keyword record i occurs at least once. If so, the value of w_i(t) is set to one, otherwise w_i(t) is set to zero. w_i(t) is the weight of record i for time window t. The term frequency TF_i(t) is the number of occurrences of keyword record i in a particular time window t.

2.3.2 TF x IDF Method

This method consists of three components: term frequency, discrimination factor and normalization.

$$w_i(t) = TF_i(t) \cdot IDF_i \cdot \frac{1}{\max\{TF_i(t) \cdot IDF_i\}}$$

The term frequency alone is not a good indicator of the record importance with respect to a particular time window. This is due to the fact that if a keyword record appears frequently, the keyword record is not necessarily a characteristic indicator for the strength or weakness of the US dollar.
Therefore, a new component is introduced that favors keyword records concentrated in only a few time windows. We use inverse document frequency IDF [28].
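A minimal sketch of the Boolean and TF x IDF weightings follows, using the paper's definition of IDF as log(N/DF), where N is the number of training time windows and DF the number of windows containing the record. The normalization is an assumption: here the products TF x IDF are divided by their maximum over all records and windows, which keeps every weight in [0, 1]; the paper's wording leaves the exact range of the maximum open. The input layout (a dictionary of per-window counts) and the function names are hypothetical.

```python
import math


def boolean_weights(tf: dict[str, list[int]]) -> dict[str, list[float]]:
    """Boolean method: weight 1 if the record occurs in the window, else 0."""
    return {rec: [1.0 if c > 0 else 0.0 for c in counts]
            for rec, counts in tf.items()}


def tfidf_weights(tf: dict[str, list[int]]) -> dict[str, list[float]]:
    """TF x IDF method with IDF = log(N / DF), normalized into [0, 1].

    Assumption: the maximum used for normalization runs over all records
    and all time windows.
    """
    if not tf:
        return {}
    n_windows = len(next(iter(tf.values())))
    raw = {}
    for rec, counts in tf.items():
        df = sum(1 for c in counts if c > 0)  # windows containing the record
        idf = math.log(n_windows / df) if df else 0.0
        raw[rec] = [c * idf for c in counts]
    peak = max((v for vals in raw.values() for v in vals), default=0.0)
    if peak == 0.0:
        return {rec: [0.0] * len(vals) for rec, vals in raw.items()}
    return {rec: [v / peak for v in vals] for rec, vals in raw.items()}


if __name__ == "__main__":
    counts = {
        "bund strong": [2, 0, 0, 0],   # concentrated in one window
        "us dollar up": [1, 1, 1, 1],  # present everywhere, so IDF is 0
        "pound lower": [1, 0, 1, 0],
    }
    print(tfidf_weights(counts))
```

A record that occurs in every window receives weight zero under this scheme, which matches the stated goal of favoring records concentrated in only a few windows.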

In our case, inverse document frequency is defined as follows:

$$IDF_i = \log \frac{N}{DF_i}$$

where N is the number of time windows in the training data and DF_i is the number of time windows containing record i at least once. The weight w_i(t) of keyword record i is calculated by multiplying the term frequency TF_i(t) with the document discrimination IDF_i. In addition, the weight has to be normalized to obtain a value between zero and one. Therefore, it is divided by the maximum number of times record i occurs in any training time window.

2.3.3 TF x CDF Method

Another potentially useful concept is category frequency CF [28]. For each possible category (bid exchange rate of dollar up, down and steady) the CF of a keyword record is the number of time windows containing the keyword record in that particular category. Table 1 shows the category frequency of keyword records.

    keyword record                   $ up   $ down   $ steady
    US, inflation, weak              20     2        10
    Germany, lower, interest, rate   8      0        7
    Bund, strong                     1      4        12

Table 1: Category frequency of keyword records.

The Category Discrimination (CDF) is derived from CF:

$$CDF_i = \frac{\max(CF_{i,up},\, CF_{i,down},\, CF_{i,steady})}{DF_i}$$

where DF_i is the number of time windows containing keyword record i at least once. For each record, the sum of its category frequencies is equal to the number of time windows in which it appears in the training data. The weight w_i(t) of record i is calculated by multiplying the term frequency TF_i(t) with the category discrimination CDF_i. Finally, w_i(t) is again divided by the maximum number of times record i occurs in any time window. This again assures that w_i(t) is a weight between zero and one:

$$w_i(t) = TF_i(t) \cdot CDF_i \cdot \frac{1}{\max\{TF_i(t) \cdot CDF_i\}}$$

2.4 Rule Generation

For many data mining and discovery tasks, a rule-based approach has proven useful [1,15,16]. We also take a rule-based approach. The algorithm generating the rules relies on the notion of most general rule. A most general rule is one which has only one positive literal in its body involving either variable T-1 or T-2. The following are most general rules:

    STOCK_ROSE(T-1)
    BUND_STRONG(T-1)
    INTEREST_WORRY(T-1)
    STOCK_ROSE(T-2)

A rule r is specialized to rule s, denoted r>s, by appending an additional literal to the body of r. Suppose r is the rule STOCK_ROSE(T-1).
The following are specializations of r:

    STOCK_ROSE(T-1), BUND_STRONG(T-1)
    STOCK_ROSE(T-1), BUND_STRONG(T-2)
    STOCK_ROSE(T-1), INTEREST_WORRY(T-1)
    STOCK_ROSE(T-1), NOT INTEREST_WORRY(T-1)

Suppose the head of rule r is DOLLAR_UP (the cases DOLLAR_STEADY and DOLLAR_DOWN are analogous). The confidence of rule r, denoted conf(r), is defined as follows:

$$\mathrm{conf}(r) = \frac{\sum_t \mathrm{eval}_{\{r\}}(t) \cdot \mathrm{up}(t)}{\sum_t \mathrm{eval}_{\{r\}}(t)}$$

where t is a training example and up(t) is 1 if the actual outcome is up and 0 otherwise. The evaluation of the single rule r on example t, denoted by eval_{r}(t), is explained in Section 2.2 (see also [33]). The algorithm generating a rule set R is as follows [34]:

    R = ∅
    while |R| < maxrules do {
        C = {r | r is a most general rule}
        repeat {
            r' = r
            r = the rule s ∈ C minimizing mse(R ∪ {s})
            C = {s | r>s} ∪ {r}
        } until (r = r')
        attach conf(r) to r
        R = R ∪ {r}
    }
    R' = R
    R = the rule set S ⊆ R' minimizing mse(S)

In the inner loop, the algorithm selects the rule s with minimal mean square error (mse) of the rule set R ∪ {s}:

$$\mathrm{mse}(R \cup \{s\}) = \sum_t \left(\mathrm{up}(t) - \mathrm{eval}_{R \cup \{s\}}(t)\right)^2$$

The evaluation of example t using the rules R generated so far with their confidence plus the rule s is denoted by eval_{R ∪ {s}}(t). The summation goes over all training examples t and up(t) is defined as before (assuming the rule set to be built is for dollar_up; for rule sets steady and down it is analogous). Note that mean square error is used to measure the quality of a rule. This is an appropriate goodness measure for applications where the classification problem is expected to be relatively difficult (no perfect models possible). Regression analysis, neural net learning based on back propagation and nearest neighbor algorithms are also based on mean square error or square distance considerations. The last statement of the algorithm selects that subset S of the generated rules R' which has least mean square error. This is a common rule set simplification and yields the final result R.

2.5 Final Prediction

Once the rules are generated, they are applied to the most recently collected textual news and analysis results. So the likelihood of the dollar going up in the period starting at 10 pm depends for instance on the weight computed for STOCK_ROSE. From those probabilities, i.e. how likely the dollar is going up, down or remains steady respectively, the final decision is taken. For example, the final decision is that the dollar moves up. Though maximum likelihood yields fairly good results for making this final decision, we found an improvement over maximum likelihood [3]. This method also proved superior in other applications [35]. Each of the three rule sets (DOLLAR_UP, DOLLAR_STEADY, DOLLAR_DOWN) yields a probability saying how likely the respective event will occur. For each rule set j we compute a threshold v_j such that if the computed likelihood l_j(t) is above the threshold then it is taken as true and false otherwise. The threshold is determined by testing the values v_j = 0, 0.05, 0.1, 0.15, ..., 1 and selecting that threshold which results in the least error on the training examples.
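The threshold search just described, together with the three-case decision rule the paper gives next, can be sketched as follows. Several details are assumptions rather than statements from the paper: "above the threshold" is read as greater than or equal, ties between thresholds go to the smallest candidate, "error" is the number of misclassified training examples, and a zero threshold is replaced by a tiny positive number when computing the deviation. Function names are hypothetical.

```python
def candidate_thresholds(step: float = 0.05) -> list[float]:
    """The grid 0, 0.05, 0.1, ..., 1."""
    n = round(1 / step)
    return [round(k * step, 2) for k in range(n + 1)]


def pick_threshold(likelihoods: list[float], outcomes: list[bool]) -> float:
    """Choose the grid value with the fewest errors on the training examples.

    likelihoods[k] is l_j(t) for training example k; outcomes[k] is True
    if event j actually occurred in that example.
    """
    def errors(v: float) -> int:
        return sum((l >= v) != o for l, o in zip(likelihoods, outcomes))

    return min(candidate_thresholds(), key=errors)  # ties: smallest threshold


def decide(likelihoods: dict[str, float], thresholds: dict[str, float]) -> str:
    """Final class: the single event above its threshold, else maximal deviation."""
    above = [j for j in likelihoods if likelihoods[j] >= thresholds[j]]
    if len(above) == 1:
        return above[0]

    def deviation(j: str) -> float:
        v = max(thresholds[j], 1e-9)  # assumption: guard against v_j = 0
        return (likelihoods[j] - thresholds[j]) / v

    return max(likelihoods, key=deviation)


if __name__ == "__main__":
    # Values of the paper's Table 2.
    l = {"up": 0.811, "steady": 0.018, "down": 0.171}
    v = {"up": 0.45, "steady": 0.25, "down": 0.20}
    print(decide(l, v))
```

When two events are above their thresholds, this sketch also falls back to the maximal deviation; the paper only spells out the cases of exactly one, none and all three.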
Given the three thresholds v_j and the three likelihoods l_j(t), there are three possible cases:

1. Exactly one of the three likelihoods is above its threshold, i.e. l_j(t) ≥ v_j for one j: class j is the final prediction. This case is illustrated in Table 2.
2. None of the three likelihoods is above its threshold, i.e. l_j(t) < v_j for all j: we compute

$$d_j(t) = \frac{l_j(t) - v_j}{v_j}$$

   and select that j to be true for which the deviation d_j(t) is maximal.
3. All likelihoods are above their threshold, i.e. l_j(t) ≥ v_j for all j: as before we select that j with maximal deviation d_j(t).

    time t    probability   threshold   binary decision
    up        0.811         0.45        1
    steady    0.018         0.25        0
    down      0.171         0.20        0

Table 2: Forecast for time t is up.

3 Experiments, Results and Discussion

Experiments were conducted using the HFDF93 data available from Olsen & Associates in Zurich. This data set contains FX rate quotes for USD-DEM and USD-JPY, money market news headlines, plus 3-month maturity interbank deposit rates of USD and JPY. Beforehand, the rule generation part of the system was tested extensively at the Treasury Department of the Union Bank of Switzerland (UBS) by FX dealers who provided manual weights [30] rather than having them generated automatically as is done by this system. Some of the key issues related to the experimental setup are discussed first. When the system was tested at UBS with manual input of factors, it produced successful results. Traders entered a standard value between very high (1) and very low (0) for twelve factors, namely government policies, political news, rumours, central bank, employment rate, inflation, bonds, capital flow, stock market, money supply, volatility NY and volatility London [30]. But the purpose of the experiments conducted using the HFDF93 data set was to derive such and other factors automatically.

In every experiment, sixty test periods are considered. We chose Sep 1993 as training and test period as this is the last month for which HFDF93 contains data. More precisely, the testing period for one hour predictions is 22 Sep 1993 13:00 GMT to 27 Sep 1993 10:00 GMT (due to a holiday break this period constitutes sixty trading hours). The start date and time is the same when testing two hour forecasts.
The testing period for the three hour predictions finishes at the end of the data set HFDF93, which is 30 Sep; it begins on 21 Sep 1993. The first training periods are always those intervals immediately

preceding the one to be predicted. Hence the training period when predicting the movement from 9:00 to 10:00 on 27 Sep is 22 Sep 12:00 to 27 Sep 9:00. The following variables were changed during the experimentation:

    length of the time period (one, two and three hours, see Figure 1),
    weight generation method (the three methods described in Section 2.3),
    different currencies (exchange rate of DM and Yen against US dollar).

All experiments were conducted using the same news headlines and the same keyword record definitions. The prediction accuracy on the test data is shown in Tables 3 and 4.

    weighting method   1 hour   2 hours   3 hours
    Boolean            41       48        40
    TF x IDF           42       39.5      42.5
    TF x CDF           51       42        53

Table 3: Prediction accuracy in % for DEM/USD and various time periods.

    weighting method   1 hour   2 hours   3 hours
    Boolean            38.5     25        31
    TF x IDF           28       37        35.5
    TF x CDF           46       39        48

Table 4: Prediction accuracy in % for JPY/USD and various time periods.

From Tables 3 and 4 it is observed that data set 1 (DEM/USD) produces better results than data set 2 (JPY/USD). This is mainly due to the fact that the keyword records have a greater influence on the DEM than on the Japanese Yen. We also noted that for intraday forecasting, increasing the number of training data makes no significant difference in the prediction performance, as the currencies rally or fall on various short-term economic factors and also on the rapidly changing conditions of stock and bond markets. The results are compared against a standard statistical tool which extrapolates time series data. The highest success rate achieved by using a statistical package was 37 per cent. Our best weighting method has an accuracy of 51 per cent for the same test and training period. Human traders are said to have an accuracy of up to 50 per cent for the same intraday prediction task. However, we did not actually succeed in convincing a trader to measure his personal prediction accuracy. There was simply a consensus among the traders that it is actually hard to achieve 50% accuracy. In another experiment we used a feed forward neural net to predict the next outcome (dollar up, steady or down) based on the previous n such outcomes.
Varying n between 2 and 10, the average accuracy achieved is 37.5%, and fifty per cent was never reached. It is obvious that the TF x CDF method is the best and that it performs significantly better than random guessing.

Random guessing gives on average 33% accuracy because, by definition of the three possible outcomes up, steady and down, each outcome is about equally likely. We determine the probability of the outcomes as reported in tables 3 to 8 when our system would do random prediction. In this case, each prediction is independent from the other predictions. So we have a Binomial distribution with mean n*p and variance n*p*(1-p), where p is the probability of success (0.33) and n is the number of times we predict [18]. If n is rather large, the Binomial distribution is approximately a Normal distribution. In the sequel, we consider only the TF x CDF method.

The probability that the prediction accuracy is equal to or above 51% for random guessing is less than 0.4% when having 60 trials (see table 3). With random guessing, the prediction accuracy stays below 43.3% with a probability of 95% for 60 trials, so an accuracy of at least 43.3% is reached with a probability of only about 5%. Each of the outcomes in the first and third column of tables 3 and 4 can therefore be achieved by random guessing only with a probability of less than 5%. The outcome of the second column in tables 3 and 4 can be achieved by random guessing with a likelihood of a little more than 5%. However, when taking all 180 forecasts of tables 3, 4, 5 together, then the likelihood of getting the average prediction accuracy 48.6% ( (51%+42%+53%)/3 ) by random guessing is below 0.0001%. The probability of achieving the average accuracy of 44.3% ( (46%+39%+48%)/3 ) as reported in table 4 by random guessing is still below 0.001%. Hence, our system performs almost certainly better than random guessing.

For DM and the best performing weighting method, TF x CDF, the results are presented in a little more detail. The third column in table 5 indicates how many times the system predicts up or down and the dollar was actually steady, or the system predicts steady and the dollar was actually up or down. The last column indicates the percentage of totally wrong predictions. That is, the system expects the dollar to go up and it moves down, or vice versa.
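The comparison with random guessing made earlier in this section can be checked numerically. The following is a minimal sketch, not the computation used in the paper: the function names `normal_tail` and `binomial_tail` are hypothetical, p = 0.33 and n = 60 are taken from the text, and the Normal tail is computed without a continuity correction, which is an assumption. The resulting figures depend on these choices and need not coincide exactly with the ones quoted above.

```python
import math


def normal_tail(accuracy: float, n: int, p: float = 0.33) -> float:
    """P(observed accuracy >= `accuracy`) for n random guesses, using the
    Normal approximation with mean n*p and variance n*p*(1-p)."""
    mean = n * p
    std = math.sqrt(n * p * (1.0 - p))
    z = (accuracy * n - mean) / std
    # Upper tail of the standard Normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def binomial_tail(accuracy: float, n: int, p: float = 0.33) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p), where k is the smallest
    number of correct guesses whose share of n reaches `accuracy`."""
    k_min = math.ceil(accuracy * n - 1e-9)  # small tolerance for rounding
    return sum(
        math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


if __name__ == "__main__":
    # TF x CDF accuracies for DEM/USD as listed in table 3, 60 trials each.
    for label, acc in [("1 hour", 0.51), ("2 hours", 0.42), ("3 hours", 0.53)]:
        print(
            f"{label}: normal approx = {normal_tail(acc, 60):.4%}, "
            f"exact binomial = {binomial_tail(acc, 60):.4%}"
        )
```

The exact Binomial sum is included only as a cross-check on the approximation; for 60 trials the two can differ noticeably in the far tail.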
Table 6 shows the distributions of the actual outcomes and the forecasts.

period      accuracy    slightly wrong    wrong
DM, 1 hr    52%         23%               25%
DM, 2 hr    41%         30%               29%
DM, 3 hr    53%         22%               25%

Table 5: forecasting accuracy for DM/US dollar.

      distribution of actual outcome    distribution of the forecast
DM    up       steady    down           up       steady    down
1h    35%      30%       35%            33.3%    30%       36.6%
2h    30%      31.6%     38.3%          35%      33.3%     22.6%
3h    36.6%    33.3%     30%            35%      31.6%     33.3%

Table 6: distribution of DM/US dollar forecast.

4 CONCLUSIONS

A new approach to forecast intraday exchange rates using news headlines has been introduced. The major difference from other forecasting techniques such as technical analysis or statistics is that the input to the system is different. We take as input textual information which is hard to process but which is rich in content. The conventional approach is to take numerical time series data and to analyze those data using various techniques. In contrast to numeric time series data, our input data contains not only the effect (e.g., the dollar rises against the Deutschmark) but also the possible causes of the event (e.g., because of a weak German bond market). Hence improved predictions are expected from this richer input.

We gave a comprehensible overview of the timely collaboration of our text analyses, preprocessing and forecasting approach. Different ways of pre-processing the news headlines have been suggested and the rule based prediction engine was explained in detail. Extensive experimentation has revealed that the weighting method TF x CDF performs the best. The results were compared with those of a conventional numeric time series analysis tool and two different neural net approaches. It was found that the techniques introduced in this paper outperform other approaches and that our approach is significantly better than random guessing.

This reveals the enormous potential of the system and opens up many paths for future research in this area. It is also planned to predict other financial markets, such as bond markets, in future. Furthermore, it is believed that there are some parts of the system which can be further improved to provide more accurate forecasts. For example, by combining our techniques with other forecasting methodologies a powerful hybrid forecasting system can be built. Finally, it is conceivable that the keyword records can also be generated automatically from a sample of news headlines.

5 References

[1] R. Agrawal and R. Srikant, Fast Algorithms for Mining Association Rules, VLDB, pp 487-499, 1994.
[2] P. Birrer and T. Eggenschwiler, Frameworks in the Financial Engineering Domain - An Experience Report. UBILAB, Union Bank of Switzerland, Bahnhofstrasse 45, CH-8021 Zurich, Switzerland, 1996.
[3] V. Cho and B. Wüthrich, Towards Real-Time Discovery from Distributed Information Sources, Pacific Asia Conf on KDD and Data Mining (PAKDD98), get from http://www.cs.ust.hk/~beat/bio.html, 1998.
[4] W. Cohen, Fast Effective Rule Induction, 12th Int Conf on Machine Learning, pp 80-89, 1995.
[5] E. Cox and T. J. Schwartz, Software Review: Around the World with Fuzzy Products, AI Expert, March 1990, pp 44-8.
[6] T. Dhillon, Managing Intelligent Systems in a Banking and Finance Environment, La Trobe University, Bundoora, Victoria 3083, Australia, 1996.
[7] C. Goodhart, News and the Foreign Exchange Market, Journal of International Securities Markets, Vol. 4, pp 333-348, 1989.
[8] C. Goodhart, S. G. Hall, S. G. B. Henry and B. Pesaran, News Effects in a High Frequency Model of the Sterling-Dollar Exchange Rate, Journal of Applied Econometrics, Vol. 8, pp 1-13, 1993.
[9] S. Goonatilake and P. Treleaven, Intelligent Systems for Finance and Business, Wiley, 1995.
[10] S. Goonatilake and S. Khebbal, Intelligent Hybrid Systems, Wiley, 1995.
[11] P. D. Grauwe, H. Dewachter, et al., Exchange Rate Theory, Blackwell Publications, Cambridge, 1993.
[12] P. Gregory, Neural Networks: Developing an Effective Strategy, Conference - Neural Networks 1990, Blenheim Online.
[13] D. M. Guillaume, M. M. Dacorogna, R. R. Dave, U. A. Muller, R. B. Olsen and O. V. Pictet, From the Bird's Eye to the Microscope - A Survey of New Stylized Facts of the IntraDaily Foreign Exchange Markets, DMG.1994-04-06, Olsen & Associates, Seefeldstrasse 233, 8008 Zurich, Switzerland, 1995.
[14] D. M. Guillaume, O. V. Pictet and M. M. Dacorogna, On the IntraDaily Performance of GARCH Processes, Olsen & Associates, Seefeldstrasse 233, 8008 Zurich, Switzerland, 1995.
[15] J. Han, Y. Cai, C. Cercone, and Y. Huang, Data-Driven Discovery of Quantitative Rules in Relational Databases, IEEE TKDE, 5(1):29-40, 1993.

[16] K. Hatonen et al., Knowledge Discovery from Telecommunication Network Alarm Databases, Int Conf on Data Engineering, 1996.
[17] J. Hertz, A. Krogh, and R. G. Palmer, Introduction to the Theory of Neural Computation. Wiley, 1990.
[18] W. W. Hines, Probability and Statistics in Engineering and Management Science, Wiley, 1990.
[19] K. Jim, J. Lai, and B. Wüthrich, Rule Discovery: Error Measures and Conditional Rule Probabilities, Pacific Asia Conf on Knowledge Discovery and Data Mining, pp. 82-89, 1997.
[20] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy, Advances in Knowledge Discovery and Data Mining, AAAI Press / The MIT Press, pp 625, 1996.
[21] E. M. Keen, Query Term Weighting Schemes for Effective Ranked Output Retrieval, 15th Int. Online Information Meeting, pp. 135-142, 1991.
[22] D. L. Lee, H. Chuang, and K. Seamons, Document Ranking and the Vector-Space Model, IEEE Computer, Theme Issues on Assessing Measurement, April/May, 14(2):67-75, 1997.
[23] U. A. Muller et al., Volatilities of Different Time Resolutions - Analyzing the Dynamics of Market Components, Internal document, Olsen & Associates, Zurich, Switzerland, 1996.
[24] O. V. Pictet et al., Genetic Algorithms with Collective Sharing for Robust Optimization in Financial Applications, Internal document, Olsen & Associates, Zurich, Switzerland, 1996.
[25] J. Pring, Technical Analysis Explained. McGraw-Hill, 1991.
[26] J. R. Quinlan, Simplifying Decision Trees, Int J Man-Machine Studies, 27:221-234, 1987.
[27] F. L. Rivera-Batiz and L. A. Rivera-Batiz, International Finance and Open Economy Macroeconomics, 2nd ed. MacMillan, 1994.
[28] G. Salton and M. J. McGill, Introduction to Modern Information Retrieval. New York: McGraw-Hill, 1983.
[29] G. Salton and C. Buckley, Term Weighting Approaches in Automatic Text Retrieval, Information Processing and Management, Vol. 24, No. 5, pp 513-523, 1988.
[30] K. P. Sankaran, A System to Forecast Currency Exchange Rates (MPhil thesis). Department of Computer Science, Hong Kong University of Science and Technology, 1996.
[31] P. Schäuble and B. Wüthrich, On the Expressive Power of Query Languages. ACM TOIS, 12(1):67-91, 1994.
[32] D. Schmidt, A. K. Dittrich, W. Dreyer and R. Marti, Time Series, a Neglected Issue in Temporal Database Research? UBILAB, Union Bank of Switzerland, Universitatsstrasse 84, CH-8033 Zurich, Switzerland, 1995.
[33] B. Wüthrich, Probabilistic Knowledge Bases. IEEE TKDE, 7(5):651-698, 1995.
[34] B. Wüthrich, Discovering Probabilistic Decision Rules, International Journal of Intelligent Systems in Accounting, Finance and Management, Vol 6:269-277, 1997.
[35] B. Wüthrich, S. Leung, D. C. Peramunetilleke, W. Lam, V. Cho, and J. Zhang, Daily Stock Market Predictions from World Wide Web Data, Int Conf on Knowledge Discovery and Data Mining (KDD98), pp. 269-274, 1998.
[36] T. Yagyu, H. Yuize, M. Yoneda, and S. Fukami, Foreign Exchange Trade Support Expert System, Proc. IFSA'91, Artificial Intelligence, Brussels, pp. 214-217, 1991.