Java Modules for Time Series Analysis




Agenda
1. Clustering
2. Non-normal distributions
3. Multifactor modeling
4. Implied ratings
5. Time series prediction

1. Clustering
[Figure: time series are grouped into Cluster 1, Cluster 2 and Cluster 3, each represented by a synthetic series]

Clustering
Goal: group time series so that series with similar historical behavior end up in the same group.
Input:
- A set of single time series (bond, share, or fund prices) or groups of time series (for example, interest rate market curves)
- The number of clusters
Output:
- Clusters of time series
- Clustering quality statistics
- A prototype series (synthetic curve) for every cluster, with the same dimensionality as all other series
Use:
- Reduce a huge number of series and thus make time-consuming operations, such as the calculation of huge correlation matrices, feasible. The number of time series is reduced by identifying the cluster to which a series belongs and using the prototype of that cluster instead of the real series.
- Detect similar behavior of market factors or issuers (cartels)

Clustering
Clustering can be performed for:
- Time series (for example, share or bond prices with a historical development)
- Curve time series (for example, interest rate market curves with a historical development)
In addition to the clusters with their series and prototypes, clustering quality statistics are generated: inter-cluster and intra-cluster statistics, adjusted R squared, average linkage, etc. Some of these statistics can be used to determine the optimal number of clusters, i.e. the number of groups with minimal internal distance and maximal distance to each other.
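The slides do not name the clustering algorithm. As a minimal sketch, assuming a plain k-means with squared Euclidean distance (a swapped-in technique, not necessarily the one used by the module), series of equal length could be grouped and represented by prototypes as follows. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Minimal k-means sketch for equally long time series (illustrative assumption only). */
final class SeriesClustering {

    /** Squared Euclidean distance between two series of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int t = 0; t < a.length; t++) {
            double d = a[t] - b[t];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the prototype closest to the given series. */
    static int nearest(double[] series, double[][] prototypes) {
        int best = 0;
        for (int c = 1; c < prototypes.length; c++) {
            if (squaredDistance(series, prototypes[c]) < squaredDistance(series, prototypes[best])) {
                best = c;
            }
        }
        return best;
    }

    /**
     * Assigns every series to one of k clusters (requires k <= series.length).
     * The first k series seed the prototypes, which is a simplification;
     * afterwards each prototype is the pointwise mean of its members.
     */
    static int[] cluster(double[][] series, int k, int maxIterations) {
        int length = series[0].length;
        double[][] prototypes = new double[k][];
        for (int c = 0; c < k; c++) {
            prototypes[c] = series[c].clone();
        }
        int[] assignment = new int[series.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            for (int i = 0; i < series.length; i++) {
                int c = nearest(series[i], prototypes);
                if (c != assignment[i]) {
                    assignment[i] = c;
                    changed = true;
                }
            }
            if (!changed) {
                break; // assignments are stable
            }
            for (int c = 0; c < k; c++) {
                double[] sum = new double[length];
                int count = 0;
                for (int i = 0; i < series.length; i++) {
                    if (assignment[i] != c) continue;
                    count++;
                    for (int t = 0; t < length; t++) sum[t] += series[i][t];
                }
                if (count == 0) continue; // an empty cluster keeps its old prototype
                for (int t = 0; t < length; t++) prototypes[c][t] = sum[t] / count;
            }
        }
        return assignment;
    }
}
```

Calling `SeriesClustering.cluster(series, 3, 100)` returns one cluster index per input series. A production version would need a better seeding strategy and would also return the prototypes and quality statistics.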

Finding the optimal number of clusters using the clustering error

Num clusters   Adjusted R squared   Error
2              0,7888814            0,2111186
3              0,8856307            0,1143693
4              0,9281010            0,0718990
5              0,9351225            0,0648775
6              0,9360977            0,0639023
7              0,9361842            0,0638158
8              0,9647925            0,0352075
9              0,9543623            0,0456377
10             0,9544117            0,0455883
11             0,9758081            0,0241919
12             0,9757913            0,0242087
13             0,9572335            0,0427665
14             0,9572007            0,0427993
15             0,9571655            0,0428345
16             0,9573212            0,0426788
17             0,9572855            0,0427145
18             0,9861978            0,0138022
19             0,9861863            0,0138137
20             0,9861746            0,0138254
21             0,9861818            0,0138182
22             0,9861700            0,0138300
23             0,9861583            0,0138417
24             0,9861466            0,0138534
25             0,9861356            0,0138644
26             0,9861230            0,0138770

[Figure: clustering error (0,00 to 0,20) plotted against the number of clusters (2 to 26)]
Optimal number of clusters = 18
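In the table, the error column equals one minus the adjusted R squared, and the stated optimum of 18 clusters coincides with the smallest error. The slides do not spell out the selection rule, so the following is only a sketch assuming "pick the cluster count with minimal error". The class name is hypothetical; the numbers are copied from the table.

```java
/** Sketch: pick the cluster count with the smallest error = 1 - adjusted R squared. */
final class ClusterCountSelection {

    /** Adjusted R squared for 2..26 clusters, copied from the table. */
    static final double[] ADJUSTED_R2 = {
        0.7888814, 0.8856307, 0.9281010, 0.9351225, 0.9360977,
        0.9361842, 0.9647925, 0.9543623, 0.9544117, 0.9758081,
        0.9757913, 0.9572335, 0.9572007, 0.9571655, 0.9573212,
        0.9572855, 0.9861978, 0.9861863, 0.9861746, 0.9861818,
        0.9861700, 0.9861583, 0.9861466, 0.9861356, 0.9861230
    };

    /** Clustering error as used in the table. */
    static double error(double adjustedR2) {
        return 1.0 - adjustedR2;
    }

    /**
     * Returns the cluster count with minimal error.
     * firstCount is the cluster count that belongs to adjustedR2[0].
     */
    static int bestClusterCount(double[] adjustedR2, int firstCount) {
        int best = 0;
        for (int i = 1; i < adjustedR2.length; i++) {
            if (error(adjustedR2[i]) < error(adjustedR2[best])) {
                best = i;
            }
        }
        return firstCount + best;
    }
}
```

`ClusterCountSelection.bestClusterCount(ClusterCountSelection.ADJUSTED_R2, 2)` yields 18 for the tabulated values.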

Example: clustering of 20 spread curves
Actual Spread Curves 23.07.08 (columns: maturity in years, with the MM/CM label given for each maturity)

Num  Curve            0,5 (MM)   1 (CM)     2 (CM)     3 (CM)     5 (CM)     10 (CM)
1    FORTUM-AEUR-MM   0,1439%    0,2293%    0,3047%    0,3686%    0,4647%    0,5828%
2    SRBIA-AEUR-CR    1,2863%    1,2500%    1,7880%    2,1133%    2,6375%    3,0123%
3    UKRAIN-AUSD-CR   1,7606%    1,9218%    2,7834%    3,2418%    3,8589%    4,5407%
4    ITALY-AEUR-CR    0,1022%    0,1167%    0,1896%    0,2727%    0,3965%    0,4980%
5    SLOVEN-AEUR-CR   0,0376%    0,1044%    0,1351%    0,1584%    0,2338%    0,3622%
6    CZECH-AEUR-CR    0,1295%    0,1471%    0,2220%    0,2583%    0,3627%    0,4925%
7    TURKEY-AUSD-CR   0,8137%    0,9853%    1,6326%    2,1247%    2,7918%    3,4798%
8    ROMANI-AUSD-CR   0,5478%    0,5594%    1,1125%    1,3618%    1,7735%    2,0718%
9    POLAND-AUSD-CR   0,0883%    0,1608%    0,2432%    0,3576%    0,5238%    0,6876%
10   PEUGOT-AEUR-MM   0,6865%    0,8433%    1,0406%    1,2925%    1,6265%    1,7360%
11   JPM-CUSD-MR      1,0987%    1,0272%    1,1421%    1,2384%    1,3570%    1,3909%
12   DANBNK-AEUR-MM   0,1830%    0,2635%    0,3686%    0,4481%    0,5610%    0,6098%
13   GS-AUSD-MR       1,1586%    1,1238%    1,1868%    1,2137%    1,2638%    1,2969%
14   CROATI-AEUR-CR   0,1554%    0,2741%    0,4423%    0,5602%    0,8274%    1,0806%
15   ESPSAN-CEUR-MM   0,7725%    0,9577%    1,2941%    1,5505%    1,9120%    1,9371%
16   SUEDZU-AEUR-MM   0,6148%    0,7388%    0,9829%    1,2075%    1,5032%    1,6203%
17   LEH-AUSD-MR      6,7011%    6,8482%    5,4481%    4,5185%    3,2897%    2,6982%
18   HELLAS-CEUR-MM   4,6793%    4,9875%    8,3839%    9,9521%    10,6632%   10,6198%
19   REPHUN-AUSD-CR   0,2793%    0,3050%    0,5168%    0,7809%    1,1361%    1,3992%
20   BGARIA-AUSD-CR   0,3123%    0,4825%    0,8722%    1,1222%    1,5268%    1,8284%

All 20 spread curves
[Figure: Actual Spread Curves on 23.07.2008 for all 20 curves of the example; x-axis: maturity in years (0,5 to 10); y-axis: spread (0,00% to 12,00%)]

Clusters of spread curves
Spread Curve Clusters - Actual Rates 23.07.08 (columns: maturity in years, with the MM/CM label given for each maturity)

Num  Curve            0,5 (MM)   1 (CM)     2 (CM)     3 (CM)     5 (CM)     10 (CM)

Cluster 1 (Std Deviation 0,1725%)
1    FORTUM-AEUR-MM   0,1439%    0,2293%    0,3047%    0,3686%    0,4647%    0,5828%
4    ITALY-AEUR-CR    0,1022%    0,1167%    0,1896%    0,2727%    0,3965%    0,4980%
5    SLOVEN-AEUR-CR   0,0376%    0,1044%    0,1351%    0,1584%    0,2338%    0,3622%
6    CZECH-AEUR-CR    0,1295%    0,1471%    0,2220%    0,2583%    0,3627%    0,4925%
9    POLAND-AUSD-CR   0,0883%    0,1608%    0,2432%    0,3576%    0,5238%    0,6876%
12   DANBNK-AEUR-MM   0,1830%    0,2635%    0,3686%    0,4481%    0,5610%    0,6098%
14   CROATI-AEUR-CR   0,1554%    0,2741%    0,4423%    0,5602%    0,8274%    1,0806%
     Cluster Spread   0,1200%    0,1851%    0,2722%    0,3463%    0,4814%    0,6162%

Cluster 2 (Std Deviation 0,3019%)
8    ROMANI-AUSD-CR   0,5478%    0,5594%    1,1125%    1,3618%    1,7735%    2,0718%
13   GS-AUSD-MR       1,1586%    1,1238%    1,1868%    1,2137%    1,2638%    1,2969%
10   PEUGOT-AEUR-MM   0,6865%    0,8433%    1,0406%    1,2925%    1,6265%    1,7360%
11   JPM-CUSD-MR      1,0987%    1,0272%    1,1421%    1,2384%    1,3570%    1,3909%
15   ESPSAN-CEUR-MM   0,7725%    0,9577%    1,2941%    1,5505%    1,9120%    1,9371%
16   SUEDZU-AEUR-MM   0,6148%    0,7388%    0,9829%    1,2075%    1,5032%    1,6203%
19   REPHUN-AUSD-CR   0,2793%    0,3050%    0,5168%    0,7809%    1,1361%    1,3992%
20   BGARIA-AUSD-CR   0,3123%    0,4825%    0,8722%    1,1222%    1,5268%    1,8284%
     Cluster Spread   0,6838%    0,7547%    1,0185%    1,2209%    1,5123%    1,6601%

Cluster 3 (Std Deviation 0,8026%)
2    SRBIA-AEUR-CR    1,2863%    1,2500%    1,7880%    2,1133%    2,6375%    3,0123%
3    UKRAIN-AUSD-CR   1,7606%    1,9218%    2,7834%    3,2418%    3,8589%    4,5407%
7    TURKEY-AUSD-CR   0,8137%    0,9853%    1,6326%    2,1247%    2,7918%    3,4798%
17   LEH-AUSD-MR      6,7011%    6,8482%    5,4481%    4,5185%    3,2897%    2,6982%
     Cluster Spread   2,6402%    2,7512%    2,9129%    2,9995%    3,1445%    3,4328%

Cluster 4 (Std Deviation 0,0000%)
18   HELLAS-CEUR-MM   4,6793%    4,9875%    8,3839%    9,9521%    10,6632%   10,6198%
     Cluster Spread   4,6793%    4,9875%    8,3839%    9,9521%    10,6632%   10,6198%
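For Cluster 1, the Cluster Spread row agrees, to the four decimals shown, with the pointwise arithmetic mean of the member curves. The sketch below computes such a synthetic curve under that assumption; whether the module uses a plain or a weighted mean in general is not stated in the slides. The class name is hypothetical and the values are in percent.

```java
/** Sketch: synthetic (prototype) curve as the pointwise mean of the cluster members. */
final class ClusterPrototype {

    /** Cluster 1 member curves in percent, maturities 0.5, 1, 2, 3, 5, 10 years. */
    static final double[][] CLUSTER_1 = {
        {0.1439, 0.2293, 0.3047, 0.3686, 0.4647, 0.5828}, // FORTUM-AEUR-MM
        {0.1022, 0.1167, 0.1896, 0.2727, 0.3965, 0.4980}, // ITALY-AEUR-CR
        {0.0376, 0.1044, 0.1351, 0.1584, 0.2338, 0.3622}, // SLOVEN-AEUR-CR
        {0.1295, 0.1471, 0.2220, 0.2583, 0.3627, 0.4925}, // CZECH-AEUR-CR
        {0.0883, 0.1608, 0.2432, 0.3576, 0.5238, 0.6876}, // POLAND-AUSD-CR
        {0.1830, 0.2635, 0.3686, 0.4481, 0.5610, 0.6098}, // DANBNK-AEUR-MM
        {0.1554, 0.2741, 0.4423, 0.5602, 0.8274, 1.0806}  // CROATI-AEUR-CR
    };

    /** Pointwise arithmetic mean over all member curves. */
    static double[] syntheticCurve(double[][] members) {
        double[] mean = new double[members[0].length];
        for (double[] curve : members) {
            for (int m = 0; m < mean.length; m++) {
                mean[m] += curve[m] / members.length;
            }
        }
        return mean;
    }
}
```

`ClusterPrototype.syntheticCurve(ClusterPrototype.CLUSTER_1)` gives roughly 0.1200, 0.1851, 0.2722, 0.3463, 0.4814 and 0.6162 percent.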

Clusters of market curves
[Figure: Cluster 1 on 23.07.08 with the curves FORTUM-AEUR-MM, ITALY-AEUR-CR, SLOVEN-AEUR-CR, CZECH-AEUR-CR, POLAND-AUSD-CR, DANBNK-AEUR-MM, CROATI-AEUR-CR and the Cluster Spread (synthetic curve); y-axis: 0,0000% to 1,2000%]
[Figure: Cluster 2 on 23.07.08 with the curves ROMANI-AUSD-CR, GS-AUSD-MR, PEUGOT-AEUR-MM, JPM-CUSD-MR, ESPSAN-CEUR-MM, SUEDZU-AEUR-MM, REPHUN-AUSD-CR, BGARIA-AUSD-CR and the Cluster Spread (synthetic curve); y-axis: 0,0000% to 2,5000%]

Historical development
[Figure: four panels, "Cluster 2: 6 Months", "Cluster 2: 1 Year", "Cluster 2: 5 Years" and "Cluster 2: 10 Years", each showing the historical development of the eight Cluster 2 series and the cluster synthetic curve; x-axis: dates from 05.2.2008 to 05.6.2008; y-axis: spread (up to 3,00% in the first two panels and 4,00% in the last two)]

2. Non-Normal Distributions
[Figure: an empirical distribution is mapped to a theoretical distribution type and its parameters; distributions shown: Cauchy, normal]

Non-normal distributions
Goal: automatically identify the distribution type and its parameters from market time series, and use the copula approach to simulate market factors in Monte Carlo VaR with the mapped distributions.
Input:
- The time series of the market factors
- The chosen standard distribution types (Beta, Cauchy, Student, Weibull, etc.)
Output:
- The identified distribution type
- The parameters of the identified distribution type
- A numerical estimate of the distance between the empirical distribution and each of the other distribution types (this allows ranking the distribution types and choosing another well-fitting type)
Use:
- Improve Monte Carlo VaR simulation by using correlated non-normally distributed samples instead of correlated normally distributed samples

Non-normal distributions
Calculation of Value at Risk
[Figure: distribution with the confidence level α, the α-quantile Q, the expected value, and the market VaR(α)]
In most cases the distribution of the market factor time series is assumed to be normal. This does not correspond to reality: the time series often exhibit skewed and fat-tailed distributions, so the normal assumption underestimates the market risk of improbable large losses (fat-tail losses).
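As a rough illustration of the quantile-based VaR, the following sketch estimates VaR from simulated profit and loss values without assuming normality. It assumes that VaR is measured as the distance between the expected value and the (1 - alpha) quantile of the P&L distribution, which is one common convention and not necessarily the one used by the module. The names are hypothetical.

```java
import java.util.Arrays;

/** Sketch: empirical Value at Risk from simulated profit and loss values. */
final class EmpiricalVaR {

    /** Empirical p-quantile with linear interpolation between order statistics. */
    static double quantile(double[] values, double p) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double pos = p * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    /**
     * VaR at confidence level alpha, measured relative to the expected value
     * (assumption): mean of the P&L minus its (1 - alpha) quantile.
     */
    static double valueAtRisk(double[] pnl, double alpha) {
        double mean = Arrays.stream(pnl).average().orElse(0.0);
        return mean - quantile(pnl, 1.0 - alpha);
    }
}
```

With fat-tailed P&L samples the low quantile moves further from the mean than a normal fit would suggest, which is the effect the slide describes.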

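Non-normal distributions
Mapping risk factors to the best-fit distribution
[Figure: empirical distribution of a risk factor with the fitted Cauchy distribution (green) and the normal distribution]
The best fit is given by the Cauchy distribution (green). The Beta distribution will produce a larger risk at the given confidence level because of the fat tail.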

Distribution parameters estimation
The main goal is to model the shape of the empirical distribution as closely as possible with a reproducible theoretical distribution shape.
Together with the identification of the distribution type, the distribution parameters are estimated from market data using the method of moments, least squares regression, or maximum likelihood.
The additional parameters shift and scale are used to keep the distribution parameter values out of undefined regions.
Data with a given distribution can be generated from:
- Distribution type
- Distribution parameters
- Additional parameters (shift, scale)
- Number of values
Cumulative distributions are used for the subsequent copula Monte Carlo simulation.

Standard distribution parameters estimation
10 distribution types

                 Distribution parameters      Additional parameters
Distribution     Parameter 1   Parameter 2    Parameter 1   Parameter 2
Beta             Shape         Shape          Shift         Scale
Cauchy           Location      Scale          ---           ---
Exponential      Rate          ---            Shift         ---
Inverse Normal   Mu            Lambda         Shift         ---
Log Normal       Log Scale     Shape          Shift         ---
Normal           Mean          Variance       Shift         ---
Pareto           Scale         Shape          ---           ---
Rayleigh         Sigma         ---            Shift         ---
Student          Nu            ---            Shift         Scale
Weibull          Scale         Shape          Shift         ---

Distribution mapping
Two metrics are used to compare distributions:
- Histogram metric: the bin frequencies of the empirical histogram are compared against the bin probabilities of the theoretical histogram.
- Cumulative distances metric: ignoring the X-axis values, the cumulative distances between the data points of the market series are calculated. The same function is calculated using theoretically generated values for the distribution under consideration, and the two sets of cumulative values are compared.
Both the histogram and the cumulative distances are compared using the average squared error.

Histogram metric
[Figure: theoretical histogram and empirical histogram between the minimum and maximum data values, with the distances between corresponding bins]
The best (mapped) distribution is identified by the minimum sum of squared distances between the theoretical histogram of the distribution and the empirical histogram.
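The histogram metric can be sketched as follows, assuming equally wide bins between the minimum and maximum and a candidate distribution that is supplied through its cumulative distribution function. The Cauchy CDF is included only as one closed-form example; all names are hypothetical.

```java
import java.util.function.DoubleUnaryOperator;

/** Sketch: compare empirical bin frequencies with theoretical bin probabilities. */
final class HistogramMetric {

    /** Relative frequencies of the data in equally wide bins over [min, max]. */
    static double[] empiricalFrequencies(double[] data, double min, double max, int bins) {
        double[] freq = new double[bins];
        double width = (max - min) / bins;
        for (double x : data) {
            int b = (int) Math.floor((x - min) / width);
            if (b == bins) b = bins - 1;              // x == max belongs to the last bin
            if (b >= 0 && b < bins) freq[b] += 1.0 / data.length;
        }
        return freq;
    }

    /** Probability mass of each bin under the candidate distribution. */
    static double[] theoreticalProbabilities(DoubleUnaryOperator cdf, double min, double max, int bins) {
        double[] prob = new double[bins];
        double width = (max - min) / bins;
        for (int b = 0; b < bins; b++) {
            prob[b] = cdf.applyAsDouble(min + (b + 1) * width) - cdf.applyAsDouble(min + b * width);
        }
        return prob;
    }

    /** Average squared difference between two histograms with the same bins. */
    static double averageSquaredError(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return sum / a.length;
    }

    /** Cauchy CDF with the given location and scale, as one example candidate. */
    static DoubleUnaryOperator cauchyCdf(double location, double scale) {
        return x -> 0.5 + Math.atan((x - location) / scale) / Math.PI;
    }
}
```

To map a series, the error would be computed for every candidate distribution with its estimated parameters, and the candidate with the smallest error selected.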

Cumulative distances metric
The distances $d_i$ between the data values $y$ are accumulated into the cumulative distances $p_i$:
$p_1 = d_1$, $p_2 = d_1 + d_2$, $p_3 = d_1 + d_2 + d_3$, and so on.
[Figure: data values y with the distances d_i between them, and the cumulative distances graph p_i plotted over i]
The best (mapped) distribution is identified by the minimum sum of squared distances between the empirical cumulative values and the corresponding theoretical cumulative values.
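A minimal sketch of the cumulative distances follows. The slides do not say how the data values are ordered before the distances are taken; this sketch assumes they are sorted in ascending order (consistent with ignoring the X-axis values), so each distance is the gap between neighbouring sorted values. Class and method names are hypothetical.

```java
import java.util.Arrays;

/** Sketch: cumulative distances p_i = d_1 + ... + d_i between sorted data values. */
final class CumulativeDistances {

    /**
     * Sorts the values (assumption) and accumulates the gaps between neighbours.
     * Returns values.length - 1 cumulative distances.
     */
    static double[] cumulativeDistances(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double[] p = new double[Math.max(sorted.length - 1, 0)];
        double running = 0.0;
        for (int i = 0; i < p.length; i++) {
            running += sorted[i + 1] - sorted[i];   // d_(i+1)
            p[i] = running;
        }
        return p;
    }

    /** Sum of squared differences over the common length of both arrays. */
    static double sumOfSquaredDistances(double[] empirical, double[] theoretical) {
        int n = Math.min(empirical.length, theoretical.length);
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double d = empirical[i] - theoretical[i];
            sum += d * d;
        }
        return sum;
    }
}
```

The theoretical side would be obtained by applying `cumulativeDistances` to the same number of values generated from the candidate distribution.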

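Copula Monte Carlo VaR
Example for 2 market factors (lognormal and Beta distributed)
[Diagram: processing chain from the market risk correlation matrix to the VaR distribution]
- Normally distributed correlated random samples are generated using the market risk correlation matrix
- The cumulative distribution turns them into uniformly distributed, correlated random samples in (0...1)
- The inverse cumulative distribution $x = F^{-1}(y)$ of each mapped distribution (lognormal, Beta) turns the uniform samples into correlated non-normally distributed samples
- The Monte Carlo simulation produces the VaR distribution (a skewed distribution in the example)
Correlated non-normally distributed samples are fed into the Monte Carlo simulation instead of the correlated normally distributed samples generated using the market risk correlation matrix.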
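The chain from correlated normal samples to correlated non-normal samples can be sketched for two factors as below. To stay self-contained, the sketch swaps the lognormal and Beta marginals of the slide for exponential and Cauchy marginals, whose inverse cumulative distributions have closed forms, and it approximates the normal CDF with the Abramowitz and Stegun formula 7.1.26 for the error function. All names and parameter values are hypothetical.

```java
import java.util.Random;

/** Sketch: Gaussian copula for two factors with exponential and Cauchy marginals. */
final class GaussianCopulaSketch {

    /** Standard normal CDF via the Abramowitz-Stegun 7.1.26 erf approximation. */
    static double normalCdf(double z) {
        double x = Math.abs(z) / Math.sqrt(2.0);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        double erf = 1.0 - poly * Math.exp(-x * x);
        return z >= 0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
    }

    /** Inverse CDF of the exponential distribution. */
    static double exponentialInverseCdf(double u, double rate) {
        return -Math.log(1.0 - u) / rate;
    }

    /** Inverse CDF of the Cauchy distribution. */
    static double cauchyInverseCdf(double u, double location, double scale) {
        return location + scale * Math.tan(Math.PI * (u - 0.5));
    }

    /** Keeps u strictly inside (0, 1) so that the inverse CDFs stay finite. */
    static double clamp(double u) {
        return Math.min(Math.max(u, 1e-12), 1.0 - 1e-12);
    }

    /** Returns n correlated pairs: column 0 exponential(1), column 1 Cauchy(0, 1). */
    static double[][] sample(int n, double rho, long seed) {
        Random rng = new Random(seed);
        double[][] out = new double[n][2];
        for (int i = 0; i < n; i++) {
            // Step 1: correlated standard normal samples (2x2 Cholesky factor).
            double z1 = rng.nextGaussian();
            double z2 = rho * z1 + Math.sqrt(1.0 - rho * rho) * rng.nextGaussian();
            // Step 2: map to correlated uniform samples in (0, 1).
            double u1 = clamp(normalCdf(z1));
            double u2 = clamp(normalCdf(z2));
            // Step 3: apply the inverse CDF of each marginal, x = F^-1(u).
            out[i][0] = exponentialInverseCdf(u1, 1.0);
            out[i][1] = cauchyInverseCdf(u2, 0.0, 1.0);
        }
        return out;
    }
}
```

For more than two factors, the 2x2 step would be replaced by a Cholesky decomposition of the full correlation matrix.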

Copula Monte Carlo VaR
Skewed and fat-tailed VaR distributions
[Figure: two panels, "Skewed VaR distribution" and "Fat tail VaR distribution"]

Prototype system
[Screenshot: prototype showing theoretical histograms, the empirical histogram, cumulative values, parameter estimates, the distributions generator, and the distances between the theoretical and empirical distributions; the best fit in the example is the Weibull distribution]

3. Multifactor models
[Figure: the target factor is obtained by a formula from the time series of the explanatory factors]
Example formula: Target factor = sum of Coefficient x Function(Explanatory factor), with the following terms:

Coefficient    Explanatory factor                  Function
-0,0167800     Instruments_Fund-FR0000448870
70,0949200     Instruments_Fund-LU0396265430       sqrt
-11,4827000    StockIndexCurve_DJIA                ln
0,0074800      StockIndexCurve_GEX
0,0000034      StockIndexCurve_Nasdaq-Composite    ^2.0
0,4383100      StockIndexCurve_Nikkei225           sqrt
-0,0149300     StockIndexCurve_SDAXPI              sqrt
-10,5751500    StockIndexCurve_TECDAXPI            ln
-6739,26524    (no factor listed)

Multifactor models
Goal: build formulas, based on time series, that describe unknown market instruments in terms of instruments with known pricing models.
Input:
- The historical time series of the target factor (the instrument with an unknown pricing approach or unknown market factor dependency)
- Other available historical time series to be used as explanatory factors (indices, spread curves, interest rates, interbank rates, foreign exchange rates, etc.)
Output:
- A polynomial-like formula describing the dependency of the target factor on the explanatory factors
Use:
- The generated formula can be used to develop a new type of instrument whose pricing approach is based on a set of known factors
- Obtain the contribution of each factor to the price development and risk of the instrument

Multifactor models object
[Figure: over time, the available market factors (the explanatory factor time series and other factors) and the target instrument time series feed the formula building and formula calibration steps, which yield the target instrument by formula]
The target instrument is calculated by the formula. The formula is built and periodically calibrated using the time series of the target instrument and of the explanatory instruments.

Stages of modeling
- Start
- Target factor selection
- Explanatory factors suggestion/selection
- Basis functions combination determination
- Regression coefficients determination
- Final formula determination and error calculation
- End
Annotations in the flow chart: all given factors in the system; determined by system and/or human; determined by system.

Explanatory factors selection
Once a target factor is selected, the explanatory factors are chosen by automatic suggestion and/or by hand.
Automatic suggestion can be done by:
- Clustering: the explanatory factors are taken from the cluster into which the target factor is classified. If the number of explanatory factors determined in this way is insufficient, the number of clusters can be decreased in order to increase the number of elements in the cluster.
- Minimal covariances between candidate factors: the covariances between all factors are calculated, and the first n minimal covariances determine the factors.
- Maximal covariances between the candidate factors and the target factor

Formula builder
After the target and explanatory factors are selected, the formula building process is started, in which the system performs:
- Finding a combination of basis functions for the explanatory factors. The basis functions are used to improve the accuracy and to avoid linear dependencies between factors, which cause problems in the matrix equations.
- Determining the regression coefficients $\beta_i$ (beta factors)

$$y = \beta_1 f_1(x_1) + \beta_2 f_2(x_2) + \dots + \beta_n f_m(x_n) + \beta_{n+1} + \varepsilon$$

where
$y$ is the target factor,
$x_1, x_2, \dots, x_n$ are the explanatory factors,
$\beta_1, \beta_2, \dots, \beta_n, \beta_{n+1}$ are the regression coefficients,
$f_1, f_2, \dots, f_m$ are the basis functions,
$\varepsilon$ is the error.
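For a fixed combination of basis functions, the regression coefficients can be estimated by ordinary least squares. The sketch below is one possible way to do it, assuming one basis function per explanatory factor and solving the normal equations by Gaussian elimination; the slides do not state which solver the module uses. Class and method names are hypothetical.

```java
import java.util.function.DoubleUnaryOperator;

/** Sketch: least squares fit of y = sum_j beta_j * f_j(x_j) + beta_(n+1). */
final class BasisRegression {

    /**
     * x[t][j] is the value of explanatory factor j at time t, basis[j] is the
     * basis function applied to factor j. Returns n + 1 coefficients; the last
     * one is the constant term.
     */
    static double[] fit(double[][] x, double[] y, DoubleUnaryOperator[] basis) {
        int cols = basis.length + 1;
        double[][] a = new double[cols][cols + 1];    // augmented normal equations
        for (int t = 0; t < x.length; t++) {
            double[] row = new double[cols];
            for (int j = 0; j < basis.length; j++) {
                row[j] = basis[j].applyAsDouble(x[t][j]);
            }
            row[cols - 1] = 1.0;                      // column for the constant term
            for (int i = 0; i < cols; i++) {
                for (int j = 0; j < cols; j++) a[i][j] += row[i] * row[j];
                a[i][cols] += row[i] * y[t];
            }
        }
        return solve(a);
    }

    /** Gaussian elimination with partial pivoting on an augmented matrix. */
    static double[] solve(double[][] a) {
        int n = a.length;
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            if (Math.abs(a[col][col]) < 1e-12) {
                throw new ArithmeticException("linearly dependent factors");
            }
            for (int r = col + 1; r < n; r++) {
                double factor = a[r][col] / a[col][col];
                for (int c = col; c <= n; c++) a[r][c] -= factor * a[col][c];
            }
        }
        double[] beta = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double sum = a[r][n];
            for (int c = r + 1; c < n; c++) sum -= a[r][c] * beta[c];
            beta[r] = sum / a[r][r];
        }
        return beta;
    }
}
```

A formula builder could call `fit` for each candidate combination of basis functions and keep the combination with the smallest error. The exception for a near-singular system reflects the linear dependency problem mentioned on the slide.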

Basis functions combination
Basis functions: $f_1$ exponent, $f_2$ logarithm, $f_3$ sine, $f_4$ cosine, ..., $f_m$ hyperbolic tangent
Explanatory factors: $x_1$ GOV Bel, $x_2$ FX USD, $x_3$ Oil price, ..., $x_n$ Gold price, each observed at Date 1, Date 2, ..., Date t, ..., Date K
Target factor: $y$ (GOV Aut) and its estimation $\tilde{y}$ (GOV Aut estimation)
[Figure: a combination of basis functions is applied to the explanatory factors to estimate the target factor]

$$\tilde{y} = \beta_1 f_3(x_1) + \beta_2 f_1(x_2) + \dots + \beta_n f_2(x_n) + \beta_{n+1} + \varepsilon$$

Distance: $\varepsilon = (y - \tilde{y})^2$

Prototype system
[Screenshot: prototype showing the generated formula and graphic results for the target and the multi-factor series]

Prototype system settings

4. Implied Ratings
[Figure: time series are used for scale building; classification of a series yields a rating (BB in the example) and a tendency (BBB in the example)]

Implied Rating (Basel III)
Goal: build a rating scale based on explicit CDS time series and use it to identify both the implied rating and the tendency of a new CDS input series.
Input:
- A set of CDS time series that relate to assets or issuers (CDS spread curves or indices, bond prices, share prices, etc.)
- The rating system: the number of ratings on the rating scale and their symbols
Output:
- A scale with boundaries between the ratings
Use: when the built scale is supplied with a new time series representing an issuer, the system identifies:
- The current rating, based on the historical development with more importance given to the latest values
- The tendency, i.e. the next probable rating

Steps to obtain implied rating
Establishment of the rating scale:
- The available time series are used to build a given number of rating degrees and to determine their boundaries
- The time series are distributed into the given rating degrees according to their historical behavior
- The center of every degree is determined using all time series that belong to the degree
- The boundaries are derived from adjacent centers using equally distanced series
Classification: a new time series is assigned to a rating class by comparing it with the centers of the scale classes (which are also time series) and finding the closest one.
The tendency is determined by:
- Finding the second closest center among the rating degrees
- Finding the closest boundary of the classified rating level
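The classification step can be sketched as a nearest-center search in which the closest center gives the rating and the second closest gives the tendency. The sketch assumes an unweighted squared Euclidean distance between series of equal length and leaves out the boundary-based variant of the tendency. The names and the example centers in the usage note are hypothetical.

```java
/** Sketch: rating = closest degree center, tendency = second closest center. */
final class ImpliedRatingClassifier {

    /** Squared Euclidean distance between two series of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int t = 0; t < a.length; t++) {
            sum += (a[t] - b[t]) * (a[t] - b[t]);
        }
        return sum;
    }

    /**
     * Returns {index of the closest center, index of the second closest center}.
     * The second index is -1 if only one center is given.
     */
    static int[] classify(double[] series, double[][] centers) {
        int best = -1;
        int second = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        double secondDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double d = distance(series, centers[c]);
            if (d < bestDist) {
                second = best;
                secondDist = bestDist;
                best = c;
                bestDist = d;
            } else if (d < secondDist) {
                second = c;
                secondDist = d;
            }
        }
        return new int[] { best, second };
    }
}
```

With made-up centers for the degrees AA, A and BBB at constant levels of 0.2, 0.6 and 1.2 percent, a new series around 0.3 percent would be classified as AA with a tendency to A.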

Rating degrees boundaries
The time series of the rating degrees may overlap.
[Figure: overlapping time series of the rating degrees AA and A with the AA - A boundary]
The points of the rating boundaries are calculated as the average values of the corresponding points of the centers of the series in the adjacent rating degrees.
The center of the series in a given degree does not lie in the middle between the degree boundaries, because in most cases the time series are non-uniformly distributed.

Rating degrees boundaries
[Figure: boundary and degree centers for the rating degrees A and AA; x-axis: dates from 02.08 to 29.07 in roughly two-week steps; y-axis: 0,20% to 1,80%]
The boundary lies midway between the series centers. The center of the series does not lie midway between the boundaries.

Weighting the historical values
The series values are weighted (EWMA with a decay factor) so that more recent values carry more importance.
[Figure: rating degree AA with its boundaries; y-axis: 0,00% to 1,60%; the last series values reside within the degree boundaries]

Determine the rating of a new series
In the classification phase, histograms are built for the distributions of the data within the best and second best degrees (corresponding to the rating and the tendency).
The histograms are shown together with the centers of the classes and the mean of the newly classified series.
The mean and standard deviation used to build the histograms are calculated with the same decay factor that was used to build the rating scale.
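A mean and standard deviation with a decay factor can be sketched as below. The sketch assumes the usual EWMA convention in which the newest value has weight 1, the value before it has weight equal to the decay factor, and so on, with the weights normalised to sum to one; the exact convention of the module is not given in the slides. Names are hypothetical.

```java
/** Sketch: exponentially weighted mean and standard deviation of a series. */
final class EwmaStatistics {

    /** Weights decay^(n-1-t), normalised to sum to one; the newest value weighs most. */
    static double[] weights(int n, double decay) {
        double[] w = new double[n];
        double sum = 0.0;
        for (int t = 0; t < n; t++) {
            w[t] = Math.pow(decay, n - 1 - t);
            sum += w[t];
        }
        for (int t = 0; t < n; t++) {
            w[t] /= sum;
        }
        return w;
    }

    /** Weighted mean of the values (oldest first, newest last). */
    static double mean(double[] values, double decay) {
        double[] w = weights(values.length, decay);
        double m = 0.0;
        for (int t = 0; t < values.length; t++) {
            m += w[t] * values[t];
        }
        return m;
    }

    /** Weighted standard deviation around the weighted mean. */
    static double standardDeviation(double[] values, double decay) {
        double[] w = weights(values.length, decay);
        double m = mean(values, decay);
        double var = 0.0;
        for (int t = 0; t < values.length; t++) {
            var += w[t] * (values[t] - m) * (values[t] - m);
        }
        return Math.sqrt(var);
    }
}
```

A decay factor of 1 reproduces the ordinary mean and population standard deviation; smaller values shift the statistics towards the most recent observations.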

Determine the rating of a new series
[Figure: "Classification of a new series - Barclays Bank PLC"; the new time series to be classified (yellow) is shown against the rating degrees A and AA; x-axis: dates from 02.08 to 29.07 in roughly two-week steps; y-axis: 0,20% to 1,60%]
[Figure: histograms for the rating AA and the tendency to A together with the mean of the new series; x-axis: 0,08% to 1,30%; y-axis: frequency from 0 to 60; the mean and standard deviation are calculated with the decay factor]

Prototype system
[Screenshot: prototype showing the time series used to build the scale, the built rating scale, the rating system, the new input with its rating and tendency, the series within the selected degree, the histograms for rating and tendency, and the mean of the new input]

5. Time Series Prediction
[Figure: a time series is extended by prediction with its predicted future values]

Time series prediction
Goal: predict a given time series over a given time horizon by analyzing its historical development.
Input:
- A time series
- Settings that depend on the approach used (for example, learning iterations, time window size, etc.)
Output:
- The given time series extended by the predicted values
- Confidence bounds
- Prediction quality statistics
Use:
- The predicted values can be used as the most probable future values, for instance in algorithmic trading

Time series prediction
The most commonly used prediction methods are:
- Averages (MA, WMA, EWMA, etc.)
- Autoregressive methods (AR, ARMA, ARIMA, SARIMA, ARMAX, SETAR, etc.) with the Box-Jenkins methodology
- Trend extrapolation (based on LSE, finding a trend polynomial, etc.)
- Neural networks (MLP, RBF, SOM, ART, recurrent Elman/Jordan networks, etc.); neural networks are used in the current approach
- Other regression-based models (e.g. observers) and econometric models
- Kalman, Wiener and other filters
- Wavelet-based methods
- Holt-Winters decomposition
- Hybrid approaches
The prediction can be used for technical analysis. Confidence bounds are used. Predictability indicators can be suggested (Hurst exponent, etc.).

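Prediction by neural network
[Figure, model identification: a sliding window over the historical values supplies the input vector and the output vector; the neural network is optimized against a target function]
[Figure, prediction: the trained neural network is applied recursively to predict the values over the horizon]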
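The sliding window and the recursive prediction can be sketched independently of the model. In the sketch below the trained neural network is replaced by an arbitrary function from a window of past values to the next value, which is an assumption made to keep the example short; training itself is not shown. Class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;

/** Sketch: sliding-window training pairs and recursive multi-step prediction. */
final class SlidingWindowForecast {

    /** Input vectors: row i holds series[i .. i + window - 1]. Requires series.length > window. */
    static double[][] inputs(double[] series, int window) {
        double[][] x = new double[series.length - window][];
        for (int i = 0; i < x.length; i++) {
            x[i] = Arrays.copyOfRange(series, i, i + window);
        }
        return x;
    }

    /** Target values: element i is the value that follows input row i. */
    static double[] targets(double[] series, int window) {
        return Arrays.copyOfRange(series, window, series.length);
    }

    /**
     * Recursive prediction: every predicted value is appended to the series and
     * becomes part of the input window for the next step.
     */
    static double[] forecast(double[] series, int window, int horizon,
                             ToDoubleFunction<double[]> model) {
        double[] extended = Arrays.copyOf(series, series.length + horizon);
        for (int h = 0; h < horizon; h++) {
            int end = series.length + h;
            double[] input = Arrays.copyOfRange(extended, end - window, end);
            extended[end] = model.applyAsDouble(input);
        }
        return Arrays.copyOfRange(extended, series.length, extended.length);
    }
}
```

For example, with the stand-in model `w -> 2 * w[1] - w[0]` (linear extrapolation over a window of two values), the series 1, 2, 3, 4 is continued with 5, 6, 7. Because each step reuses earlier predictions, errors accumulate over the horizon, which is one reason confidence bounds matter.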

Prediction by neural network
Modeling process (manual, by a trial-and-error approach):
- Data pre-processing
- Modeling of the NN architecture
- Training
- Application of the NN model
- Evaluation
[Figure: preprocessing, neural network and post processing stages; prediction with confidence bounds, showing the historical values followed by the predicted values over the horizon; y-axis: 0,0006 to 0,0008]
The prediction generally includes data pre-processing, solving matrix equations (in batch or iteratively), and data post-processing.

Prototype system
[Screenshot: prototype showing the preprocessing options, prediction methods, values, time horizon, test, error graphic, and confidence bounds]

Modules dependencies
[Diagram: shared components (Time series, Series processing, Calculations, Neural Networks) and the five modules with the functions they provide]
- 1. Clustering
- 2. Non-normal distributions: distributions and parameters estimation
- 3. Multifactor Models: formula building, factors selection
- 4. Implied Ratings: rating scale building, histograms
- 5. Prediction: learning & prediction