Java Modules for Time Series Analysis


 Catherine Williams
1 Java Modules for Time Series Analysis
2 Agenda
1. Clustering
2. Nonnormal distributions
3. Multifactor modeling
4. Implied ratings
5. Time series prediction
3 1. Clustering
[Diagram: a set of time series is clustered into Cluster 1, Cluster 2, and Cluster 3, each represented by a synthetic series]
4 Clustering
Goal: group time series so that series with similar historical behavior fall into the same group.
Input:
- A set of single time series (bond, share, or fund prices) or of time series groups (for example, interest rate market curves)
- The number of clusters
Output:
- Clusters of time series
- Clustering quality statistics
- A prototype series (synthetic curve) for every cluster, with the same dimensionality as all other series
Usage:
- Reduce a very large number of series, which makes time-consuming operations such as the calculation of huge correlation matrices feasible. The number of time series is reduced by identifying the cluster to which a series belongs and using the prototype of that cluster instead of the real series.
- Determine similar behavior of market factors or issuers (cartels)
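The reduction step (identify the cluster a series belongs to, then use its prototype) can be sketched as a nearest-prototype lookup. This is a minimal illustration, not the module's actual code: the class name `NearestPrototype`, the Euclidean distance, and the sample numbers are all assumptions, since the slides do not name the distance measure or the clustering algorithm.

```java
// Hypothetical sketch: assign a series to the cluster with the closest prototype.
class NearestPrototype {
    // Euclidean distance between two series of equal length (assumed metric).
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Index of the prototype (synthetic curve) closest to the given series.
    static int assign(double[] series, double[][] prototypes) {
        int best = 0;
        for (int k = 1; k < prototypes.length; k++) {
            if (distance(series, prototypes[k]) < distance(series, prototypes[best])) {
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] prototypes = { {0.1, 0.2, 0.3}, {1.0, 1.2, 1.5} }; // made-up values
        double[] series = {0.9, 1.1, 1.6};
        int cluster = assign(series, prototypes);
        // The prototype of this cluster would then stand in for the real series.
        System.out.println("series belongs to cluster index " + cluster);
    }
}
```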
5 Clustering
Clustering can be performed for:
- Time series (for example, share or bond prices with a historical development)
- Curve time series (for example, interest rate market curves with a historical development)
In addition to the clusters with their series and prototypes, clustering quality statistics are generated: inter- and intra-cluster statistics, adjusted R squared, average linkage, etc.
Some of these statistics can be used to determine the optimal number of clusters, i.e. the number of groups with minimal internal distance and maximal distance to each other.
6 Finding the optimal number of clusters using the clustering error
[Table and chart: adjusted R squared and error against the number of clusters; the values are not recoverable from the transcript]
Optimal number of clusters = 18
7 Example: clustering of 20 spread curves
Actual Spread Curves (one column per maturity in years; the maturity values are not recoverable from the transcript)
Num  Series        MM        CM        CM        CM        CM        CM
1    FORTUMAEURMM  0,1439%   0,2293%   0,3047%   0,3686%   0,4647%   0,5828%
2    SRBIAAEURCR   1,2863%   1,2500%   1,7880%   2,1133%   2,6375%   3,0123%
3    UKRAINAUSDCR  1,7606%   1,9218%   2,7834%   3,2418%   3,8589%   4,5407%
4    ITALYAEURCR   0,1022%   0,1167%   0,1896%   0,2727%   0,3965%   0,4980%
5    SLOVENAEURCR  0,0376%   0,1044%   0,1351%   0,1584%   0,2338%   0,3622%
6    CZECHAEURCR   0,1295%   0,1471%   0,2220%   0,2583%   0,3627%   0,4925%
7    TURKEYAUSDCR  0,8137%   0,9853%   1,6326%   2,1247%   2,7918%   3,4798%
8    ROMANIAUSDCR  0,5478%   0,5594%   1,1125%   1,3618%   1,7735%   2,0718%
9    POLANDAUSDCR  0,0883%   0,1608%   0,2432%   0,3576%   0,5238%   0,6876%
10   PEUGOTAEURMM  0,6865%   0,8433%   1,0406%   1,2925%   1,6265%   1,7360%
11   JPMCUSDMR     1,0987%   1,0272%   1,1421%   1,2384%   1,3570%   1,3909%
12   DANBNKAEURMM  0,1830%   0,2635%   0,3686%   0,4481%   0,5610%   0,6098%
13   GSAUSDMR      1,1586%   1,1238%   1,1868%   1,2137%   1,2638%   1,2969%
14   CROATIAEURCR  0,1554%   0,2741%   0,4423%   0,5602%   0,8274%   1,0806%
15   ESPSANCEURMM  0,7725%   0,9577%   1,2941%   1,5505%   1,9120%   1,9371%
16   SUEDZUAEURMM  0,6148%   0,7388%   0,9829%   1,2075%   1,5032%   1,6203%
17   LEHAUSDMR     6,7011%   6,8482%   5,4481%   4,5185%   3,2897%   2,6982%
18   HELLASCEURMM  4,6793%   4,9875%   8,3839%   9,9521%   10,6632%  10,6198%
19   REPHUNAUSDCR  0,2793%   0,3050%   0,5168%   0,7809%   1,1361%   1,3992%
20   BGARIAAUSDCR  0,3123%   0,4825%   0,8722%   1,1222%   1,5268%   1,8284%
8 All 20 spread curves
[Chart: actual spread curves of all 20 series listed on slide 7, y-axis 0,00% to 12,00%]
9 Clusters of spread curves
Spread Curve Clusters, Actual Rates (one column per maturity in years; the maturity values are not recoverable from the transcript)
Num  Series          MM        CM        CM        CM        CM        CM

Cluster 1 (Std Deviation 0,1725%)
1    FORTUMAEURMM    0,1439%   0,2293%   0,3047%   0,3686%   0,4647%   0,5828%
4    ITALYAEURCR     0,1022%   0,1167%   0,1896%   0,2727%   0,3965%   0,4980%
5    SLOVENAEURCR    0,0376%   0,1044%   0,1351%   0,1584%   0,2338%   0,3622%
6    CZECHAEURCR     0,1295%   0,1471%   0,2220%   0,2583%   0,3627%   0,4925%
9    POLANDAUSDCR    0,0883%   0,1608%   0,2432%   0,3576%   0,5238%   0,6876%
12   DANBNKAEURMM    0,1830%   0,2635%   0,3686%   0,4481%   0,5610%   0,6098%
14   CROATIAEURCR    0,1554%   0,2741%   0,4423%   0,5602%   0,8274%   1,0806%
     Cluster Spread  0,1200%   0,1851%   0,2722%   0,3463%   0,4814%   0,6162%

Cluster 2 (Std Deviation 0,3019%)
8    ROMANIAUSDCR    0,5478%   0,5594%   1,1125%   1,3618%   1,7735%   2,0718%
13   GSAUSDMR        1,1586%   1,1238%   1,1868%   1,2137%   1,2638%   1,2969%
10   PEUGOTAEURMM    0,6865%   0,8433%   1,0406%   1,2925%   1,6265%   1,7360%
11   JPMCUSDMR       1,0987%   1,0272%   1,1421%   1,2384%   1,3570%   1,3909%
15   ESPSANCEURMM    0,7725%   0,9577%   1,2941%   1,5505%   1,9120%   1,9371%
16   SUEDZUAEURMM    0,6148%   0,7388%   0,9829%   1,2075%   1,5032%   1,6203%
19   REPHUNAUSDCR    0,2793%   0,3050%   0,5168%   0,7809%   1,1361%   1,3992%
20   BGARIAAUSDCR    0,3123%   0,4825%   0,8722%   1,1222%   1,5268%   1,8284%
     Cluster Spread  0,6838%   0,7547%   1,0185%   1,2209%   1,5123%   1,6601%

Cluster 3 (Std Deviation 0,8026%)
2    SRBIAAEURCR     1,2863%   1,2500%   1,7880%   2,1133%   2,6375%   3,0123%
3    UKRAINAUSDCR    1,7606%   1,9218%   2,7834%   3,2418%   3,8589%   4,5407%
7    TURKEYAUSDCR    0,8137%   0,9853%   1,6326%   2,1247%   2,7918%   3,4798%
17   LEHAUSDMR       6,7011%   6,8482%   5,4481%   4,5185%   3,2897%   2,6982%
     Cluster Spread  2,6402%   2,7512%   2,9129%   2,9995%   3,1445%   3,4328%

Cluster 4 (Std Deviation 0,0000%)
18   HELLASCEURMM    4,6793%   4,9875%   8,3839%   9,9521%   10,6632%  10,6198%
     Cluster Spread  4,6793%   4,9875%   8,3839%   9,9521%   10,6632%  10,6198%
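In this table the "Cluster Spread" row (the synthetic curve) is very close to the pointwise mean of the member curves. The sketch below checks that for the first two maturity columns of Cluster 1. It assumes, without confirmation from the slides, that the prototype is a plain average; the class and method names are illustrative only.

```java
// Hypothetical sketch: a cluster prototype computed as the pointwise mean of its members.
class ClusterPrototype {
    // First two maturity columns of the seven Cluster 1 members, in percent (from the table).
    static final double[][] CLUSTER_1 = {
        {0.1439, 0.2293}, // FORTUMAEURMM
        {0.1022, 0.1167}, // ITALYAEURCR
        {0.0376, 0.1044}, // SLOVENAEURCR
        {0.1295, 0.1471}, // CZECHAEURCR
        {0.0883, 0.1608}, // POLANDAUSDCR
        {0.1830, 0.2635}, // DANBNKAEURMM
        {0.1554, 0.2741}, // CROATIAEURCR
    };

    // Pointwise mean over all member curves (rows = curves, columns = maturities).
    static double[] pointwiseMean(double[][] members) {
        int n = members[0].length;
        double[] mean = new double[n];
        for (double[] curve : members) {
            for (int j = 0; j < n; j++) {
                mean[j] += curve[j] / members.length;
            }
        }
        return mean;
    }

    public static void main(String[] args) {
        double[] mean = pointwiseMean(CLUSTER_1);
        // The table reports a Cluster Spread of 0,1200% and 0,1851% for these two columns.
        System.out.printf("%.4f%% %.4f%%%n", mean[0], mean[1]);
    }
}
```

For Cluster 3 the reported spread differs from the plain mean in the fourth decimal place, so the module may use rounding or a slightly different averaging scheme.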
10 Clusters of market curves
[Charts: Cluster 1 and Cluster 2, each showing the member spread curves together with the synthetic curve (cluster spread)]
11 Historical development
[Charts: historical development of the Cluster 2 members and the synthetic curve for the 6-month, 1-year, 5-year, and 10-year maturities]
12 2. Nonnormal Distributions
[Diagram: an empirical distribution is mapped to a theoretical distribution type and its parameters, for example Cauchy or Normal]
13 Nonnormal distributions
Goal: automatically identify the distribution type and its parameters from market time series, and use the copula approach to simulate market factors in Monte Carlo VaR with the mapped distributions.
Input:
- The time series of the market factors
- The chosen standard distribution types (Beta, Cauchy, Student, Weibull, etc.)
Output:
- The identified distribution type
- The parameters of the identified distribution type
- A numerical estimate of the distance between the empirical distribution and every other distribution type (this allows the distribution types to be ranked and another well-fitting type to be chosen)
Usage:
- Improve Monte Carlo VaR simulation by using correlated nonnormal samples instead of correlated normal samples
14 Nonnormal distributions
Calculation of Value at Risk
[Diagram: distribution of market values with the expected value, the quantile Q at confidence level α, and VaR(α) marked]
In most cases the time series of market factors are assumed to be normally distributed. This does not correspond to reality: the series often exhibit skewed and fat-tailed distributions, so the normal assumption underestimates the market risk of improbable large losses (fat-tail losses).
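One way to read the VaR diagram is as the distance between the expected value and the quantile at the chosen confidence level. The sketch below computes that from an empirical sample, without any normality assumption. The nearest-rank quantile rule, the class name, and the sample values are assumptions made for illustration; the slides do not specify the quantile estimator.

```java
import java.util.Arrays;

// Hypothetical sketch: empirical Value at Risk as expected value minus lower-tail quantile.
class EmpiricalVar {
    // Nearest-rank quantile of a sample for a level p in (0, 1].
    static double quantile(double[] sample, double p) {
        double[] sorted = sample.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    // VaR at the given confidence level (for example 0.9): mean minus the (1 - confidence) quantile.
    static double valueAtRisk(double[] values, double confidence) {
        double mean = Arrays.stream(values).average().orElse(Double.NaN);
        return mean - quantile(values, 1.0 - confidence);
    }

    public static void main(String[] args) {
        double[] simulated = {-5, -3, -1, 0, 0, 1, 1, 2, 2, 3}; // made-up value changes
        System.out.println("VaR(90%) = " + valueAtRisk(simulated, 0.9));
    }
}
```

With a fat-tailed sample the lower quantile lies further from the mean than a normal fit would suggest, which is the underestimation the slide describes.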
15 Nonnormal distributions
Mapping risk factors to the best-fit distribution
[Chart: empirical distribution with fitted curves; the best fit is given by the Cauchy distribution (green), shown against the Normal distribution]
The Beta distribution will produce a larger risk at the given confidence level because of its fat tail.
16 Distribution parameters estimation
The main goal is to model the shape of the empirical distribution as closely as possible with a reproducible theoretical distribution.
Together with the identification of the distribution type, the distribution parameters are estimated from market data using the method of moments, least squares regression, or maximum likelihood.
The additional parameters shift and scale are used to keep the distribution parameter values out of undefined regions.
Data with a given distribution can be generated from:
- The distribution type
- The distribution parameters
- The additional parameters (shift, scale)
- The number of values
Cumulative distributions are used for the subsequent copula Monte Carlo simulation.
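As an illustration of the method of moments combined with shift and scale, the sketch below estimates the two shape parameters of a Beta distribution. The data are first mapped to the unit interval with `(x - shift) / scale`, because the Beta distribution is only defined there. The moment formulas are the standard ones for the Beta distribution; the class name and the way shift and scale are passed in are assumptions.

```java
// Hypothetical sketch: method-of-moments estimate of the Beta shape parameters.
class BetaMoments {
    // Returns {alpha, beta} after mapping x -> (x - shift) / scale into (0, 1).
    static double[] estimate(double[] data, double shift, double scale) {
        int n = data.length;
        double mean = 0.0;
        for (double x : data) {
            mean += (x - shift) / scale / n;
        }
        double var = 0.0; // unbiased sample variance of the mapped data
        for (double x : data) {
            double d = (x - shift) / scale - mean;
            var += d * d / (n - 1);
        }
        // Standard moment matching: alpha = m * c, beta = (1 - m) * c, c = m (1 - m) / v - 1.
        double common = mean * (1.0 - mean) / var - 1.0;
        return new double[] { mean * common, (1.0 - mean) * common };
    }

    public static void main(String[] args) {
        double[] spreads = {10.4, 10.9, 11.1, 11.6, 10.7, 11.3}; // made-up values in [10, 12]
        double[] ab = estimate(spreads, 10.0, 2.0);
        System.out.println("alpha = " + ab[0] + ", beta = " + ab[1]);
    }
}
```

If the sample variance is too large for a Beta distribution, `common` becomes non-positive and the estimate is invalid, which is one reason a fitting module would compare several distribution types.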
17 Standard distribution parameters estimation
10 distribution types
                Distribution parameters    Additional parameters
Distribution    Parameter 1  Parameter 2   Parameter 1  Parameter 2
Beta            Shape        Shape         Shift        Scale
Cauchy          Location     Scale         -            -
Exponential     Rate         -             Shift        -
Inverse Normal  Mu           Lambda        Shift        -
Log Normal      Log Scale    Shape         Shift        -
Normal          Mean         Variance      Shift        -
Pareto          Scale        Shape         -            -
Rayleigh        Sigma        -             Shift        -
Student         Nu           -             Shift        Scale
Weibull         Scale        Shape         Shift        -
18 Distribution mapping
Two metrics are used to compare distributions:
- Histogram metric: the bin frequencies of the empirical histogram are compared with the bin probabilities of the theoretical histogram.
- Cumulative distances metric: ignoring the x-axis values, the cumulative distances between the data points of the market series are calculated. The same function is calculated for values generated theoretically from the distribution under consideration, and the two sets of cumulative values are compared.
Both the histogram and the cumulative distances are compared using the average squared error.
19 Histogram metric
[Chart: theoretical histogram and empirical histogram between min and max, with the distances between them]
The best (mapped) distribution is identified by the minimum sum of squared distances between its theoretical histogram and the empirical histogram.
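The histogram metric can be sketched as follows: bin the sample, compute the probability a candidate CDF assigns to each bin, and sum the squared differences. Equal-width bins, the class name, and passing the candidate distribution as a CDF function are assumptions made for this example.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch of the histogram metric between a sample and a candidate distribution.
class HistogramMetric {
    // Relative frequency of the sample in each equal-width bin on [min, max].
    static double[] empiricalFrequencies(double[] sample, double min, double max, int bins) {
        double[] freq = new double[bins];
        double width = (max - min) / bins;
        for (double x : sample) {
            int b = (x == max) ? bins - 1 : (int) Math.floor((x - min) / width);
            if (b >= 0 && b < bins) {          // values outside [min, max] are ignored
                freq[b] += 1.0 / sample.length;
            }
        }
        return freq;
    }

    // Probability that the theoretical CDF assigns to each bin.
    static double[] theoreticalProbabilities(DoubleUnaryOperator cdf, double min, double max, int bins) {
        double[] p = new double[bins];
        double width = (max - min) / bins;
        for (int b = 0; b < bins; b++) {
            p[b] = cdf.applyAsDouble(min + (b + 1) * width) - cdf.applyAsDouble(min + b * width);
        }
        return p;
    }

    // Sum of squared distances between two histograms with the same bins.
    static double sumSquaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] sample = {0.1, 0.2, 0.6, 0.9};
        double[] freq = empiricalFrequencies(sample, 0.0, 1.0, 2);
        double[] uniform = theoreticalProbabilities(u -> u, 0.0, 1.0, 2);
        double[] exponential = theoreticalProbabilities(x -> 1.0 - Math.exp(-x), 0.0, 1.0, 2);
        // The candidate with the smaller distance is the better mapped distribution.
        System.out.println("uniform: " + sumSquaredDistance(freq, uniform));
        System.out.println("exponential: " + sumSquaredDistance(freq, exponential));
    }
}
```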
20 Cumulative distances metric
[Charts: data values y with the distances d_i between values, and the graph of the cumulative distances p_i]
The cumulative distances are $p_1 = d_1$, $p_2 = d_1 + d_2$, $p_3 = d_1 + d_2 + d_3$, and in general $p_i = d_1 + d_2 + \dots + d_i$.
The best (mapped) distribution is identified by the minimum sum of squared distances between the empirical cumulative values and the corresponding theoretical cumulative values.
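A minimal sketch of the cumulative distances, assuming that each d is the absolute difference between consecutive data values (the slide does not define d precisely, and it does not say whether the values are sorted first). The class and method names are illustrative.

```java
// Hypothetical sketch of the cumulative distances metric.
class CumulativeDistances {
    // Running sums p of the distances d between consecutive values,
    // with d taken as |y[i] - y[i - 1]| (an assumed definition).
    static double[] cumulative(double[] y) {
        double[] p = new double[y.length - 1];
        double running = 0.0;
        for (int i = 1; i < y.length; i++) {
            running += Math.abs(y[i] - y[i - 1]);
            p[i - 1] = running;
        }
        return p;
    }

    // Average squared error between two cumulative curves of equal length.
    static double averageSquaredError(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return sum / a.length;
    }

    public static void main(String[] args) {
        double[] market = {1, 3, 2, 5};       // made-up market values
        double[] generated = {1, 2, 2, 5};    // made-up values from a candidate distribution
        double error = averageSquaredError(cumulative(market), cumulative(generated));
        System.out.println("average squared error = " + error);
    }
}
```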
21 Copula Monte Carlo VaR
Example for 2 market factors (Lognormal and Beta distributed)
[Diagram: market risk correlation matrix -> normally distributed correlated random samples -> cumulative distribution -> uniformly distributed and correlated random samples in (0...1) -> inverse cumulative distributions $x = F^{-1}(y)$ of the Lognormal and Beta distributions -> Monte Carlo simulation -> skewed VaR distribution]
Correlated nonnormal samples are fed into the Monte Carlo simulation instead of the correlated normal samples generated from the market risk correlation matrix.
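The chain in this slide can be sketched for two factors as follows: draw correlated normal samples, push them through the normal CDF to get correlated uniform samples, then apply the inverse CDF of each target distribution. To keep the example self-contained, it swaps in Exponential and Cauchy targets (both have closed-form inverse CDFs) instead of the Lognormal and Beta of the slide, and it uses the Abramowitz-Stegun approximation of the error function. All names and parameter values are assumptions.

```java
import java.util.Random;

// Hypothetical sketch of a two-factor Gaussian copula sampler.
class GaussianCopulaSketch {
    // Standard normal CDF via the Abramowitz-Stegun 7.1.26 approximation of erf.
    static double normalCdf(double z) {
        double x = Math.abs(z) / Math.sqrt(2.0);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        double erf = 1.0 - poly * Math.exp(-x * x);
        return z >= 0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
    }

    // Inverse CDF of the Exponential distribution.
    static double exponentialInverse(double u, double rate) {
        return -Math.log(1.0 - u) / rate;
    }

    // Inverse CDF of the Cauchy distribution.
    static double cauchyInverse(double u, double location, double scale) {
        return location + scale * Math.tan(Math.PI * (u - 0.5));
    }

    // Keeps u strictly inside (0, 1) so the inverse CDFs stay finite.
    static double clamp(double u) {
        return Math.min(Math.max(u, 1e-12), 1.0 - 1e-12);
    }

    // One pair of correlated nonnormal samples for correlation rho.
    static double[] sample(Random rng, double rho) {
        double g1 = rng.nextGaussian();
        double g2 = rng.nextGaussian();
        double z1 = g1;                                          // Cholesky factor of the
        double z2 = rho * g1 + Math.sqrt(1.0 - rho * rho) * g2;  // 2x2 correlation matrix
        double u1 = clamp(normalCdf(z1));                        // correlated uniforms
        double u2 = clamp(normalCdf(z2));
        return new double[] { exponentialInverse(u1, 2.0), cauchyInverse(u2, 0.0, 1.0) };
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        for (int i = 0; i < 3; i++) {
            double[] x = sample(rng, 0.7);
            System.out.println(x[0] + " " + x[1]);
        }
    }
}
```

For more than two factors the same idea applies with a full Cholesky decomposition of the correlation matrix.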
22 Copula Monte Carlo VaR
Skewed and fat-tailed VaR distributions
[Charts: a skewed VaR distribution and a fat-tailed VaR distribution]
23 Prototype system
[Screenshot: theoretical histograms, empirical histogram, cumulative values, parameter estimates, distributions generator, and distances between theoretical and empirical distributions; best fit for the Weibull distribution]
24 3. Multifactor models
[Diagram: the target factor is obtained by a formula from the time series of the explanatory factors]
Formula: the target factor is a sum of coefficients applied to functions of the explanatory factors. Most coefficient digits were lost in the transcript and are shown as they survive.
Coefficient  Explanatory factor                Function
0,           Instruments_FundFR                -
,            Instruments_FundLU                sqrt
11,          StockIndexCurve_DJIA              ln
0,           StockIndexCurve_GEX               -
0,           StockIndexCurve_NasdaqComposite   ^2.0
0,           StockIndexCurve_Nikkei225         sqrt
0,           StockIndexCurve_SDAXPI            sqrt
10,          StockIndexCurve_TECDAXPI          ln
6739,26524   (no explanatory factor)           -
25 Multifactor models
Goal: build formulas, based on time series, that describe unknown market instruments in terms of instruments with known pricing models.
Input:
- The historical time series of the target factor (the instrument with an unknown pricing approach or unknown market factor dependency)
- Other available historical time series to be used as explanatory factors (indices, spread curves, interest rates, interbank rates, foreign exchange rates, etc.)
Output:
- A polynomial-like formula describing the dependency of the target factor on the explanatory factors
Usage:
- Develop a new type of instrument whose pricing approach is based on a set of known factors
- Obtain the contribution of each factor to the price development and risk of an instrument
26 Multifactor models object
[Diagram: over time, formula building and formula calibration take the available market factors (explanatory factor time series, other factors) and the target instrument time series and produce the target instrument by formula]
The target instrument is calculated by formula.
The formula is built and periodically calibrated using the time series of the target instrument and of the explanatory instruments.
27 Stages of modeling
Start
1. Target factor selection
2. Explanatory factors suggestion/selection
3. Determination of the combination of basis functions
4. Determination of the regression coefficients
5. Determination of the final formula and error calculation
End
Flow chart annotations: all given factors in the system; determined by system and/or human; determined by system
28 Explanatory factors selection
When a target factor has been selected, the explanatory factors are selected by automatic suggestion and/or by hand.
Automatic suggestion can be done by:
- Clustering: the explanatory factors are taken from the cluster into which the target factor is classified. If the number of explanatory factors determined in this way is insufficient, the number of clusters can be decreased in order to increase the number of elements in the cluster.
- Minimal covariances between candidate factors: the covariances between all factors are calculated, and the first n minimal covariances determine the factors.
- Maximal covariances between the candidate factors and the target factor
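The third suggestion rule (maximal covariance with the target factor) can be sketched as a simple ranking. The class name, the use of the sample covariance, and the toy series are assumptions; a real module would presumably return the top n candidates rather than a single one.

```java
// Hypothetical sketch: pick the candidate factor with the largest covariance with the target.
class FactorSelection {
    // Sample covariance of two series of equal length.
    static double covariance(double[] a, double[] b) {
        int n = a.length;
        double meanA = 0.0, meanB = 0.0;
        for (int i = 0; i < n; i++) {
            meanA += a[i] / n;
            meanB += b[i] / n;
        }
        double cov = 0.0;
        for (int i = 0; i < n; i++) {
            cov += (a[i] - meanA) * (b[i] - meanB) / (n - 1);
        }
        return cov;
    }

    // Index of the candidate whose covariance with the target is maximal.
    static int bestCandidate(double[] target, double[][] candidates) {
        int best = 0;
        for (int k = 1; k < candidates.length; k++) {
            if (covariance(target, candidates[k]) > covariance(target, candidates[best])) {
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] target = {1, 2, 3, 4};
        double[][] candidates = { {4, 3, 2, 1}, {1, 1, 1, 1}, {2, 4, 6, 8} }; // made-up series
        System.out.println("suggested factor index: " + bestCandidate(target, candidates));
    }
}
```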
29 Formula builder
After the target and explanatory factors have been selected, the formula building process starts, in which the system:
- Finds a combination of basis functions to apply to the explanatory factors. The basis functions are used to improve the accuracy and to avoid linear dependencies between factors, which cause problems in the matrix equations.
- Determines the regression coefficients $\beta_i$ (beta factors)
$y = \beta_1 f_1(x_1) + \beta_2 f_2(x_2) + \dots + \beta_n f_m(x_n) + \beta_{n+1} + \varepsilon$
where
- $y$ is the target factor
- $x_1, x_2, \dots, x_n$ are the explanatory factors
- $\beta_1, \beta_2, \dots, \beta_n, \beta_{n+1}$ are the regression coefficients
- $f_1, f_2, \dots, f_m$ are the basis functions
- $\varepsilon$ is the error
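For a fixed choice of basis functions, the regression coefficients can be obtained by ordinary least squares. The sketch below builds the normal equations and solves them with Gaussian elimination. It is an illustration under assumptions: the slides do not say which solver the module uses, and the class name, the helper `demo`, and the synthetic data are hypothetical.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch: least-squares fit of y = sum_i beta_i f_i(x_i) + beta_(n+1).
class BasisRegression {
    // Solves a * x = b by Gaussian elimination with partial pivoting (a and b are modified).
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            double[] tmpRow = a[col]; a[col] = a[pivot]; a[pivot] = tmpRow;
            double tmp = b[col]; b[col] = b[pivot]; b[pivot] = tmp;
            for (int r = col + 1; r < n; r++) {
                double factor = a[r][col] / a[col][col];
                b[r] -= factor * b[col];
                for (int c = col; c < n; c++) a[r][c] -= factor * a[col][c];
            }
        }
        double[] x = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double s = b[r];
            for (int c = r + 1; c < n; c++) s -= a[r][c] * x[c];
            x[r] = s / a[r][r];
        }
        return x;
    }

    // x[t][i] is explanatory factor i at date t; f[i] is the basis function applied to factor i.
    // Returns {beta_1, ..., beta_n, beta_(n+1)} where the last entry is the constant term.
    static double[] fit(double[][] x, double[] y, DoubleUnaryOperator[] f) {
        int n = f.length, p = n + 1;
        double[][] xtx = new double[p][p];
        double[] xty = new double[p];
        for (int t = 0; t < y.length; t++) {
            double[] row = new double[p];
            for (int i = 0; i < n; i++) row[i] = f[i].applyAsDouble(x[t][i]);
            row[n] = 1.0; // column for the constant term
            for (int i = 0; i < p; i++) {
                xty[i] += row[i] * y[t];
                for (int j = 0; j < p; j++) xtx[i][j] += row[i] * row[j];
            }
        }
        return solve(xtx, xty);
    }

    // Synthetic check: data generated from y = 2 ln(x1) + 3 sqrt(x2) + 1 without noise.
    static double[] demo() {
        double[][] x = { {1, 1}, {2, 4}, {3, 9}, {4, 25}, {5, 16} };
        double[] y = new double[x.length];
        for (int t = 0; t < x.length; t++) {
            y[t] = 2.0 * Math.log(x[t][0]) + 3.0 * Math.sqrt(x[t][1]) + 1.0;
        }
        DoubleUnaryOperator[] f = { Math::log, Math::sqrt };
        return fit(x, y, f);
    }

    public static void main(String[] args) {
        double[] beta = demo();
        System.out.println(beta[0] + " " + beta[1] + " " + beta[2]);
    }
}
```

If two transformed factors are linearly dependent, the normal equations become singular and the elimination divides by zero, which is the matrix problem the slide says the basis functions help to avoid.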
30 Basis functions combination
Basis functions $f_1, f_2, f_3, f_4, \dots, f_m$: exponent, logarithm, sine, cosine, ..., hyperbolic tangent
Explanatory factors $x_1, x_2, x_3, \dots, x_n$: GOV Bel, FX USD, Oil price, ..., Gold price, each observed on Date 1, Date 2, ..., Date t, ..., Date K
Target factor $y$: GOV Aut; estimate $\tilde{y}$: GOV Aut estimation
A combination of basis functions is applied to the explanatory factors, for example:
$\tilde{y} = \beta_1 f_3(x_1) + \beta_2 f_1(x_2) + \dots + \beta_n f_2(x_n) + \beta_{n+1} + \varepsilon$
Distance: $\varepsilon = (y - \tilde{y})^2$
31 Prototype system
[Screenshot: generated formula and graphic results for the target and the multifactor series]
32 Prototype system settings
33 4. Implied Rating
[Diagram: time series are used for scale building; classification of a new series yields the implied rating (example: rating BB, tendency BBB)]
34 Implied Rating (Basel III)
Goal: build a rating scale based on explicit CDS time series and use it to identify both the implied rating and the tendency of a new CDS input series.
Input:
- A set of CDS time series that relate to assets or issuers (CDS spread curves or indices, bond prices, share prices, etc.)
- The rating system: the number of ratings on the rating scale and their symbols
Output:
- A scale with boundaries between the ratings
Usage: when the built scale is supplied with a new time series representing an issuer, the system identifies:
- The current rating, based on the historical development with more importance given to the latest values
- The tendency, i.e. the next probable rating
35 Steps to obtain implied rating
1. Establishment of the rating scale
- The available time series are used to build a given number of rating degrees and to determine their boundaries.
- The time series are distributed into the given rating degrees according to their historical behavior.
- The center of every degree is determined using all time series that belong to the degree.
- The boundaries are derived from adjacent centers using equally distanced series.
2. A new time series is classified into a rating class by comparing it with the centers of the scale classes (which are also time series) and finding the closest one.
3. The tendency is determined by:
- Finding the second closest center of the rating degrees
- Finding the closest boundary of the classified rating level
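The classification and tendency steps can be sketched as a search for the closest and second closest degree centers, with a boundary taken as the pointwise average of two adjacent centers. The Euclidean distance, the class name, and the sample centers are assumptions; the weighting of recent values is left out here to keep the example short.

```java
// Hypothetical sketch: implied rating as closest center, tendency as second closest center.
class ImpliedRatingSketch {
    // Euclidean distance between two series of equal length (assumed metric).
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return Math.sqrt(sum);
    }

    // Returns {index of the closest center, index of the second closest center}.
    static int[] classify(double[] series, double[][] centers) {
        int best = -1, second = -1;
        double bestD = Double.POSITIVE_INFINITY, secondD = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centers.length; k++) {
            double d = distance(series, centers[k]);
            if (d < bestD) {
                second = best; secondD = bestD;
                best = k; bestD = d;
            } else if (d < secondD) {
                second = k; secondD = d;
            }
        }
        return new int[] { best, second };
    }

    // Boundary between two adjacent degrees: pointwise average of their centers.
    static double[] boundary(double[] centerA, double[] centerB) {
        double[] b = new double[centerA.length];
        for (int i = 0; i < b.length; i++) {
            b[i] = 0.5 * (centerA[i] + centerB[i]);
        }
        return b;
    }

    public static void main(String[] args) {
        String[] ratings = {"AA", "A", "BBB"};
        double[][] centers = { {0.2, 0.2}, {0.6, 0.6}, {1.4, 1.4} }; // made-up centers in percent
        int[] result = classify(new double[] {0.5, 0.45}, centers);
        System.out.println("rating " + ratings[result[0]] + ", tendency " + ratings[result[1]]);
    }
}
```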
36 Rating degrees boundaries
The time series of the rating degrees may overlap.
[Chart: overlapping series of neighboring rating degrees around AA and A]
The points of the rating boundaries are calculated as average values of the corresponding points of the centers of the series in every rating degree.
The center of the series in a given degree does not lie in the middle between the degree boundaries, because in most cases the time series are nonuniformly distributed.
37 Rating degrees boundaries
[Chart: boundary and degree centers for the degrees A and AA, y-axis 0,20% to 1,80%]
The boundary lies midway between the series centers.
The center of the series does not lie midway between the boundaries.
38 Weighting the historical values
Weighting of the series values (EWMA with a decay factor) is applied in order to give more importance to more recent values.
[Chart: degree AA, y-axis 0,00% to 1,60%; the last series values lie within the degree boundaries]
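A minimal sketch of exponentially weighted statistics with a decay factor: the latest value gets weight 1, the one before it gets the decay factor, the one before that the decay factor squared, and so on. The normalization by the sum of weights and the class name are assumptions; the slides only state that EWMA with a decay factor is used.

```java
// Hypothetical sketch: exponentially weighted mean and standard deviation.
class EwmaStatistics {
    // Weight of observation t (0 = oldest) is decay^(n - 1 - t); weights are normalized.
    static double weightedMean(double[] x, double decay) {
        double sum = 0.0, weights = 0.0;
        for (int t = 0; t < x.length; t++) {
            double w = Math.pow(decay, x.length - 1 - t);
            sum += w * x[t];
            weights += w;
        }
        return sum / weights;
    }

    // Weighted standard deviation around the weighted mean, using the same weights.
    static double weightedStd(double[] x, double decay) {
        double mean = weightedMean(x, decay);
        double sum = 0.0, weights = 0.0;
        for (int t = 0; t < x.length; t++) {
            double w = Math.pow(decay, x.length - 1 - t);
            sum += w * (x[t] - mean) * (x[t] - mean);
            weights += w;
        }
        return Math.sqrt(sum / weights);
    }

    public static void main(String[] args) {
        double[] spreads = {1.0, 2.0, 3.0}; // oldest to newest, made-up values
        System.out.println("plain mean:    " + weightedMean(spreads, 1.0));
        System.out.println("weighted mean: " + weightedMean(spreads, 0.5));
    }
}
```

With a decay factor of 1 the result is the ordinary mean; smaller decay factors pull the mean toward the most recent values.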
39 Determine the rating of a new series
In the classification phase, histograms are built for the distributions of the data within the best and second-best degrees (corresponding to the rating and the tendency).
The histograms are shown together with the centers of the classes and the mean of the newly classified series.
The mean and standard deviation used to build the histograms are calculated with the same decay factor that was used to build the rating scale.
40 Determine the rating of a new series
[Chart: classification of a new series, Barclays Bank PLC; the new time series to be classified (yellow) is plotted against the degrees A and AA, y-axis 0,20% to 1,60%]
[Histogram: mean of the new series, rating AA, and tendency to A, x-axis 0,08% to 1,30%; mean and standard deviation with decay factor]
41 Prototype system
[Screenshot: time series used to build the scale, built rating scale, rating system, new input, rating and tendency, series within the selected degree, histograms for rating and tendency, new input mean]
42 5. Time Series Prediction
[Diagram: prediction extends a time series into the predicted future, giving the predicted time series]
43 Time series prediction
Goal: predict a given time series over a given time horizon by analyzing its historical development.
Input:
- A time series
- Settings for the approach used (for example, learning iterations, time window size, etc.)
Output:
- The given time series with additional predicted values
- Confidence bounds
- Prediction quality statistics
Usage:
- The predicted values can be used as the most probable future values, for instance in algorithmic trading
44 Time series prediction
The most commonly used prediction methods are:
- Averages (MA, WMA, EWMA, etc.)
- Autoregressive methods (AR, ARMA, ARIMA, SARIMA, ARMAX, SETAR, etc.) with the Box-Jenkins methodology
- Trend extrapolation (based on LSE, trend polynomial finding, etc.)
- Neural networks (MLP, RBF, SOM, ART, recurrent Elman/Jordan networks, etc.); neural networks are used in the current approach
- Other regression-based (e.g. observers) and econometric models
- Kalman, Wiener, and other filters
- Wavelet-based methods
- Holt-Winters decomposition
- Hybrid approaches
The prediction can be used for technical analysis.
Confidence bounds are used.
Predictability indicators can be suggested (Hurst exponent, etc.).
45 Prediction by neural network
[Diagram, model identification: a sliding window over the historical values supplies the input vector and output vector; the neural network is optimized against a target function]
[Diagram, prediction: the neural network predicts recursively over the horizon]
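The sliding window and the recursive prediction can be sketched independently of the network itself. In the example below the trained neural network is replaced by an arbitrary function from a window of values to the next value, here a stand-in that extrapolates the last two values linearly. The class name, method names, and the stand-in model are assumptions.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch: sliding-window training pairs and recursive multi-step prediction.
class SlidingWindowPrediction {
    // Input vectors: inputs[i] = series[i .. i + window - 1].
    static double[][] inputs(double[] series, int window) {
        int n = series.length - window;
        double[][] in = new double[n][];
        for (int i = 0; i < n; i++) {
            in[i] = Arrays.copyOfRange(series, i, i + window);
        }
        return in;
    }

    // Target values: targets[i] = series[i + window], the value that follows inputs[i].
    static double[] targets(double[] series, int window) {
        return Arrays.copyOfRange(series, window, series.length);
    }

    // Recursive prediction: every predicted value is appended and fed back as input.
    static double[] predict(double[] history, int window, int horizon,
                            ToDoubleFunction<double[]> model) {
        double[] extended = Arrays.copyOf(history, history.length + horizon);
        for (int h = 0; h < horizon; h++) {
            int end = history.length + h;
            double[] in = Arrays.copyOfRange(extended, end - window, end);
            extended[end] = model.applyAsDouble(in);
        }
        return Arrays.copyOfRange(extended, history.length, extended.length);
    }

    public static void main(String[] args) {
        double[] history = {1, 2, 3, 4};
        // Stand-in for a trained network: linear extrapolation of the last two values.
        ToDoubleFunction<double[]> model = w -> 2 * w[w.length - 1] - w[w.length - 2];
        System.out.println(Arrays.toString(predict(history, 2, 3, model)));
    }
}
```

Because each step reuses earlier predictions, errors accumulate over the horizon, which is one reason the module reports confidence bounds.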
46 Prediction by neural network
Modeling process: data preprocessing -> modeling of the NN architecture -> training -> application of the NN model -> evaluation (manual, by a trial-and-error approach)
[Diagram: preprocessing -> neural network -> postprocessing -> prediction with confidence bounds, with charts of the historical values and the prediction over the horizon]
The prediction generally includes data preprocessing, solving matrix equations (in batch or iteratively), and data postprocessing.
47 Prototype system
[Screenshot: preprocessing, prediction methods, values, time horizon, test, error graphic, confidence bounds]
48 Modules dependencies
[Diagram: dependencies between the shared components (time series, series processing, calculations, neural networks) and the modules: 1. Clustering; 2. Nonnormal distributions (distributions and parameters estimation); 3. Multifactor Models (formula building, factors selection); 4. Implied Ratings (rating scale building, histograms); 5. Prediction (learning & prediction)]
More informationUnivariate and Multivariate Methods PEARSON. Addison Wesley
Time Series Analysis Univariate and Multivariate Methods SECOND EDITION William W. S. Wei Department of Statistics The Fox School of Business and Management Temple University PEARSON Addison Wesley Boston
More informationReport on application of Probability in Risk Analysis in Oil and Gas Industry
Report on application of Probability in Risk Analysis in Oil and Gas Industry Abstract Risk Analysis in Oil and Gas Industry Global demand for energy is rising around the world. Meanwhile, managing oil
More informationData Preparation Part 1: Exploratory Data Analysis & Data Cleaning, Missing Data
Data Preparation Part 1: Exploratory Data Analysis & Data Cleaning, Missing Data CAS Predictive Modeling Seminar Louise Francis Francis Analytics and Actuarial Data Mining, Inc. www.datamines.com Louise.francis@datamines.cm
More informationFrequency distributions, central tendency & variability. Displaying data
Frequency distributions, central tendency & variability Displaying data Software SPSS Excel/Numbers/Google sheets Social Science Statistics website (socscistatistics.com) Creating and SPSS file Open the
More informationHedging Illiquid FX Options: An Empirical Analysis of Alternative Hedging Strategies
Hedging Illiquid FX Options: An Empirical Analysis of Alternative Hedging Strategies Drazen Pesjak Supervised by A.A. Tsvetkov 1, D. Posthuma 2 and S.A. Borovkova 3 MSc. Thesis Finance HONOURS TRACK Quantitative
More informationStatistical Analysis of Life Insurance Policy Termination and Survivorship
Statistical Analysis of Life Insurance Policy Termination and Survivorship Emiliano A. Valdez, PhD, FSA Michigan State University joint work with J. Vadiveloo and U. Dias Session ES82 (Statistics in Actuarial
More informationPrecalculus REVERSE CORRELATION. Content Expectations for. Precalculus. Michigan CONTENT EXPECTATIONS FOR PRECALCULUS CHAPTER/LESSON TITLES
Content Expectations for Precalculus Michigan Precalculus 2011 REVERSE CORRELATION CHAPTER/LESSON TITLES Chapter 0 Preparing for Precalculus 01 Sets There are no statemandated Precalculus 02 Operations
More informationIBM SPSS Neural Networks 22
IBM SPSS Neural Networks 22 Note Before using this information and the product it supports, read the information in Notices on page 21. Product Information This edition applies to version 22, release 0,
More informationModule 4: Data Exploration
Module 4: Data Exploration Now that you have your data downloaded from the Streams Project database, the detective work can begin! Before computing any advanced statistics, we will first use descriptive
More informationVISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA
VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA Csilla Csendes University of Miskolc, Hungary Department of Applied Mathematics ICAM 2010 Probability density functions A random variable X has density
More informationCost of Capital and Corporate Refinancing Strategy: Optimization of Costs and Risks *
Cost of Capital and Corporate Refinancing Strategy: Optimization of Costs and Risks * Garritt Conover Abstract This paper investigates the effects of a firm s refinancing policies on its cost of capital.
More informationDongfeng Li. Autumn 2010
Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis
More informationSUMAN DUVVURU STAT 567 PROJECT REPORT
SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.
More informationChapter 3 RANDOM VARIATE GENERATION
Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.
More informationA comparison between different volatility models. Daniel Amsköld
A comparison between different volatility models Daniel Amsköld 211 6 14 I II Abstract The main purpose of this master thesis is to evaluate and compare different volatility models. The evaluation is based
More informationSimple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
More informationA model to predict client s phone calls to Iberdrola Call Centre
A model to predict client s phone calls to Iberdrola Call Centre Participants: Cazallas Piqueras, Rosa Gil Franco, Dolores M Gouveia de Miranda, Vinicius Herrera de la Cruz, Jorge Inoñan Valdera, Danny
More informationThe Big 50 Revision Guidelines for S1
The Big 50 Revision Guidelines for S1 If you can understand all of these you ll do very well 1. Know what is meant by a statistical model and the Modelling cycle of continuous refinement 2. Understand
More informationseven Statistical Analysis with Excel chapter OVERVIEW CHAPTER
seven Statistical Analysis with Excel CHAPTER chapter OVERVIEW 7.1 Introduction 7.2 Understanding Data 7.3 Relationships in Data 7.4 Distributions 7.5 Summary 7.6 Exercises 147 148 CHAPTER 7 Statistical
More informationBig Ideas in Mathematics
Big Ideas in Mathematics which are important to all mathematics learning. (Adapted from the NCTM Curriculum Focal Points, 2006) The Mathematics Big Ideas are organized using the PA Mathematics Standards
More informationStatistical Functions in Excel
Statistical Functions in Excel There are many statistical functions in Excel. Moreover, there are other functions that are not specified as statistical functions that are helpful in some statistical analyses.
More informationThis unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.
Algebra I Overview View unit yearlong overview here Many of the concepts presented in Algebra I are progressions of concepts that were introduced in grades 6 through 8. The content presented in this course
More informationMaster of Mathematical Finance: Course Descriptions
Master of Mathematical Finance: Course Descriptions CS 522 Data Mining Computer Science This course provides continued exploration of data mining algorithms. More sophisticated algorithms such as support
More informationImproving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP
Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation
More informationDr Christine Brown University of Melbourne
Enhancing Risk Management and Governance in the Region s Banking System to Implement Basel II and to Meet Contemporary Risks and Challenges Arising from the Global Banking System Training Program ~ 8 12
More informationSYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation
SYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu September 19, 2015 Outline
More informationSan Jose State University Engineering 10 1
KY San Jose State University Engineering 10 1 Select Insert from the main menu Plotting in Excel Select All Chart Types San Jose State University Engineering 10 2 Definition: A chart that consists of multiple
More informationData Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression
Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction
More informationAlgebra 1 Course Information
Course Information Course Description: Students will study patterns, relations, and functions, and focus on the use of mathematical models to understand and analyze quantitative relationships. Through
More information2. Filling Data Gaps, Data validation & Descriptive Statistics
2. Filling Data Gaps, Data validation & Descriptive Statistics Dr. Prasad Modak Background Data collected from field may suffer from these problems Data may contain gaps ( = no readings during this period)
More informationThinkwell s Homeschool Algebra 2 Course Lesson Plan: 34 weeks
Thinkwell s Homeschool Algebra 2 Course Lesson Plan: 34 weeks Welcome to Thinkwell s Homeschool Algebra 2! We re thrilled that you ve decided to make us part of your homeschool curriculum. This lesson
More informationDescriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics
Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),
More informationMATHEMATICAL METHODS OF STATISTICS
MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS
More informationPHASE ESTIMATION ALGORITHM FOR FREQUENCY HOPPED BINARY PSK AND DPSK WAVEFORMS WITH SMALL NUMBER OF REFERENCE SYMBOLS
PHASE ESTIMATION ALGORITHM FOR FREQUENCY HOPPED BINARY PSK AND DPSK WAVEFORMS WITH SMALL NUM OF REFERENCE SYMBOLS Benjamin R. Wiederholt The MITRE Corporation Bedford, MA and Mario A. Blanco The MITRE
More informationTechnology StepbyStep Using StatCrunch
Technology StepbyStep Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate
More informationA LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA
REVSTAT Statistical Journal Volume 4, Number 2, June 2006, 131 142 A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA Authors: Daiane Aparecida Zuanetti Departamento de Estatística, Universidade Federal de São
More informationExecutive Program in Managing Business Decisions: A Quantitative Approach ( EPMBD) Batch 03
Executive Program in Managing Business Decisions: A Quantitative Approach ( EPMBD) Batch 03 Calcutta Ver 1.0 Contents Broad Contours Who Should Attend Unique Features of Program Program Modules Detailed
More informationCOPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments
Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for
More informationOverview of Math Standards
Algebra 2 Welcome to math curriculum design maps for Manhattan Ogden USD 383, striving to produce learners who are: Effective Communicators who clearly express ideas and effectively communicate with diverse
More informationTime Series Analysis
Time Series Analysis Identifying possible ARIMA models Andrés M. Alonso Carolina GarcíaMartos Universidad Carlos III de Madrid Universidad Politécnica de Madrid June July, 2012 Alonso and GarcíaMartos
More informationAssessment. Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall
Automatic Photo Quality Assessment Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall Estimating i the photorealism of images: Distinguishing i i paintings from photographs h Florin
More informationSimulation Exercises to Reinforce the Foundations of Statistical Thinking in Online Classes
Simulation Exercises to Reinforce the Foundations of Statistical Thinking in Online Classes Simcha Pollack, Ph.D. St. John s University Tobin College of Business Queens, NY, 11439 pollacks@stjohns.edu
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationExpression. Variable Equation Polynomial Monomial Add. Area. Volume Surface Space Length Width. Probability. Chance Random Likely Possibility Odds
Isosceles Triangle Congruent Leg Side Expression Equation Polynomial Monomial Radical Square Root Check Times Itself Function Relation One Domain Range Area Volume Surface Space Length Width Quantitative
More informationData Analysis: Describing Data  Descriptive Statistics
WHAT IT IS Return to Table of ontents Descriptive statistics include the numbers, tables, charts, and graphs used to describe, organize, summarize, and present raw data. Descriptive statistics are most
More informationModule 3: Correlation and Covariance
Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis
More informationAlgebra 1 2008. Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard
Academic Content Standards Grade Eight and Grade Nine Ohio Algebra 1 2008 Grade Eight STANDARDS Number, Number Sense and Operations Standard Number and Number Systems 1. Use scientific notation to express
More informationData Mining Part 2. Data Understanding and Preparation 2.1 Data Understanding Spring 2010
Data Mining Part 2. and Preparation 2.1 Spring 2010 Instructor: Dr. Masoud Yaghini Introduction Outline Introduction Measuring the Central Tendency Measuring the Dispersion of Data Graphic Displays References
More informationLecture 20: Clustering
Lecture 20: Clustering Wrapup of neural nets (from last lecture Introduction to unsupervised learning Kmeans clustering COMP424, Lecture 20  April 3, 2013 1 Unsupervised learning In supervised learning,
More informationIntroduction to Support Vector Machines. Colin Campbell, Bristol University
Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multiclass classification.
More informationLinear Classification. Volker Tresp Summer 2015
Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong
More informationSoftware Review: ITSM 2000 Professional Version 6.0.
Lee, J. & Strazicich, M.C. (2002). Software Review: ITSM 2000 Professional Version 6.0. International Journal of Forecasting, 18(3): 455459 (June 2002). Published by Elsevier (ISSN: 01692070). http://0
More information