Finding the best distribution that explains your data
ENMAX Energy Corporation
8 October 2015
Introduction
We often fit observations to a model (e.g., a lognormal distribution). How can we ensure that the model is appropriate? Is there a model that would provide more accurate predictions?

Goodness of fit
"Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question." (Wikipedia)
Kolmogorov-Smirnov
The K-S statistic, $D_n$, is defined as
$$D_n = \sup_x \left| F_n(x) - F(x) \right|,$$
where $F$ is the hypothesized cumulative distribution function (CDF) and $F_n$ is the empirical (sample) CDF.
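To make the definition concrete, here is a small Python sketch that computes $D_n$ directly. Python is our choice for illustration (the deck's own code is SAS), and `ks_statistic` and `normal_cdf` are hypothetical helper names. Because $F_n$ is a step function, the supremum is reached at an observation, either just before or just after a step, so checking both sides of every step is enough.

```python
import math
from typing import Callable, Sequence


def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """One-sample Kolmogorov-Smirnov statistic D_n = sup_x |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    # Largest gap with the empirical CDF above F (right side of each step).
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    # Largest gap with the empirical CDF below F (left side of each step).
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """CDF of a normal distribution, written with math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# Usage: a made-up sample tested against a fully specified N(0, 1).
data = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.7]
print(ks_statistic(data, normal_cdf))
```

The hypothesized CDF is passed in as a function, so the same routine works for any fully specified distribution.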
Anderson-Darling
There are many goodness-of-fit tests, and most are variations on the K-S approach of comparing the empirical CDF with the hypothesized one. For example, the AD statistic, $A^2$, is defined as
$$A^2 = n \int_{-\infty}^{\infty} \frac{\left(F_n(x) - F(x)\right)^2}{F(x)\left(1 - F(x)\right)}\, dF(x).$$
It is a weighted integral of the squared difference between the empirical and hypothesized distributions. The weight $1/[F(x)(1 - F(x))]$ grows near the extremes, so observations in the tails count more.
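In practice the integral is rarely evaluated numerically. For a fully specified $F$ and order statistics $x_{(1)} \le \dots \le x_{(n)}$, it reduces to the standard computing form
$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n} (2i - 1)\left[\ln F(x_{(i)}) + \ln\left(1 - F(x_{(n+1-i)})\right)\right].$$
The Python sketch below implements that form. `ad_statistic` is a hypothetical name, and the sketch assumes $0 < F(x) < 1$ at every observation, since otherwise a log term diverges.

```python
import math
from typing import Callable, Sequence


def ad_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Anderson-Darling statistic A^2 via its order-statistic computing form.

    Assumes 0 < F(x) < 1 at every observation (otherwise log(0) is hit).
    """
    xs = sorted(sample)
    n = len(xs)
    u = [cdf(x) for x in xs]  # F at the order statistics x_(1) <= ... <= x_(n)
    total = 0.0
    for i in range(1, n + 1):
        # Pair the i-th smallest value with the i-th largest value.
        total += (2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
    return -n - total / n


# Usage: two points checked against the uniform CDF F(t) = t on [0, 1].
print(ad_statistic([0.6, 0.2], lambda t: t))
```

For a single point at the median of a uniform distribution, evaluating the integral by hand gives $2\ln 2 - 1$, which matches what the computing form returns.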
P value statistics
The P value answers this question: if the two samples were randomly drawn from identical populations, what is the probability that their cumulative frequency distributions would be as far apart as observed? More precisely, it is the probability that the test statistic would be at least as large as the value observed. If the P value is small, conclude that the two groups were sampled from populations with different distributions; the populations may differ in median, variability, or shape. In a one-sample goodness-of-fit test the logic is the same, with the hypothesized distribution $F$ taking the place of the second sample.
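For large samples, a P value can be attached to the K-S statistic through the limiting Kolmogorov distribution:
$$P\left(\sqrt{n}\,D_n > t\right) \approx 2\sum_{k=1}^{\infty} (-1)^{k-1} e^{-2k^2t^2}.$$
The Python sketch below evaluates that series. It is only an asymptotic approximation, and `ks_pvalue_asymptotic` is a hypothetical name. It also assumes $F$ is fully specified in advance. When parameters are first estimated from the same data, as in the fits later in this deck, $D_n$ tends to be smaller and this formula overstates the P value, so adjusted critical values are needed instead.

```python
import math


def ks_pvalue_asymptotic(d_n: float, n: int, terms: int = 100) -> float:
    """Approximate P(D_n >= d_n) with the limiting Kolmogorov distribution.

    Assumes a fully specified hypothesized CDF (no parameters estimated
    from the data) and a reasonably large n.
    """
    t = math.sqrt(n) * d_n
    if t < 0.2:
        # The limiting CDF is below 1e-12 here, so the P value is effectively 1;
        # the alternating series would also converge slowly for tiny t.
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


# Usage: a hypothetical D_n = 0.19 from 50 observations.
print(ks_pvalue_asymptotic(0.19, 50))
```

At $t \approx 1.36$ the series gives roughly 0.05, the familiar large-sample 5% critical value. For the two-sample version described above, the same approximation applies with $n$ replaced by $n_1 n_2 / (n_1 + n_2)$.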
Electricity prices in Alberta
Electricity is one of the most volatile commodities traded in wholesale markets.
Sources of volatility
An increasing portion of the supply portfolio is stochastic:
- Wind accounts for 8.3% of Alberta's installed generating capacity.
- Coal-fired power plants undergo forced outages.
Case study - Sundance 2
Sundance A & B Power Purchase Arrangements
TransCanada's Sundance A and B Power Purchase Arrangements entitle TransCanada to more than 900 megawatts (MW) of capacity from the Sundance Power Plant. TransCanada sells this electricity under long-term contracts and into the spot market. The Sundance Power Plant has a total of six generating units and is owned and operated by TransAlta.

Highlights
Sundance A PPA: 100 per cent of the output from Units 1 & 2 = 560 MW. Term expires in 2017.
Sundance B PPA: 50 per cent of the output from Units 3 & 4 = 353 MW. Term expires in 2020.
Location: 70 kilometres (about 45 miles) west of Edmonton, Alberta, on the south shore of Lake Wabamun.
In-service dates: Unit 1, 1970; Unit 2, 1973; Unit 3, 1976; Unit 4, 1977.
Capacity: 2,029 MW.
Fuel: Coal from TransAlta's Highvale mine.
Environmental features: Meets ISO 14001 standards; regulated by Alberta Environment and the Alberta Electric Utilities Board.
Owner: TransAlta Utilities Corporation.
Operator: TransAlta Utilities Corporation.
Outage statistics
The Sundance 2 unit has undergone several forced outages in 2015, often coinciding with wholesale market price spikes.
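Distribution fitting
```sas
/* WORK.ON holds the observations; ON is the variable being fitted */
PROC UNIVARIATE DATA=WORK.ON;
   VAR ON;
   HISTOGRAM ON / NORMAL LOGNORMAL EXP WEIBULL; /* overlay four fitted densities and print GOF tests */
   CDFPLOT ON / WEIBULL;                        /* empirical CDF vs. fitted Weibull CDF */
RUN;
```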
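For readers without SAS, the Weibull fit requested by the `WEIBULL` options above can be approximated in Python. This is a minimal sketch, assuming a two-parameter Weibull with the threshold fixed at 0 and fitted by maximum likelihood. We understand this to be PROC UNIVARIATE's default for this family, but the sketch is not the procedure's actual implementation. `fit_weibull_mle` and `weibull_cdf` are hypothetical helper names.

```python
import math
from typing import Sequence, Tuple


def fit_weibull_mle(sample: Sequence[float], tol: float = 1e-10) -> Tuple[float, float]:
    """Two-parameter Weibull MLE (threshold assumed to be 0); returns (shape k, scale lam).

    The shape solves 1/k + mean(ln y) - sum(y^k ln y) / sum(y^k) = 0 with
    y = x / max(x); rescaling by max(x) keeps y**k from overflowing.
    """
    xs = [float(x) for x in sample]
    if any(x <= 0 for x in xs):
        raise ValueError("Weibull MLE needs strictly positive data")
    m = max(xs)
    if min(xs) == m:
        raise ValueError("need at least two distinct values")
    ys = [x / m for x in xs]
    logs = [math.log(y) for y in ys]
    mean_log = sum(logs) / len(logs)

    def score(k: float) -> float:
        w = [y ** k for y in ys]
        return 1.0 / k + mean_log - sum(wi * li for wi, li in zip(w, logs)) / sum(w)

    # score() decreases in k: positive near 0, negative for large k.
    lo, hi = 1e-6, 1.0
    while score(hi) > 0:
        hi *= 2.0
    # Bisection on the bracketed sign change.
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    # Scale MLE given the shape: lam = (mean(x^k))^(1/k), written in terms of y.
    lam = m * (sum(y ** k for y in ys) / len(ys)) ** (1.0 / k)
    return k, lam


def weibull_cdf(x: float, k: float, lam: float) -> float:
    """Weibull CDF with shape k, scale lam and threshold 0."""
    return 1.0 - math.exp(-((x / lam) ** k)) if x > 0 else 0.0
```

Bisection is used rather than Newton's method because the score is monotone in $k$, so convergence is guaranteed once a sign change is bracketed. The fitted `weibull_cdf` can then be passed to a K-S or AD routine. Since its parameters come from the same data, the resulting statistic should be compared against adjusted critical values.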
Distribution statistics and goodness of fit
Both fits describe the observed data accurately. There may also be a theoretical reason to prefer the Weibull distribution, which is widely used to model times to failure.
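When several candidate families fit reasonably well, one simple way to rank them is to compare their K-S statistics: the smaller $D_n$, the more closely the fitted CDF tracks the empirical one. The sketch below assumes each family is fitted by maximum likelihood. It covers only the exponential and lognormal families, which have closed-form estimates, to keep the block short; a Weibull fit could be added to the dictionary in the same way. `candidate_cdfs` and `rank_by_ks` are hypothetical names. Because the parameters are estimated from the same data, the ranking is descriptive rather than a formal test.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """One-sample K-S statistic D_n = sup_x |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    return max(
        max((i + 1) / n - cdf(x) for i, x in enumerate(xs)),
        max(cdf(x) - i / n for i, x in enumerate(xs)),
    )


def candidate_cdfs(sample: Sequence[float]) -> Dict[str, Callable[[float], float]]:
    """Closed-form MLE fits for two candidate families (positive data only)."""
    n = len(sample)
    mean = sum(sample) / n                      # exponential MLE of the mean
    logs = [math.log(x) for x in sample]
    mu = sum(logs) / n                          # lognormal MLE of mu
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)  # MLE divides by n
    if sigma == 0:
        raise ValueError("need at least two distinct values")

    def expon(x: float) -> float:
        return 1.0 - math.exp(-x / mean) if x > 0 else 0.0

    def lognorm(x: float) -> float:
        if x <= 0:
            return 0.0
        return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

    return {"exponential": expon, "lognormal": lognorm}


def rank_by_ks(sample: Sequence[float]) -> List[Tuple[str, float]]:
    """Return (family, D_n) pairs sorted from best (smallest D_n) to worst."""
    fits = candidate_cdfs(sample)
    scores = [(name, ks_statistic(sample, cdf)) for name, cdf in fits.items()]
    return sorted(scores, key=lambda pair: pair[1])
```

An AD statistic could be used in place of `ks_statistic` when the tails matter most, as they do for price spikes.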
Finding the right predictive model is important:
- Several tests can quantify how well a given model fits empirical data.
- These tests yield goodness-of-fit (GOF) statistics.
- Choosing the best-fitting distribution helps build more reliable models.