Buered Probability of Exceedance: Mathematical Properties and Optimization Algorithms

Transcription

1 Buered Probability of Exceedance: Mathematical Properties and Optimization Algorithms Alexander Mafusalov, Stan Uryasev RESEARCH REPORT Risk Management and Financial Engineering Lab Department of Industrial and Systems Engineering 303 Weil Hall, University of Florida, Gainesville, FL s: First draft: October 2014, This draft: October 2014 Correspondence should be addressed to: Stan Uryasev Abstract This paper introduces a new probabilistic characteristic called buered probability of exceedance (bpoe). This characteristic is an extension of so-called buered probability of failure and it is equal to one minus superdistribution function. Paper provides ecient calculation formulas for bpoe. bpoe is proved to be a quasi-convex function of random variable w.r.t. the regular addition operation and a concave function w.r.t. the mixture operation; it is a monotonic function of random variable. bpoe is proved to be a strictly decreasing function of the parameter on the interval between the mathematical expectation and the essential supremum. Multiplicative inverse of the bpoe is proved to be a convex function of parameter, and a piecewise-linear function in the case of discretely distributed random variable. Minimization of the bpoe can be reduced to a convex program for a convex feasible region and to LP for a polyhedral feasible region. A family of bpoe minimization problems and family of the corresponding CVaR minimization problems share the same frontier of optimal solutions and optimal values. Keywords: probability of failure, probability of exceedance, buered probability of failure, superdistribution, superquantile, Conditional Value-at-Risk, CVaR, parametric simplex method 1. Introduction This paper uses the notation CVaR α (X) for conditional-value-at-risk (CVaR) for a random variable X and a condence level α [0, 1], explored in [5]. To have a more consice notation, an alternative name superquantile q α (X) is used, similar to a regular quantile q α (X). That is, q α (X) = CVaR α (X). Notation q(α; X) is used to present superquantile q α (X) = q(α; X) as a function of parameter α. For example, q 1 (x; X) should be interpreted as an inverse function of superquantile as a function of α. Probability of exceedance is dened as p x (X) = P (X > x) = 1 F X (x), where F X (x) is a distribution function. In engineering applications it is usual to see an optimization problem with probability of exceedance in constraints or as an objective. Paper [4] suggests as an alternative to the probability of failure, which is p(x) = P (X > 0), the

2 buered probability of failure, which is a value p(x) such that q p(x) (X) = 0. This paper denes buered probability of exceedance p x (X) in a way that p 0 (X) = p(x) and p x (X) = p 0 (X x). To dene buered probability of exceedandce, we introduce the following mathematical notions from paper [3]. For any random variable X with distribution function F X (x) there is an auxilary random variable X = q(f X (X); X) with distribution function F X(x) = F X (x), called superdistribution function, and 1, for x sup X; F X (x) = q 1 (x; X), for EX < x < sup X; 0, otherwise, where q 1 (x; X) is an inverse of the function q(α; X) as a function of α. Denition 1. For a random variable X and x R, buered probability of exceedance is dened as follows 0, for x sup X; p x (X) = 1 F X (x) = 1 q 1 (x; X), for EX < x < sup X; 1, otherwise. Book [9] considers Chebyshev-type family of inequalities with CVaR deviation and shows that the tightest inequality in the family is obtained for α = p x (X), and the tightest inequality itself reduces to p x (X) p x (X). (1) Inequality (1) is similar to q α (X) q α (X). Inequality (1) is one of the motivations for introducing buered probability of exceedance instead of regular probability of exceedance. Paper [4] uses inequality (1) to argue that the buered probability of failure is a conservative estimate of the probability of failure. Similarly, buered probability of exceedance is a conservative estimate of the probability of exceedance. Section 2 proves several formulas for ecient calculation of p x (X). Section 3.1 investigates mathematical properties of p x (X) w.r.t. parameter x. Section 3.2 establishes mathematical properties of p x (X) w.r.t. random variable X. Section 4 studies minimization of p x (X) over a feasible region X X. 2. Calculation Formulas for BPOE Note that, since q α (X) x = q α (X x) for any constant x, see e.g. [6], then X x = X x and F X (x) = F X x (0). Therefore, p x (X) = p 0 (X x). The following proposition is a slightly modied proposition from paper [1], studying applications of buered probability of exceedance in classication. Proposition 1. For a random variable X and x R, buered probability of exceedance equals { 0, if x = sup X; p x (X) = (2) min a 0 E[a(X x) + 1] +, otherwise. 2

3 Proof. In the denition of buered probability of exceedance we have three cases: 1. p x (X) = 1 q 1 (x; X) when EX < x < sup X, 2. p x (X) = 1 when x < EX or x = EX < sup X, 3. p x (X) = 0 when x sup X. Let us prove the proposition case by case. 1. Let EX < x < sup X, and take x = 0. Since q α (X) is a strictly increasing function of α on α [0, 1 P (X = sup X)], then equation q p (X) = 0 has a unique solution p for EX < x < sup X. Then, p 0 (X) = p such that min c c + 1 p E[X c]+ = 0. Since q α (X) is an increasing function of parameter α, then we can reformulate p 0 (X) = min p p such that min c c + 1 p E[X c]+ 0. Therefore, p 0 (X) = min p,c p s.t. c + 1 p E[X c]+ 0. Optimal c < 0, since c c + 1 p E[X c ] + 0, and c = 0 implies sup X 0, which is not the case we consider. Therefore, p 0 (X) = min p,c s.t. p p c c + E [ 1 c X c ] + 0. c Since c < 0, then c c and, therefore, 1 = 1. Further, denoting a =, we have c p 0 (X) = min p,a>0 p s.t. E[aX + 1] + p. p 0 (X) = min a 0 E[aX + 1]+. Note that change a > 0 to a 0 includes value 1 to the feasible region, which does not aect the case considered. Finally, since p x (X) = p 0 (X x), then p x (X) = min a 0 E[a(X x) + 1]+. 2. When EX x, we have E[a(X x) + 1] + ae(x x) Note also that E[a(X x) + 1] + = 1 for a = 0. Therefore, min a 0 E[a(X x) + 1] + = For x = sup X, by the formula, p x (X) = 0. Consider x > sup X, i.e., X x ε < 0. Taking a = 1 ε makes a(x x) 1, therefore, min a 0 E[a(X x) + 1] + = 0. Corollary 1. For EX < x < sup X, p x (X) = 1 q 1 (x; X) = min c<x 3 E[X c] +. (3) x c

4 Furthermore, for x = q α (X), where α (0, 1), it is valid that and, consequently, q α (X) arg min c<x E[X c] +, x c p x (X) = E[X q α(x)] + q α (X) q α (X). Proof. Since EX < x < sup X, then q 1 (x; X) (0, 1), therefore, a = 0 is not optimal for min a 0 E[a(X x)+1] +. Therefore, change of variable a 1 leads to an equivalent x c program: [ ] + 1 min E[a(X x) + a 0 1]+ = min E E[X c] + (X x) + 1 = min. c<x x c c<x x c Note that if x = q α (X), then p x (X) = 1 q 1 (x; X) = 1 α. Since q α (X) = q α (X) α E[X q α] +, then p x (X) = 1 α = E[X q α(x)] + q α (X) q α (X), that is, q α (X) arg min c<x E[X c] + x c. Let X be a discretely distributed random variable with atoms {x i } N i=1, and probabilities {p i } N i=1, where x i x i+1, i = 1,..., N 1, and N is either nite or N =. For condence levels α j = j i=1 pi, where j = 0,..., N, let us denote corresponding superquantiles x j = N i=j+1 xi p i /(1 α j ), with x N = x N for nite N and x N = lim i x i for N =. Then, p x (X) = 1 for x x 0 = EX, p x (X) = 0 for x x N = sup X, and p x j(x) = 1 α j for j = 0,..., N 1. Corollary 2. p x (X) = E[X xj+1 ] + x x j+1 = for x j < x < x j+1, where j = 0,..., N 1. N i=j+1 pi [x i x j+1 ] + x x j+1, (4) Proof. Note that for x j < q α (X) < x j+1 we have α j < α < α j+1, therefore, q α (X) = x j+1. Therefore, formula (4) is implied by Corollary 1. Buered probability is calculated with a simple formula for a set of specic values x j, j = 0,..., N. The following proposition presents a formula for calculation bpoe in intermediate values, i.e., for x such that x j < x < x j+1. Such value x can be also represented as a weighted combination of values x j and x j+1 : x = µ x j + (1 µ) x j+1, for some µ (0, 1). Corollary 3. For µ (0, 1) and j = 0,..., N 1 ( p(µ x j + (1 µ) x j+1 µ ; X) = p( x j ; X) + 1 µ ) 1, p( x j+1 ; X) i.e. 1/ p x (X) is a piecewise-linear function of x. 4

5 Proof. Corollary 2 implies 1 p x (X) = x x j+1 N i=j+1 pi [x i x j+1 ] +, for x j < x < x j+1, where j = 0,..., N 1. Therefore, since p x (X) is continuous for x [EX, sup X), see Proposition 2, then 1/ p x (X) is a piecewise-linear function of x. 3. Mathematical Properties of bpoe 3.1. Properties of bpoe w.r.t. Parameter x Proposition 2. Distribution F X(x) = F X (x) has no more than one atom at sup X = sup X with probability P ( X = sup X) = P (X = sup X). Proof. Note that if for α 1 < α 2 we have q α1 (X) = q α2 (X), then, by denition of CVaR, { min c + 1 } { E[X c] + = min c + 1 } E[X c] +. c 1 α 1 c 1 α 2 For each value of c, if c 0 and c α 1 E[X c] + < c + { } 1 1 α 2 E[X c] +. Therefore, arg min c c α 1 E[X c] + = sup X. It proves that q α (X) as a function of α can have only one interval of constancy, which is for α [1 P (X = sup X), 1]. For the interval α [0, 1 P (X = sup X)] the function q α (X) is strictly increasing in α. This implies that if superdistribution has an atom, then there are two possible locations. The rst case, x = EX, but q 0 (X) = EX, therefore, FX (EX 0) = F X (EX + 0) = 0, and EX is a continuity point of superdistribution. The second case, x = sup X, then lim x sup X 0 FX (x) = 1 P (X = sup X). Since F X (sup X) = 1, and X X, see [3], then sup X = sup X and P ( X = sup X) = P (X = sup X). Corollary 4. For any random variable X, buered probability of exceedance p x (X) is a continuous strictly decreasing function of x at the interval x [EX, sup X). Proof. bpoe equals 1 q 1 (x; X) for x [EX, sup X). The function q(α; X) is strictly increasing continuous for α [0, 1 P (X = sup X)] (see, e.g., proof of Proposition 2). Therefore, for x (EX, sup X) the function q 1 (x; X) is a strictly increasing continuous function of x. The point x = EX can be added to the interval of continuity, since we have proved that it is a continuity point of FX (x) = q 1 (x; X). Corollary 5. Buered probability of exceedance p x (X) is a non-increasing right-continuous function of x with no more than one point of discontinuity. Proof. Immediately follows from denition p x (X) = 1 F X (x) and Proposition 2. Proposition 3. Function 1 1 F X (x) = 1 p x (X) is a convex function w.r.t. x. Moreover, it is piecewise-linear for discretely distributed X. 5

6 Proof. Consider interval EX < x 0, then max c<x x c E[X c] = max + c<x [x c] + E[X c] = max + c [x c] + E[X c] +. The last expression max c {[x c] + /E[X c] + } is convex over x as a maximum over the family of convex functions of x. p x (X) is a continuous non-increasing function on x (, sup X), therefore, 1/ p x (X) is a continuous non-decreasing function on x (, sup X). Then, extending the interval from (EX, sup X) to (, sup X) does not violate convexity of 1/ p x (X), since 1/ p x (X) = 1, i.e., constant, for x (, EX]. Further extending of the interval from (, sup X) to (, + ), i.e., R, will not violate convexity either, since 1/ p x (X) = + for x sup X. That is, 1/ p x (X) is a convex function of x. Suppose that X is discretely distributed. Again, 1/ p x (X) = 1 for x (, EX], and that is the rst interval of linearity. Consider probability atom with value x which random variable X takes with probability p. Denote α 1 = F X (x ) = P (X < x ), α 2 = F X (x) = P (X x ) = α 1 + p and x i = q α i(x) for i = 1, 2. Then for x 1 < x < x 2 we have x = q α (X) with α (α 1, α 2 ), therefore, q α (X) = x. Applying Corollary 1 we nd that 1/ p x (X) = (x x )/E[X x ] + for x 1 < x < x 2. Therefore, 1/ p x (X) is linear on x 1 < x < x 2. This way, all the atom probability intervals of type (F X (x ), F X (x )) [0, 1] will project into the intervals of type ( x 1 ; x 2 ) (EX, sup X) between corresponding superquantiles, covering all the interval (EX, sup X). Therefore, 1/ p x (X) is a piecewiselinear function on x (, sup X), and 1/ p x (X) = + on x [sup X, + ) Properties of bpoe w.r.t. Random Variable Proposition 4. Buered probability is a closed quasi-convex function of random variable (w.r.t. addition operation), i.e., the set {X p x (X) p} is a closed convex set of random variables for any p R. Furthermore, for p [0, 1), p x (X) p q 1 p (X) x. Proof. If p 1, then the inequality p x (X) p holds for any x and X. Therefore, the level-set {X p x (X) p} is a closed convex set. For p < 0, {X p x (X) p} =. Consider p [0, 1). Suppose p x (X) p, then p x (X) = p ε for some ε 0. Then, either q 1 px(x)(x) = q 1 p+ε (X) = x, therefore, q 1 p (X) x, or sup X x, therefore, q 1 p (X) q 1 (X) x. Conversely, if q 1 p (X) x, then either q 1 p+ε (X) = x for some ε 0, or sup X x. In the rst case, p x (X) = p ε p, and p x (X) p q 1 p (X) x. If sup X x, then p x (X) = 0 p. Function q 1 p (X) is a closed convex function of X, therefore, the set {X q 1 p (X) x} is closed convex. Then, the set {X p x (X) p} is closed convex. 6

7 Example 1. Buered probability of exceedance is not a convex function of random variable (w.r.t. addition operation), i.e., in general, p x (λx + (1 λ)y ) λ p x (X) + (1 λ) p x (Y ). Counterexample is as follows. Take x = 0 and { 1, with probability 1/2, X = 1, with probability 1/2. Take Y 0, λ = 1/2. Note that p 0 (X) = 1, since q 0 (X) = 0, p 0 (Y ) = 0. Note also that λx + (1 λ)y = X/2, therefore, p 0 (λx + (1 λ)y ) = 1 1/2 = λ p 0 (X) + (1 λ) p 0 (Y ). Denote by B λ the Bernoulli random variable with probability λ being equal to 1, i.e., { 1, with probability λ, B λ = 0, with probability 1 λ. Denote the mixture of random variables with coecient λ as λx (1 λ)y = XB λ + Y (1 B λ ), where B λ is independent of X and Y. In words, a mixture of random veriables with coecient λ is a random variable which takes a value of the rst random variable with probability λ, and a value of the second random variable with probability (1 λ). Mixture operation results from the addition operation over measures. Suppose µ and ν are measures. Then, scaled measure λµ is a measure satisfying (λµ)(a) = λµ(a) for any measurable set A, λ R. Sum of measures µ + ν is a measure satisfying (µ + ν)(a) = µ(a) + ν(a) for any measurable set A. Random variable X denes measure µ X on (R, B(R)) such that µ X (A) = P (X A), for any A B(R). Conversely, any nonnegative measure µ X on (R, B(R)) such that µ X (R) = 1 denes a random variable. Suppose that random variables X and Y correspond to measures µ X and µ Y. Measure µ Z = λµ X + (1 λ)µ Y for λ [0, 1] is a nonnegative measure on (R, B(R)) and µ Z (R) = λµ X (R) + (1 λ)µ Y (R) = 1. Therefore, µ Z denes the random variable Z. We call Z a mixture of random variables X and Y with coecient λ and denote Z = λx (1 λ)y. In particular, F λx (1 λ)y (z) = λf X (z)+(1 λ)f Y (z), where F Z is a cumulative distribution function of the random variable Z. Proposition 5. (1 α) q α (X) is a concave function of (X, α) w.r.t. mixture operation and addition operation correspondingly, i.e., (1 (λα 1 + (1 λ)α 2 )) q (λα1 +(1 λ)α 2 )(λx (1 λ)y ) λ [(1 α 1 ) q α1 (X)] + (1 λ) [(1 α 2 ) q α2 (Y )]. 7

8 Proof. Denote α M = λα 1 +(1 λ)α 2. Then, with denitions of CVaR and λx (1 λ)y, we have (1 α M ) q αm (λx (1 λ)y ) = min c { (1 αm )c + E[B λ X + (1 B λ )Y c] +} = = min c { (1 αm )c + E[X c] + I(B λ = 1) + E[Y c] + I(B λ = 0) }. Since B λ is independent of X and Y, then E[X c] + I(B λ = 1) = E[X c] + EI(B λ = 1) = λe[x c] +. Then, { (1 α M ) q αm (λx (1 λ)y ) = min (1 αm )c + λe[x c] + + (1 λ)e[y c] +} { c min λ(1 α1 )c 1 + λe[x c 1 ] + + (1 λ)(1 α 2 )c 2 + (1 λ)e[y c 2 ] +} = c 1,c 2 = λ(1 α 1 ) q α1 (X) + (1 λ)(1 α 2 ) q α2 (Y ). The following statement is similar to a proposition in [2], which has motivated Proposition 5 in the rst place. Here we show how this proposition can be proved from Proposition 5 as a corollary. Corollary 6. Let X(x, p) be a discretely distributed random variable, taking values x = (x 1,..., x m ) with probabilities p = (p 1,..., p m ), p i 0, m i=1 p i = 1. Then function q α (X(x, p)) is a concave function of p. Proof. Note that if p M = λp 1 + (1 λ)p 2, then F X(x,pM )(x) = λf X(x,p1 )(x) + (1 λ)f X(x,p2 )(x). Therefore, X(x, p M ) = λx(x, p 1 ) (1 λ)x(x, p 2 ). Then Proposition 5 implies the concavity of q α (X(x, p)) w.r.t. vector p. The following proposition is similar to the one in [10]. proposition can be proved from Proposition 5 as a corollary. Here we show how this Corollary 7. Let random variable X p have a distribution F (x; p) = m i=1 p if i (x), where F i (x) for i = 1,..., m are the distribution functions, and p = (p 1,..., p m ) R m, p i 0, m i=1 p i = 1. Then function q α (X p ) is a concave function of p. Proof. Note that if p M = λp 1 + (1 λ)p 2, then F (x; p M ) = λf (x; p 1 ) + (1 λ)f (x; p 2 ). Therefore, X pm = λx p1 (1 λ)x p2. Then Proposition 5 implies the concavity of q α (X p ) w.r.t. vector p. Proposition 6. Buered probability of exceedance is a concave function of random variable w.r.t. mixture operation, i.e., p x (λx (1 λ)y ) λ p x (X)+(1 λ) p x (Y ), λ (0, 1). Proof. Suppose p x (X) = α 1 and p x (Y ) = α 2, then there are three possible cases. First, α 1 = α 2 = 1, then EX x and EY x, therefore, EλX (1 λ)y = λex + (1 λ)ey x, and p x (λx (1 λ)y ) = λ p x (X) + (1 λ) p x (Y ) = 1. Second, α 1 = α 2 = 0, then sup X x and sup Y x, and sup λx (1 λ)y max{sup X, sup Y } x. 8

9 Therefore, p x (λx (1 λ)y ) = λ p x (X) + (1 λ) p x (Y ) = 0. Third, in all other cases λα 1 + (1 λ)α 2 (0, 1). By Proposition 4, p x (X) p q 1 p (X) x for p (0, 1), therefore, p x (X) > p q 1 p (X) > x and, furthermore, p x (X) > p q 1 p (X) > x, sup X > x. Note also that p x (X) = p (0, 1) q 1 p (X) = x, sup X > x. Then, is equivalent to Note that p x (λx (1 λ)y ) λ p x (X) + (1 λ) p x (Y ) = λα 1 + (1 λ)α 2 (0, 1), q 1 (λα1 +(1 λ)α 2 )(λx (1 λ)y ) x, sup λx (1 λ)y > x. α 1 q 1 α1 (X) α 1 x. (5) If α 1 = 0, then 0 0. If α 1 = 1, then q 1 1 (X) = EX x. If α 1 (0, 1), then q 1 α1 (X) = x. Similarly, α 2 q 1 α2 (Y ) α 2 x. (6) Implying Proposition 5 and inequalities (5), (6), we get q λ(1 α1 )+(1 λ)(1 α 2 )(λx (1 λ)y ) λα 1 q 1 α1 (X) + (1 λ)α 2 q 1 α2 (Y ) λα 1 + (1 λ)α 2 λα 1x + (1 λ)α 2 x = x. λα 1 + (1 λ)α 2 Since α 1 > 0 or α 2 > 0, then sup X > x or sup Y > x, therefore sup λx (1 λ)y = max{sup X, sup Y } > x, which nishes the proof. It was mentioned that distribution functions are linear w.r.t. mixture operation: F λx (1 λ)y (x) = λf X (x) + (1 λ)f Y (x). Note that Proposition 6 proves that superdistribution functions are convex w.r.t. mixture operation: FλX (1 λ)y (x) λ F X (x) + (1 λ) F Y (x). Proposition 7. Buered probability of exceedance p x (X) is a monotonic function of random variable, i.e., p x (Y ) p x (Z) for Y Z almost surely. Proof. Suppose x sup Y and x sup Z. Then min a 0 E[aY +1] + min a 0 E[aZ +1] +, therefore, p x (Y ) p x (Z). Suppose x = sup Y, then p x (Y ) = 0 p x (Z). Suppose x = sup Z, then x sup Y, therefore, p x (Y ) = 0 = p x (Z). 4. Optimization Problems with bpoe 4.1. Two Families of Optimization Problems Denote program P(x), Denote program Q(α), P(x) : min p x (X) s.t. X X. Q(α) : min q α (X) s.t. X X. 9

10 For a set of random variables X, dene e X = inf X X EX, s X = inf sup X. X X Proposition 8. Let X 0 X be an optimal solution to P(x 0 ), where e X < x 0 s X. Then, X 0 is an optimal solution to Q(1 p x0 (X 0 )). Proof. Denote p = p x0 (X 0 ). Since x 0 > e X, then p x0 (X 0 ) < 1. If p x0 (X 0 ) = 0, then sup X 0 x 0, but x 0 s X, and sup X 0 s X by denition, therefore, sup X 0 = x 0 = s X. If 0 0 and sup X x 0. Therefore, there exists p < p such that q 1 p (X ) = x 0, since q 1 p (X) is a continuous non-increasing function of p. There are two possible cases. First, if sup X = x 0, then p x0 (X ) = 0 < p, X 0 is not an optimal solution to P(x 0 ), contradiction. Second, if sup X > x 0, then p x0 (X ) = p < p, X 0 is not an optimal solution to P(x 0 ), contradiction. Two intervals for x 0 are not covered in Proposition 8. Note that for x 0 e X the optimal value for P(x 0 ) is 1, therefore, any feasible solution is an optimal solution. As for the interval x > s X, optimal value for P(x 0 ) is 0. If s X s X, and it is not optimal for Q(1). Proposition 9. Let X 0 X be an optimal solution to Q(α 0 ). Then X 0 is an optimal solution to P( q α0 (X 0 )), unless sup X 0 > q α0 (X 0 ) and there exists X X such that 1. sup X = q α0 (X 0 ), 2. P (X = sup X ) 1 α 0. Proof. Denote x 0 = q α0 (X 0 ). First, suppose sup X 0 = x 0. Then p x0 (X 0 ) = 0, and X 0 is an optimal solution to P(x 0 ). Second, suppose that sup X 0 > x 0 and that exists X X such that p x0 (X ) x 0, then p x0 (X 0 ) = 1 α 0. Suppose sup X > x 0, then q α (X ) is strictly increasing on [0, 1 p x0 (X )]. Therefore, q α0 (X ) < q 1 px0 (X )(X ) = x 0, which implies that X 0 is not an optimal solution to Q(α 0 ), contradiction. Consequently, sup X = x 0. Suppose P (X = x 0 ) < 1 α 0. Then, q α (X ) is strictly increasing on [0, 1 P (X = x 0 )], and q α0 (X ) < x 0. Therefore, X 0 is not an optimal solution to Q(α 0 ), contradiction. Therefore, P (X = x 0 ) 1 α 0. Intuiton behind Proposition 9 is as follows. Note that X is also an optimal solution to Q(α 0 ). Therefore, we have two optimal solutions to right tail expectation minimization problem. The dierence between optimal solutions X and X 0 is that X is constant in its right 1 α 0 tail, and X 0 is not, since q α0 (X 0 ) < sup X 0. Proposition 9 implies that X is an optimal solution to P( q α0 (X 0 )), while X 0 is not. Which is a very natural risk-averse decision. This implies that, for certain problems, formulations of type P(x) provide more reasonable solutions than formulations of type Q(α). Corollary 8. Let X be a set of random variables, such that sup X = for all X X. Then, program families P(x), for x > e X, and Q(α), for 0 < α < 1, have the same set of optimal solutions. That is, if X 0 is optimal for P(x 0 ), then X 0 is optimal for Q(1 p x0 (X 0 )). Conversely, if X 0 is optimal for Q(α 0 ), then X 0 is optimal for P( q α0 (X 0 )). 10

11 Proof. Proposition 8 implies that if e X < x 0 s X =, then if X 0 is optimal for P(x 0 ), then X 0 is optimal for Q(1 p x0 (X 0 )). Note that since e X < x 0 < s X, then p x0 (X 0 ) (0, 1). Proposition 9 implies that if X 0 is optimal for Q(α 0 ), then X 0 is optimal for P( q α0 (X 0 )), unless exists X X such that sup X = q α0 (X 0 ). Which is impossible since sup X = > q α0 (X 0 ). Note that since α 0 (0, 1), then e X < q α0 (X 0 ) <. Assumption sup X = + for all X X in Corollary 8 might be too strong for some practical problems, where it is a common practice for all random variables to be dened on a nite probability space generated by system observations. Let us describe sets of optimal points (x, α) for problem families P(x) and Q(α). Dene f P (x) = min X X p x(x), f Q (α) = min X X q α(x). Then, sets of all optimal points of P(x) and Q(α) families are S P = {(x, α) f P (x) = 1 α}, S Q = {(x, α) f Q (α) = x}. Finally, reduced sets of optimal points are S P = {(x, α) S P e X x s X }, S Q = {(x, α) S Q x < s X } {(s X, 1)}. For any random variable X X there is a set S X = {(x, α) q α (X) = x}. Let us dene union of such sets for X X as S X = S X = {(x, α) exists X : q α (X) = x}. X X Naturally, we prefer random variables with superquantile as small as possible for a xed condence level and with condence level as big as possible for a xed superquantile value. Therefore, for the set S X we dene a Pareto front, which is often called an ecient frontier in nance, as follows: S X = {(x, α) S X x < x or α > α for all (x, α ) S X, (x, α ) (x, α)}. Proposition 10. S P S Q = S P = S Q = S X. Proof. Let us start with S P S Q = S X. Notice that (x, α) S P (x, α ) S X α α, (7) (x, α) S Q (x, α) S X x x. (8) Clearly, right sides of (7) and (8) hold for S X, which implies S X S P S Q. Suppose S X S P S Q, i.e., for some (x, α) S P S Q there exists (x, α ) S X such that x x, α α and (x, α ) (x, α). Notice that if (x, α) S P S Q, then (x, α ) S X α α and (x, α) S X x x. Then, x < x and α > α. Consider random variable X which has generated point (x, α ). Since q X (α ) < x, then q X (α) < x. Therefore, there exists ( q X (α), α) S X with q X (α) < x, while (x, α) S Q and (8) holds. Contradiction. Let us prove S P = S Q. Suppose (x, α) S P and x e X, then we can use Proposition 8 to conclude that (x, α) S Q. If x = e X, then p x (X) = 0, therefore, 11

12 (x, α) = (e X, 0) S Q. Let (x, α) S Q. If x < s X, then there is no X such that sup X = x, therefore, we can use Proposition 9 and conclude that (x, α) S P. Finally, let us prove S P S Q = S P. Since S P S P, S Q S Q and S P = S Q, then S P S P S Q. Suppose (x, α) S P S Q. If x < s X, then (x, α) S P. If x = s X, then α = 1, because there is only one α for any x in S P. Then (x, α) = (s X, 1) S P. Point with x > s X can not be in S Q since f Q (α) = min X X q α (X) min X X sup X = s X. Therefore, S P S Q S P, which nalizes the proof Parametric Simplex Algorithm for CVaR and bpoe Minimization Suppose that we are interested in solution to P 1 (x) for x s X. Suppose also we have an algorithm for solving P 2 (α), i.e. we can calculate function f 2 (α) for any α [0, 1]. Then we can nd an approximation to f 1 (x) by calculating f 2 (α) several times. First, calculate f 2 (1) = x X. If x > f 2 (1), then problem P 1 (x) is inecient. If x = f 2 (1), then f 1 (x) = 0. If x < f 2 (1), continue. Calculate f 2 (0). If x < f 2 (0), then P 1 (x) is infeasible. If x = f 2 (0), then f 1 (x) = 1. If x > f 2 (0), continue. Set a = 0, b = 1. Inequality 1 b < f 1 (x) < 1 a holds. We will calculate f 2 ((a + b)/2) at each step of the binary search procedure to make dierence b a as small as we need. Suppose that problem P 2 (α) can be expressed as a linear program. Let X 1,..., X n be a set of random variables discretely distributed on the common set of m scenarios, with scenario probabilities p 1,..., p m. Let random variable X i take value x j i under scenario j. Let λ = (λ 1,..., λ n ) be a set of decision variables such that X X X = n λ i X i, for some λ Λ, where Λ R n is a polyhedral set. Then P 2 (α) is equivalent to ( n ) min q α λ i X i λ i=1 i=1 s.t. λ Λ. With minimization form of CVaR, which is, q α (x) = min c { c α E[X c]+}, we reformulate P 2 (α) as min c,λ c α s.t. λ Λ. [ m n ] + p j λ i x j i c i=1 j=1 We will slightly adapt the parametric simplex method, see e.g. [8], [7], to solve this problem for all values α [0, 1]. To start, we need to obtain a basic feasible solution for one of extreme values, say α = 0. To get a solution for α = 0 we need to nd a random variable with minimal expectation. For example, if λ 0 and n i=1 λ i = 1, then we nd EX i = j p jx j i for all i and then take i = arg min i EX i. Then the optimal solution is λ 0 such that λ i = 1, λ j = 0 for j i. After obtaining the rst solution, denote µ = 1 and express reduced costs for 1 α nonbasic decision variables as linear functions of µ. Since the solution is optimal for 12

13 µ 0 = 1, all reduced costs are nonnegative at µ = 1. We do not have dependence on µ in constraints, that is why if µ is changing, solution remains primal feasible, but may become dual infeasible. Let us nd the biggest parameter value µ 1 at which reduced costs are still nonnegative. For µ 0 µ µ 1 solution λ 0 remains optimal. When µ > µ 1 some reduced costs became negative, that is why we make primal pivots until we nd new optimal solution λ 1. Then we express reduced costs as linear functions of µ, nd the next critical value µ 2 at which some costs reduce to 0, and so forth, see [8] for detailes. At some point all reduced costs will become nonnegative, no matter how big µ is, it means that the current solution is optimal for µ up to +, or up to α = 1. As a result of the algorithm we have: a sequence of parameters 1 = µ 0,..., µ M which corresponds to sequence α 0 = 0,..., α M = 1 1/µ M, α M+1 = 1; a sequence of optimal solutions λ 0,..., λ M ; a sequence of optimal objective values f 2(α i ) (f 2 (α M+1 ) = f 2 (α M )). To calculate f 1 (x), nd the interval [α j, α j+1 ] such that f 2 (α j ) x f 2 (α j+1 ). Then optimal solution is λ j = (λ j 1,..., λ j n) and f 1 (x) = p x ( n i=1 λj i X i) Finite Probability Space Applications Proposition 11. Let X be a convex set of random variables. inf X X p x (X), there are two cases: Then for the problem 1. If inf X X sup X is attained for some X X, and x = min X X sup X = sup X, then min X X p x (X) = 0 with optimal solution X. 2. Problem inf X X p x (X) can be reformulated as a following problem: inf E[Y + Y Y 1]+, where Y = cl cone(x x) is a closed convex cone. Proof. If Case 1 is not valid, then with Proposition 1 we conclude that inf p x(x) = inf min E[a(X x) + X X X X a 0 1]+ = inf E[a(X x) + X X,a 0 1]+. Denote Y = a(x x). Since X is convex, then constraints X X, a 0 are equivalent to Y cone(x x). Suppose that sequence {Y i } i=1 cone(x x) converges weakly to Y Y = cl cone(x x). Since cone(x x) is a convex set, and for convex sets weak and L 1 convergences are equivalent, then sequence {Y i } i=1 is L 1 -converging to Y. Therefore, lim E[Y i + 1] + = E[Y + 1] +. i Then, nally, inf p x(x) = inf E[Y + X X Y Y 1]+. Denote Π m = {q = (q 1,..., q m ) T R m q i 0, i = 1,..., m; m i=1 q i = 1}. Denote Π m + = {q = (q 1,..., q m ) T R m q i > 0, i = 1,..., m; m i=1 q i = 1}. For the following proposition we suppose X to be a set of random variables dened on the common nite probability space with the vector of elementary events' probabilies p = (p 1,..., p m ) T Π m +. Denote by the set S R m the set of vectors such that any random 13

14 variable X X takes values x 1,..., x m with probabilities p 1,..., p m correspondingly, for some x = (x 1,..., x m ) T S. Then X being closed convex set is equivalent to S being a closed convex set. Let us say that random variable X takes values from x = (x 1,..., x m ) R m with probabilities p = (p 1,..., p m ) Π m if X is disctretely distributed over m atoms and takes value x i with probability p i for i = 1,..., m. Corollary 9. Let X be a set of random variables such that X X X takes values from x S with probabilities p, where S R m is a convex set, p Π m +. Then for the problem inf X X p x (X), there are two cases: 1. If inf x S max i x i is attained for some x S, and x = min x S max i x i = max i x i, then min X X p x (X) = 0 with optimal solution X taking values from x. 2. Problem inf X X p x (X) can be reformulated as a following problem: inf y C pt [y + e] +, where C = cl cone(s xe) is a closed convex cone, e = (1,..., 1) T R m. Proof. Let us apply Proposition 11 to the specic case of nite probability space. If we consider the Case 2, then problem inf X X p x (X) can be reformulated inf y C pt [y + e] +, where C = cl cone(s xe) is a closed convex cone, since S is convex. Corollary 10. Let X be a set of random variables such that X X X takes values from x S with probabilities p, where S = {x Ax b} R m, p Π m +. Then for the problem inf X X p x (X), there are two cases: 1. If x = min x S max i x i, then min X X p x (X) = 0 with optimal solution X taking values from x such that max i x i = x. 2. Problem inf X X p x (X) can be reformulated as an LP: inf p T z (9) s.t. z y + e, (10) Ay a(b xae) 0, (11) z 0, a 0. (12) Proof. Corollary 9 implies that for the Case 2 problem inf X X p x (X) can be reformulated as inf y C p T [y + e] +, with C = cl cone(s xe). Note that S xe = {x A(x + xe) b} = {x Ax b xae}. 14

15 Note also that cone(s xe) = {ax Ax b xae, a > 0} {0} = {y Ay a(b xae), a > 0} {0}. Therefore, cl cone(s xe) = {y Ay a(b xae), a 0} {0} = {y Ay a(b xae), a 0}. Finally, introducing z = [y + e] +, we obtain reformulation (9)(12). Consider random real-valued function f(w; X), where w R k and X T = (X 1,..., X n ) is a random vector of dimension n. It is assumed here that variables X 1,..., X n can be observed, but can not be controlled. It is also assumed that value f(w; X) can be controlled by the vector w W R k. Proposition 12. Let f(w; X) be a convex function of w. Then p x (f(w; X)) is a quasiconvex function of w. Proof. Convexity of f implies f(w M ; X) λf(w 1 ; X) + (1 λ)f(w 2 ; X), for w M = λw 1 + (1 λ)w 2. Then, using monotonicity of p x (X), see Proposition 7, p x (f(w M ; X)) p x (λf(w 1 ; X) + (1 λ)f(w 2 ; X). p x (X) is a quasi-convex function of X, see Proposition 4. Quasi-convexity of a function p x (X) is equivalent to p x (λx 1 + (1 λ)x 2 ) max{ p x (X 1 ), p x (X 2 )}. Then, Therefore, p x (λf(w 1 ; X) + (1 λ)f(w 2 ; X)) max{ p x (f(w 1 ; X)), p x (f(w 2 ; X))}. p x (f(w M ; X)) max{ p x (f(w 1 ; X)), p x (f(w 2 ; X))}, i.e., p x (f(w; X)) is a quasi-convex function of w. Proposition 13. Let X be a random vector and let f(w; X) be a convex positive-homogeneous function of w. Assume that convergence of {w i } implies L 1 -convergence of {f(w i ; X)}. Let W be a convex set. Then for the problem inf w W p x (f(w; X)) there are two possible cases: 1. If inf w W sup f(w; X) is attained for some w W, and x = min w W sup f(w; X) = sup f(w ; X), then min w W p x (f(w; X)) = 0 with optimal solution w. 2. Problem inf w W p x (f(w; X)) can be reformulated as a convex programming problem: inf E[ f(v; X) + 1] +, v V where v T = (v 1,..., v k+1 ) R k+1, f(v; X) = f((v1,..., v k ) T ; X) v k+1, and V = cl cone(w {x}) is a closed convex cone. 15

16 Proof. In the Case 2 we can reformulate problem inf w W p x (f(w; X)) as follows: inf p x(f(w; X)) = inf E[a(f(w; X) x) + w W a 0,w W 1]+. Denote v T = (v 1,..., v k+1 ) R k+1 and f(v; X) = f((v 1,..., v k ) T ; X) v k+1. Then, inf p x(f(w; X)) = inf E[a f(v; X) + 1] +. w W a 0,v W {x} Note that f(v; X) is also a convex positive-homogeneous function of v. Then, inf p x(f(w; X)) = inf E[ f(av; X) + 1] +. w W a 0,v W {x} Note that since W is a convex set, then W {x} is also a convex set. Therefore, a 0, v W {x} av cone(w {x}). Note that feasible region can be extended to V = cl cone(w {x}): since convergence of {w i } implies L 1 -convergence of f(w; X), then convergence of v i v implies L 1 - convergence of f(v i ; X) L1 f(v; X), therefore, E[ f(v i ; X) + 1] + E[ f(v; X) + 1] +. Finally, inf p x(f(w; X)) = inf E[ f(v; X) + 1] +. w W v V Corollary 11. Let X = (X 1,..., X n, 1) T be a random vector, with the last component being a constant 1, and E X i < for i = 1,..., n. Let W R n+1 be a convex set. Then for the problem inf w W p x (w T X) there are two possible cases as follows: 1. If inf w W sup w T X is attained for some w W, and x = min w W sup w T X = sup(w ) T X, then min w W p x (w T X) = 0 with optimal solution w. 2. Problem inf w W p x (w T X) can be reformulated as the following convex programming problem: inf v V E[vT X + 1] +, where V = cl cone(w xe n+1 ) is a closed convex cone, where e n+1 = (0,..., 0, 1) T R n+1. Proof. Let us show that this corollary follows from Proposition 13. First, f(w; X) = n i=1 w ix i + w n+1 is convex and positive-homogeneous w.r.t. w. Second, suppose that w j w. Then E (w j ) T X w T X w j n+1 w n+1 + n w j i w i E X i 0, i=1 since E X i <. Therefore, convergence of w implies L 1 -convergence of f(w; X). Note that in this particular case of function f there is no need to introduce a new parameter. It is sucient to shift feasible region for w n+1 by x: W = W xen+1. Further change of variables v = aw, and setting V = cl cone( W ), as it is done in Proposition 13, nalizes the proof. 16

17 Corollary 12. Let X = (X 1,..., X n, 1) T be a random vector, with the last component being a constant 1, and E X i < for i = 1,..., n. Let W = {w Aw b} R n+1. Then for the problem inf w W p x (w T X) there are two possible cases: 1. If inf w W sup w T X is attained for some w W, and x = min w W sup w T X = sup(w ) T X, then min w W p x (w T X) = 0 with optimal solution w. 2. Problem inf w W p x (w T X) can be reformulated as the linear programming problem: inf E[v T X + 1] +, (13) s.t. Av T a(b xae n+1 ) 0, (14) a 0. (15) Proof. Let us prove that this corollary follows from Corollary 11. Note that Note further that W xe n+1 = {w Aw + xae n+1 b}. cone(w xe n+1 ) = {v Av + axae n+1 ab, a > 0} {0}. Finally, V = cl cone(w xe n+1 ) = {v Av a(b xae n+1 ) 0, a 0}. Corollary 13. Let X be a random vector taking values x 1,..., x m R n with probabilities p = (p 1,..., p m ) Π m +. Let f(w; X) be a convex positive-homogeneous function of w R k. Let W R k be a convex set. Then for the problem inf w W p x (f(w; X)) there are two possible cases: 1. If inf w W max j f(w; x j ) is attained for some w W, and x = min w W max j f(w; x j ) = max j f(w ; x j ), then min w W p x (f(w; X)) = 0 with optimal solution w. 2. Problem inf w W p x (f(w; X)) can be reformulated as the convex programming problem: inf v V m p j [ f(v; x j ) + 1] +, j=1 where v T = (v 1,..., v k+1 ) R k+1, f(v; X) = f((v1,..., v k ) T ; X) v k+1, and V = cl cone(w {x}) is a closed convex cone. Proof. Note that since there are nitely many scenarios for random vector X, then for w i w, due to continuity of function f w.r.t. w, we have max j f(w i ; x j ) f(w; x j ) 0. That is, convergence on w implies L 1 -convergence of f(w; X). Therefore, this corollary follows directly from Proposition References [1] Norton, M., and Uryasev, S. AUC and Buered AUC Maximization. University of Florida, Research Report, in preparation, [2] Pavlikov, K., and Uryasev, S. CVaR Distance between Distributions and Applications. University of Florida, Research Report, in preparation,

18 [3] Rockafellar, R. T., and Royset, J. O. Random variables, monotone relations and convex analysis. Mathematical Programming B, accepted. [4] Rockafellar, R. T., and Royset, J. O. On buered failure probability in design and optimization of structures. Reliability Engineering and System Safety 95, 5 (2010), [5] Rockafellar, R. T., and Uryasev, S. Conditional value-at-risk for general loss distributions. Journal of Banking and Finance (2002), [6] Rockafellar, R. T., and Uryasev, S. The Fundamental Risk Quadrangle in Risk Management, Optimization and Statistical Estimation. Surveys in Operations Research and Management Science 18 (2013). [7] Ruszczynski, A., and Vanderbei, R. J. Frontiers of Stochastically Nondominated Portfolios. Econometrica 71, 4 (2003), [8] Vanderbei, R. J. Linear Programming: Foundations and Extensions. International series in operations research & management science. Kluwer Academic, [9] Zabarankin, M., and Uryasev, S. Statistical decision problems. Selected concepts and portfolio safeguard case studies. Springer Optimization and Its Applications 85. New York, NY: Springer. [10] Zdanovskaya, V., Pavlikov, K., and Uryasev, S. Estimation of Mixtures of Continuous Distributions: Mixtures of Normal Distributions and Applications. University of Florida, Research Report, in preparation,