Buered Probability of Exceedance: Mathematical Properties and Optimization Algorithms Alexander Mafusalov, Stan Uryasev RESEARCH REPORT 2014-1 Risk Management and Financial Engineering Lab Department of Industrial and Systems Engineering 303 Weil Hall, University of Florida, Gainesville, FL 32611. E-mails: mafusalov@ufl.edu, uryasev@ufl.edu. First draft: October 2014, This draft: October 2014 Correspondence should be addressed to: Stan Uryasev Abstract This paper introduces a new probabilistic characteristic called buered probability of exceedance (bpoe). This characteristic is an extension of so-called buered probability of failure and it is equal to one minus superdistribution function. Paper provides ecient calculation formulas for bpoe. bpoe is proved to be a quasi-convex function of random variable w.r.t. the regular addition operation and a concave function w.r.t. the mixture operation; it is a monotonic function of random variable. bpoe is proved to be a strictly decreasing function of the parameter on the interval between the mathematical expectation and the essential supremum. Multiplicative inverse of the bpoe is proved to be a convex function of parameter, and a piecewise-linear function in the case of discretely distributed random variable. Minimization of the bpoe can be reduced to a convex program for a convex feasible region and to LP for a polyhedral feasible region. A family of bpoe minimization problems and family of the corresponding CVaR minimization problems share the same frontier of optimal solutions and optimal values. Keywords: probability of failure, probability of exceedance, buered probability of failure, superdistribution, superquantile, Conditional Value-at-Risk, CVaR, parametric simplex method 1. Introduction This paper uses the notation CVaR α (X) for conditional-value-at-risk (CVaR) for a random variable X and a condence level α [0, 1], explored in [5]. To have a more consice notation, an alternative name superquantile q α (X) is used, similar to a regular quantile q α (X). That is, q α (X) = CVaR α (X). Notation q(α; X) is used to present superquantile q α (X) = q(α; X) as a function of parameter α. For example, q 1 (x; X) should be interpreted as an inverse function of superquantile as a function of α. Probability of exceedance is dened as p x (X) = P (X > x) = 1 F X (x), where F X (x) is a distribution function. In engineering applications it is usual to see an optimization problem with probability of exceedance in constraints or as an objective. Paper [4] suggests as an alternative to the probability of failure, which is p(x) = P (X > 0), the
buered probability of failure, which is a value p(x) such that q p(x) (X) = 0. This paper denes buered probability of exceedance p x (X) in a way that p 0 (X) = p(x) and p x (X) = p 0 (X x). To dene buered probability of exceedandce, we introduce the following mathematical notions from paper [3]. For any random variable X with distribution function F X (x) there is an auxilary random variable X = q(f X (X); X) with distribution function F X(x) = F X (x), called superdistribution function, and 1, for x sup X; F X (x) = q 1 (x; X), for EX < x < sup X; 0, otherwise, where q 1 (x; X) is an inverse of the function q(α; X) as a function of α. Denition 1. For a random variable X and x R, buered probability of exceedance is dened as follows 0, for x sup X; p x (X) = 1 F X (x) = 1 q 1 (x; X), for EX < x < sup X; 1, otherwise. Book [9] considers Chebyshev-type family of inequalities with CVaR deviation and shows that the tightest inequality in the family is obtained for α = p x (X), and the tightest inequality itself reduces to p x (X) p x (X). (1) Inequality (1) is similar to q α (X) q α (X). Inequality (1) is one of the motivations for introducing buered probability of exceedance instead of regular probability of exceedance. Paper [4] uses inequality (1) to argue that the buered probability of failure is a conservative estimate of the probability of failure. Similarly, buered probability of exceedance is a conservative estimate of the probability of exceedance. Section 2 proves several formulas for ecient calculation of p x (X). Section 3.1 investigates mathematical properties of p x (X) w.r.t. parameter x. Section 3.2 establishes mathematical properties of p x (X) w.r.t. random variable X. Section 4 studies minimization of p x (X) over a feasible region X X. 2. Calculation Formulas for BPOE Note that, since q α (X) x = q α (X x) for any constant x, see e.g. [6], then X x = X x and F X (x) = F X x (0). Therefore, p x (X) = p 0 (X x). The following proposition is a slightly modied proposition from paper [1], studying applications of buered probability of exceedance in classication. Proposition 1. For a random variable X and x R, buered probability of exceedance equals { 0, if x = sup X; p x (X) = (2) min a 0 E[a(X x) + 1] +, otherwise. 2
Proof. In the denition of buered probability of exceedance we have three cases: 1. p x (X) = 1 q 1 (x; X) when EX < x < sup X, 2. p x (X) = 1 when x < EX or x = EX < sup X, 3. p x (X) = 0 when x sup X. Let us prove the proposition case by case. 1. Let EX < x < sup X, and take x = 0. Since q α (X) is a strictly increasing function of α on α [0, 1 P (X = sup X)], then equation q p (X) = 0 has a unique solution p for EX < x < sup X. Then, p 0 (X) = p such that min c c + 1 p E[X c]+ = 0. Since q α (X) is an increasing function of parameter α, then we can reformulate p 0 (X) = min p p such that min c c + 1 p E[X c]+ 0. Therefore, p 0 (X) = min p,c p s.t. c + 1 p E[X c]+ 0. Optimal c < 0, since c c + 1 p E[X c ] + 0, and c = 0 implies sup X 0, which is not the case we consider. Therefore, p 0 (X) = min p,c s.t. p p c c + E [ 1 c X c ] + 0. c Since c < 0, then c c and, therefore, 1 = 1. Further, denoting a =, we have c p 0 (X) = min p,a>0 p s.t. E[aX + 1] + p. p 0 (X) = min a 0 E[aX + 1]+. Note that change a > 0 to a 0 includes value 1 to the feasible region, which does not aect the case considered. Finally, since p x (X) = p 0 (X x), then p x (X) = min a 0 E[a(X x) + 1]+. 2. When EX x, we have E[a(X x) + 1] + ae(x x) + 1 1. Note also that E[a(X x) + 1] + = 1 for a = 0. Therefore, min a 0 E[a(X x) + 1] + = 1. 3. For x = sup X, by the formula, p x (X) = 0. Consider x > sup X, i.e., X x ε < 0. Taking a = 1 ε makes a(x x) 1, therefore, min a 0 E[a(X x) + 1] + = 0. Corollary 1. For EX < x < sup X, p x (X) = 1 q 1 (x; X) = min c<x 3 E[X c] +. (3) x c
Furthermore, for x = q α (X), where α (0, 1), it is valid that and, consequently, q α (X) arg min c<x E[X c] +, x c p x (X) = E[X q α(x)] + q α (X) q α (X). Proof. Since EX < x < sup X, then q 1 (x; X) (0, 1), therefore, a = 0 is not optimal for min a 0 E[a(X x)+1] +. Therefore, change of variable a 1 leads to an equivalent x c program: [ ] + 1 min E[a(X x) + a 0 1]+ = min E E[X c] + (X x) + 1 = min. c<x x c c<x x c Note that if x = q α (X), then p x (X) = 1 q 1 (x; X) = 1 α. Since q α (X) = q α (X) + 1 1 α E[X q α] +, then p x (X) = 1 α = E[X q α(x)] + q α (X) q α (X), that is, q α (X) arg min c<x E[X c] + x c. Let X be a discretely distributed random variable with atoms {x i } N i=1, and probabilities {p i } N i=1, where x i x i+1, i = 1,..., N 1, and N is either nite or N =. For condence levels α j = j i=1 pi, where j = 0,..., N, let us denote corresponding superquantiles x j = N i=j+1 xi p i /(1 α j ), with x N = x N for nite N and x N = lim i x i for N =. Then, p x (X) = 1 for x x 0 = EX, p x (X) = 0 for x x N = sup X, and p x j(x) = 1 α j for j = 0,..., N 1. Corollary 2. p x (X) = E[X xj+1 ] + x x j+1 = for x j < x < x j+1, where j = 0,..., N 1. N i=j+1 pi [x i x j+1 ] + x x j+1, (4) Proof. Note that for x j < q α (X) < x j+1 we have α j < α < α j+1, therefore, q α (X) = x j+1. Therefore, formula (4) is implied by Corollary 1. Buered probability is calculated with a simple formula for a set of specic values x j, j = 0,..., N. The following proposition presents a formula for calculation bpoe in intermediate values, i.e., for x such that x j < x < x j+1. Such value x can be also represented as a weighted combination of values x j and x j+1 : x = µ x j + (1 µ) x j+1, for some µ (0, 1). Corollary 3. For µ (0, 1) and j = 0,..., N 1 ( p(µ x j + (1 µ) x j+1 µ ; X) = p( x j ; X) + 1 µ ) 1, p( x j+1 ; X) i.e. 1/ p x (X) is a piecewise-linear function of x. 4
Proof. Corollary 2 implies 1 p x (X) = x x j+1 N i=j+1 pi [x i x j+1 ] +, for x j < x < x j+1, where j = 0,..., N 1. Therefore, since p x (X) is continuous for x [EX, sup X), see Proposition 2, then 1/ p x (X) is a piecewise-linear function of x. 3. Mathematical Properties of bpoe 3.1. Properties of bpoe w.r.t. Parameter x Proposition 2. Distribution F X(x) = F X (x) has no more than one atom at sup X = sup X with probability P ( X = sup X) = P (X = sup X). Proof. Note that if for α 1 < α 2 we have q α1 (X) = q α2 (X), then, by denition of CVaR, { min c + 1 } { E[X c] + = min c + 1 } E[X c] +. c 1 α 1 c 1 α 2 For each value of c, if c < sup X, then E[X c] + > 0 and c + 1 1 α 1 E[X c] + < c + { } 1 1 α 2 E[X c] +. Therefore, arg min c c + 1 1 α 1 E[X c] + = sup X. It proves that q α (X) as a function of α can have only one interval of constancy, which is for α [1 P (X = sup X), 1]. For the interval α [0, 1 P (X = sup X)] the function q α (X) is strictly increasing in α. This implies that if superdistribution has an atom, then there are two possible locations. The rst case, x = EX, but q 0 (X) = EX, therefore, FX (EX 0) = F X (EX + 0) = 0, and EX is a continuity point of superdistribution. The second case, x = sup X, then lim x sup X 0 FX (x) = 1 P (X = sup X). Since F X (sup X) = 1, and X X, see [3], then sup X = sup X and P ( X = sup X) = P (X = sup X). Corollary 4. For any random variable X, buered probability of exceedance p x (X) is a continuous strictly decreasing function of x at the interval x [EX, sup X). Proof. bpoe equals 1 q 1 (x; X) for x [EX, sup X). The function q(α; X) is strictly increasing continuous for α [0, 1 P (X = sup X)] (see, e.g., proof of Proposition 2). Therefore, for x (EX, sup X) the function q 1 (x; X) is a strictly increasing continuous function of x. The point x = EX can be added to the interval of continuity, since we have proved that it is a continuity point of FX (x) = q 1 (x; X). Corollary 5. Buered probability of exceedance p x (X) is a non-increasing right-continuous function of x with no more than one point of discontinuity. Proof. Immediately follows from denition p x (X) = 1 F X (x) and Proposition 2. Proposition 3. Function 1 1 F X (x) = 1 p x (X) is a convex function w.r.t. x. Moreover, it is piecewise-linear for discretely distributed X. 5
Proof. Consider interval EX < x < sup X, where formula (3) is valid. Then, 1 p x (X) = 1/ min c<x E[X c] + x c = max c<x x c E[X c] +. Note that since max c<x (x c)/e[x c] + > 0, then max c<x x c E[X c] = max + c<x [x c] + E[X c] = max + c [x c] + E[X c] +. The last expression max c {[x c] + /E[X c] + } is convex over x as a maximum over the family of convex functions of x. p x (X) is a continuous non-increasing function on x (, sup X), therefore, 1/ p x (X) is a continuous non-decreasing function on x (, sup X). Then, extending the interval from (EX, sup X) to (, sup X) does not violate convexity of 1/ p x (X), since 1/ p x (X) = 1, i.e., constant, for x (, EX]. Further extending of the interval from (, sup X) to (, + ), i.e., R, will not violate convexity either, since 1/ p x (X) = + for x sup X. That is, 1/ p x (X) is a convex function of x. Suppose that X is discretely distributed. Again, 1/ p x (X) = 1 for x (, EX], and that is the rst interval of linearity. Consider probability atom with value x which random variable X takes with probability p. Denote α 1 = F X (x ) = P (X < x ), α 2 = F X (x) = P (X x ) = α 1 + p and x i = q α i(x) for i = 1, 2. Then for x 1 < x < x 2 we have x = q α (X) with α (α 1, α 2 ), therefore, q α (X) = x. Applying Corollary 1 we nd that 1/ p x (X) = (x x )/E[X x ] + for x 1 < x < x 2. Therefore, 1/ p x (X) is linear on x 1 < x < x 2. This way, all the atom probability intervals of type (F X (x ), F X (x )) [0, 1] will project into the intervals of type ( x 1 ; x 2 ) (EX, sup X) between corresponding superquantiles, covering all the interval (EX, sup X). Therefore, 1/ p x (X) is a piecewiselinear function on x (, sup X), and 1/ p x (X) = + on x [sup X, + ). 3.2. Properties of bpoe w.r.t. Random Variable Proposition 4. Buered probability is a closed quasi-convex function of random variable (w.r.t. addition operation), i.e., the set {X p x (X) p} is a closed convex set of random variables for any p R. Furthermore, for p [0, 1), p x (X) p q 1 p (X) x. Proof. If p 1, then the inequality p x (X) p holds for any x and X. Therefore, the level-set {X p x (X) p} is a closed convex set. For p < 0, {X p x (X) p} =. Consider p [0, 1). Suppose p x (X) p, then p x (X) = p ε for some ε 0. Then, either q 1 px(x)(x) = q 1 p+ε (X) = x, therefore, q 1 p (X) x, or sup X x, therefore, q 1 p (X) q 1 (X) x. Conversely, if q 1 p (X) x, then either q 1 p+ε (X) = x for some ε 0, or sup X x. In the rst case, p x (X) = p ε p, and p x (X) p q 1 p (X) x. If sup X x, then p x (X) = 0 p. Function q 1 p (X) is a closed convex function of X, therefore, the set {X q 1 p (X) x} is closed convex. Then, the set {X p x (X) p} is closed convex. 6
Example 1. Buered probability of exceedance is not a convex function of random variable (w.r.t. addition operation), i.e., in general, p x (λx + (1 λ)y ) λ p x (X) + (1 λ) p x (Y ). Counterexample is as follows. Take x = 0 and { 1, with probability 1/2, X = 1, with probability 1/2. Take Y 0, λ = 1/2. Note that p 0 (X) = 1, since q 0 (X) = 0, p 0 (Y ) = 0. Note also that λx + (1 λ)y = X/2, therefore, p 0 (λx + (1 λ)y ) = 1 1/2 = λ p 0 (X) + (1 λ) p 0 (Y ). Denote by B λ the Bernoulli random variable with probability λ being equal to 1, i.e., { 1, with probability λ, B λ = 0, with probability 1 λ. Denote the mixture of random variables with coecient λ as λx (1 λ)y = XB λ + Y (1 B λ ), where B λ is independent of X and Y. In words, a mixture of random veriables with coecient λ is a random variable which takes a value of the rst random variable with probability λ, and a value of the second random variable with probability (1 λ). Mixture operation results from the addition operation over measures. Suppose µ and ν are measures. Then, scaled measure λµ is a measure satisfying (λµ)(a) = λµ(a) for any measurable set A, λ R. Sum of measures µ + ν is a measure satisfying (µ + ν)(a) = µ(a) + ν(a) for any measurable set A. Random variable X denes measure µ X on (R, B(R)) such that µ X (A) = P (X A), for any A B(R). Conversely, any nonnegative measure µ X on (R, B(R)) such that µ X (R) = 1 denes a random variable. Suppose that random variables X and Y correspond to measures µ X and µ Y. Measure µ Z = λµ X + (1 λ)µ Y for λ [0, 1] is a nonnegative measure on (R, B(R)) and µ Z (R) = λµ X (R) + (1 λ)µ Y (R) = 1. Therefore, µ Z denes the random variable Z. We call Z a mixture of random variables X and Y with coecient λ and denote Z = λx (1 λ)y. In particular, F λx (1 λ)y (z) = λf X (z)+(1 λ)f Y (z), where F Z is a cumulative distribution function of the random variable Z. Proposition 5. (1 α) q α (X) is a concave function of (X, α) w.r.t. mixture operation and addition operation correspondingly, i.e., (1 (λα 1 + (1 λ)α 2 )) q (λα1 +(1 λ)α 2 )(λx (1 λ)y ) λ [(1 α 1 ) q α1 (X)] + (1 λ) [(1 α 2 ) q α2 (Y )]. 7
Proof. Denote α M = λα 1 +(1 λ)α 2. Then, with denitions of CVaR and λx (1 λ)y, we have (1 α M ) q αm (λx (1 λ)y ) = min c { (1 αm )c + E[B λ X + (1 B λ )Y c] +} = = min c { (1 αm )c + E[X c] + I(B λ = 1) + E[Y c] + I(B λ = 0) }. Since B λ is independent of X and Y, then E[X c] + I(B λ = 1) = E[X c] + EI(B λ = 1) = λe[x c] +. Then, { (1 α M ) q αm (λx (1 λ)y ) = min (1 αm )c + λe[x c] + + (1 λ)e[y c] +} { c min λ(1 α1 )c 1 + λe[x c 1 ] + + (1 λ)(1 α 2 )c 2 + (1 λ)e[y c 2 ] +} = c 1,c 2 = λ(1 α 1 ) q α1 (X) + (1 λ)(1 α 2 ) q α2 (Y ). The following statement is similar to a proposition in [2], which has motivated Proposition 5 in the rst place. Here we show how this proposition can be proved from Proposition 5 as a corollary. Corollary 6. Let X(x, p) be a discretely distributed random variable, taking values x = (x 1,..., x m ) with probabilities p = (p 1,..., p m ), p i 0, m i=1 p i = 1. Then function q α (X(x, p)) is a concave function of p. Proof. Note that if p M = λp 1 + (1 λ)p 2, then F X(x,pM )(x) = λf X(x,p1 )(x) + (1 λ)f X(x,p2 )(x). Therefore, X(x, p M ) = λx(x, p 1 ) (1 λ)x(x, p 2 ). Then Proposition 5 implies the concavity of q α (X(x, p)) w.r.t. vector p. The following proposition is similar to the one in [10]. proposition can be proved from Proposition 5 as a corollary. Here we show how this Corollary 7. Let random variable X p have a distribution F (x; p) = m i=1 p if i (x), where F i (x) for i = 1,..., m are the distribution functions, and p = (p 1,..., p m ) R m, p i 0, m i=1 p i = 1. Then function q α (X p ) is a concave function of p. Proof. Note that if p M = λp 1 + (1 λ)p 2, then F (x; p M ) = λf (x; p 1 ) + (1 λ)f (x; p 2 ). Therefore, X pm = λx p1 (1 λ)x p2. Then Proposition 5 implies the concavity of q α (X p ) w.r.t. vector p. Proposition 6. Buered probability of exceedance is a concave function of random variable w.r.t. mixture operation, i.e., p x (λx (1 λ)y ) λ p x (X)+(1 λ) p x (Y ), λ (0, 1). Proof. Suppose p x (X) = α 1 and p x (Y ) = α 2, then there are three possible cases. First, α 1 = α 2 = 1, then EX x and EY x, therefore, EλX (1 λ)y = λex + (1 λ)ey x, and p x (λx (1 λ)y ) = λ p x (X) + (1 λ) p x (Y ) = 1. Second, α 1 = α 2 = 0, then sup X x and sup Y x, and sup λx (1 λ)y max{sup X, sup Y } x. 8
Therefore, p x (λx (1 λ)y ) = λ p x (X) + (1 λ) p x (Y ) = 0. Third, in all other cases λα 1 + (1 λ)α 2 (0, 1). By Proposition 4, p x (X) p q 1 p (X) x for p (0, 1), therefore, p x (X) > p q 1 p (X) > x and, furthermore, p x (X) > p q 1 p (X) > x, sup X > x. Note also that p x (X) = p (0, 1) q 1 p (X) = x, sup X > x. Then, is equivalent to Note that p x (λx (1 λ)y ) λ p x (X) + (1 λ) p x (Y ) = λα 1 + (1 λ)α 2 (0, 1), q 1 (λα1 +(1 λ)α 2 )(λx (1 λ)y ) x, sup λx (1 λ)y > x. α 1 q 1 α1 (X) α 1 x. (5) If α 1 = 0, then 0 0. If α 1 = 1, then q 1 1 (X) = EX x. If α 1 (0, 1), then q 1 α1 (X) = x. Similarly, α 2 q 1 α2 (Y ) α 2 x. (6) Implying Proposition 5 and inequalities (5), (6), we get q λ(1 α1 )+(1 λ)(1 α 2 )(λx (1 λ)y ) λα 1 q 1 α1 (X) + (1 λ)α 2 q 1 α2 (Y ) λα 1 + (1 λ)α 2 λα 1x + (1 λ)α 2 x = x. λα 1 + (1 λ)α 2 Since α 1 > 0 or α 2 > 0, then sup X > x or sup Y > x, therefore sup λx (1 λ)y = max{sup X, sup Y } > x, which nishes the proof. It was mentioned that distribution functions are linear w.r.t. mixture operation: F λx (1 λ)y (x) = λf X (x) + (1 λ)f Y (x). Note that Proposition 6 proves that superdistribution functions are convex w.r.t. mixture operation: FλX (1 λ)y (x) λ F X (x) + (1 λ) F Y (x). Proposition 7. Buered probability of exceedance p x (X) is a monotonic function of random variable, i.e., p x (Y ) p x (Z) for Y Z almost surely. Proof. Suppose x sup Y and x sup Z. Then min a 0 E[aY +1] + min a 0 E[aZ +1] +, therefore, p x (Y ) p x (Z). Suppose x = sup Y, then p x (Y ) = 0 p x (Z). Suppose x = sup Z, then x sup Y, therefore, p x (Y ) = 0 = p x (Z). 4. Optimization Problems with bpoe 4.1. Two Families of Optimization Problems Denote program P(x), Denote program Q(α), P(x) : min p x (X) s.t. X X. Q(α) : min q α (X) s.t. X X. 9
For a set of random variables X, dene e X = inf X X EX, s X = inf sup X. X X Proposition 8. Let X 0 X be an optimal solution to P(x 0 ), where e X < x 0 s X. Then, X 0 is an optimal solution to Q(1 p x0 (X 0 )). Proof. Denote p = p x0 (X 0 ). Since x 0 > e X, then p x0 (X 0 ) < 1. If p x0 (X 0 ) = 0, then sup X 0 x 0, but x 0 s X, and sup X 0 s X by denition, therefore, sup X 0 = x 0 = s X. If 0 < p x0 (X 0 ) < 1, then, by Denition 1 of bpoe, q 1 p (X 0 ) = x 0. Suppose that X 0 is not an optimal solution to Q(1 p x0 (X 0 )), then there exists X X such that q 1 p (X ) < x 0. Since x 0 s X, then p > 0 and sup X x 0. Therefore, there exists p < p such that q 1 p (X ) = x 0, since q 1 p (X) is a continuous non-increasing function of p. There are two possible cases. First, if sup X = x 0, then p x0 (X ) = 0 < p, X 0 is not an optimal solution to P(x 0 ), contradiction. Second, if sup X > x 0, then p x0 (X ) = p < p, X 0 is not an optimal solution to P(x 0 ), contradiction. Two intervals for x 0 are not covered in Proposition 8. Note that for x 0 e X the optimal value for P(x 0 ) is 1, therefore, any feasible solution is an optimal solution. As for the interval x > s X, optimal value for P(x 0 ) is 0. If s X < sup X 0 x 0, then p x0 (X 0 ) = 0, and it is optimal for P(x 0 ), but q 1 (X 0 ) > s X, and it is not optimal for Q(1). Proposition 9. Let X 0 X be an optimal solution to Q(α 0 ). Then X 0 is an optimal solution to P( q α0 (X 0 )), unless sup X 0 > q α0 (X 0 ) and there exists X X such that 1. sup X = q α0 (X 0 ), 2. P (X = sup X ) 1 α 0. Proof. Denote x 0 = q α0 (X 0 ). First, suppose sup X 0 = x 0. Then p x0 (X 0 ) = 0, and X 0 is an optimal solution to P(x 0 ). Second, suppose that sup X 0 > x 0 and that exists X X such that p x0 (X ) < p x0 (X 0 ). Since x 0 = q α0 (X 0 ) and sup X 0 > x 0, then p x0 (X 0 ) = 1 α 0. Suppose sup X > x 0, then q α (X ) is strictly increasing on [0, 1 p x0 (X )]. Therefore, q α0 (X ) < q 1 px0 (X )(X ) = x 0, which implies that X 0 is not an optimal solution to Q(α 0 ), contradiction. Consequently, sup X = x 0. Suppose P (X = x 0 ) < 1 α 0. Then, q α (X ) is strictly increasing on [0, 1 P (X = x 0 )], and q α0 (X ) < x 0. Therefore, X 0 is not an optimal solution to Q(α 0 ), contradiction. Therefore, P (X = x 0 ) 1 α 0. Intuiton behind Proposition 9 is as follows. Note that X is also an optimal solution to Q(α 0 ). Therefore, we have two optimal solutions to right tail expectation minimization problem. The dierence between optimal solutions X and X 0 is that X is constant in its right 1 α 0 tail, and X 0 is not, since q α0 (X 0 ) < sup X 0. Proposition 9 implies that X is an optimal solution to P( q α0 (X 0 )), while X 0 is not. Which is a very natural risk-averse decision. This implies that, for certain problems, formulations of type P(x) provide more reasonable solutions than formulations of type Q(α). Corollary 8. Let X be a set of random variables, such that sup X = for all X X. Then, program families P(x), for x > e X, and Q(α), for 0 < α < 1, have the same set of optimal solutions. That is, if X 0 is optimal for P(x 0 ), then X 0 is optimal for Q(1 p x0 (X 0 )). Conversely, if X 0 is optimal for Q(α 0 ), then X 0 is optimal for P( q α0 (X 0 )). 10
Proof. Proposition 8 implies that if e X < x 0 s X =, then if X 0 is optimal for P(x 0 ), then X 0 is optimal for Q(1 p x0 (X 0 )). Note that since e X < x 0 < s X, then p x0 (X 0 ) (0, 1). Proposition 9 implies that if X 0 is optimal for Q(α 0 ), then X 0 is optimal for P( q α0 (X 0 )), unless exists X X such that sup X = q α0 (X 0 ). Which is impossible since sup X = > q α0 (X 0 ). Note that since α 0 (0, 1), then e X < q α0 (X 0 ) <. Assumption sup X = + for all X X in Corollary 8 might be too strong for some practical problems, where it is a common practice for all random variables to be dened on a nite probability space generated by system observations. Let us describe sets of optimal points (x, α) for problem families P(x) and Q(α). Dene f P (x) = min X X p x(x), f Q (α) = min X X q α(x). Then, sets of all optimal points of P(x) and Q(α) families are S P = {(x, α) f P (x) = 1 α}, S Q = {(x, α) f Q (α) = x}. Finally, reduced sets of optimal points are S P = {(x, α) S P e X x s X }, S Q = {(x, α) S Q x < s X } {(s X, 1)}. For any random variable X X there is a set S X = {(x, α) q α (X) = x}. Let us dene union of such sets for X X as S X = S X = {(x, α) exists X : q α (X) = x}. X X Naturally, we prefer random variables with superquantile as small as possible for a xed condence level and with condence level as big as possible for a xed superquantile value. Therefore, for the set S X we dene a Pareto front, which is often called an ecient frontier in nance, as follows: S X = {(x, α) S X x < x or α > α for all (x, α ) S X, (x, α ) (x, α)}. Proposition 10. S P S Q = S P = S Q = S X. Proof. Let us start with S P S Q = S X. Notice that (x, α) S P (x, α ) S X α α, (7) (x, α) S Q (x, α) S X x x. (8) Clearly, right sides of (7) and (8) hold for S X, which implies S X S P S Q. Suppose S X S P S Q, i.e., for some (x, α) S P S Q there exists (x, α ) S X such that x x, α α and (x, α ) (x, α). Notice that if (x, α) S P S Q, then (x, α ) S X α α and (x, α) S X x x. Then, x < x and α > α. Consider random variable X which has generated point (x, α ). Since q X (α ) < x, then q X (α) < x. Therefore, there exists ( q X (α), α) S X with q X (α) < x, while (x, α) S Q and (8) holds. Contradiction. Let us prove S P = S Q. Suppose (x, α) S P and x e X, then we can use Proposition 8 to conclude that (x, α) S Q. If x = e X, then p x (X) = 0, therefore, 11
(x, α) = (e X, 0) S Q. Let (x, α) S Q. If x < s X, then there is no X such that sup X = x, therefore, we can use Proposition 9 and conclude that (x, α) S P. Finally, let us prove S P S Q = S P. Since S P S P, S Q S Q and S P = S Q, then S P S P S Q. Suppose (x, α) S P S Q. If x < s X, then (x, α) S P. If x = s X, then α = 1, because there is only one α for any x in S P. Then (x, α) = (s X, 1) S P. Point with x > s X can not be in S Q since f Q (α) = min X X q α (X) min X X sup X = s X. Therefore, S P S Q S P, which nalizes the proof. 4.2. Parametric Simplex Algorithm for CVaR and bpoe Minimization Suppose that we are interested in solution to P 1 (x) for x s X. Suppose also we have an algorithm for solving P 2 (α), i.e. we can calculate function f 2 (α) for any α [0, 1]. Then we can nd an approximation to f 1 (x) by calculating f 2 (α) several times. First, calculate f 2 (1) = x X. If x > f 2 (1), then problem P 1 (x) is inecient. If x = f 2 (1), then f 1 (x) = 0. If x < f 2 (1), continue. Calculate f 2 (0). If x < f 2 (0), then P 1 (x) is infeasible. If x = f 2 (0), then f 1 (x) = 1. If x > f 2 (0), continue. Set a = 0, b = 1. Inequality 1 b < f 1 (x) < 1 a holds. We will calculate f 2 ((a + b)/2) at each step of the binary search procedure to make dierence b a as small as we need. Suppose that problem P 2 (α) can be expressed as a linear program. Let X 1,..., X n be a set of random variables discretely distributed on the common set of m scenarios, with scenario probabilities p 1,..., p m. Let random variable X i take value x j i under scenario j. Let λ = (λ 1,..., λ n ) be a set of decision variables such that X X X = n λ i X i, for some λ Λ, where Λ R n is a polyhedral set. Then P 2 (α) is equivalent to ( n ) min q α λ i X i λ i=1 i=1 s.t. λ Λ. With minimization form of CVaR, which is, q α (x) = min c { c + 1 1 α E[X c]+}, we reformulate P 2 (α) as min c,λ c + 1 1 α s.t. λ Λ. [ m n ] + p j λ i x j i c i=1 j=1 We will slightly adapt the parametric simplex method, see e.g. [8], [7], to solve this problem for all values α [0, 1]. To start, we need to obtain a basic feasible solution for one of extreme values, say α = 0. To get a solution for α = 0 we need to nd a random variable with minimal expectation. For example, if λ 0 and n i=1 λ i = 1, then we nd EX i = j p jx j i for all i and then take i = arg min i EX i. Then the optimal solution is λ 0 such that λ i = 1, λ j = 0 for j i. After obtaining the rst solution, denote µ = 1 and express reduced costs for 1 α nonbasic decision variables as linear functions of µ. Since the solution is optimal for 12
µ 0 = 1, all reduced costs are nonnegative at µ = 1. We do not have dependence on µ in constraints, that is why if µ is changing, solution remains primal feasible, but may become dual infeasible. Let us nd the biggest parameter value µ 1 at which reduced costs are still nonnegative. For µ 0 µ µ 1 solution λ 0 remains optimal. When µ > µ 1 some reduced costs became negative, that is why we make primal pivots until we nd new optimal solution λ 1. Then we express reduced costs as linear functions of µ, nd the next critical value µ 2 at which some costs reduce to 0, and so forth, see [8] for detailes. At some point all reduced costs will become nonnegative, no matter how big µ is, it means that the current solution is optimal for µ up to +, or up to α = 1. As a result of the algorithm we have: a sequence of parameters 1 = µ 0,..., µ M which corresponds to sequence α 0 = 0,..., α M = 1 1/µ M, α M+1 = 1; a sequence of optimal solutions λ 0,..., λ M ; a sequence of optimal objective values f 2(α i ) (f 2 (α M+1 ) = f 2 (α M )). To calculate f 1 (x), nd the interval [α j, α j+1 ] such that f 2 (α j ) x f 2 (α j+1 ). Then optimal solution is λ j = (λ j 1,..., λ j n) and f 1 (x) = p x ( n i=1 λj i X i). 4.3. Finite Probability Space Applications Proposition 11. Let X be a convex set of random variables. inf X X p x (X), there are two cases: Then for the problem 1. If inf X X sup X is attained for some X X, and x = min X X sup X = sup X, then min X X p x (X) = 0 with optimal solution X. 2. Problem inf X X p x (X) can be reformulated as a following problem: inf E[Y + Y Y 1]+, where Y = cl cone(x x) is a closed convex cone. Proof. If Case 1 is not valid, then with Proposition 1 we conclude that inf p x(x) = inf min E[a(X x) + X X X X a 0 1]+ = inf E[a(X x) + X X,a 0 1]+. Denote Y = a(x x). Since X is convex, then constraints X X, a 0 are equivalent to Y cone(x x). Suppose that sequence {Y i } i=1 cone(x x) converges weakly to Y Y = cl cone(x x). Since cone(x x) is a convex set, and for convex sets weak and L 1 convergences are equivalent, then sequence {Y i } i=1 is L 1 -converging to Y. Therefore, lim E[Y i + 1] + = E[Y + 1] +. i Then, nally, inf p x(x) = inf E[Y + X X Y Y 1]+. Denote Π m = {q = (q 1,..., q m ) T R m q i 0, i = 1,..., m; m i=1 q i = 1}. Denote Π m + = {q = (q 1,..., q m ) T R m q i > 0, i = 1,..., m; m i=1 q i = 1}. For the following proposition we suppose X to be a set of random variables dened on the common nite probability space with the vector of elementary events' probabilies p = (p 1,..., p m ) T Π m +. Denote by the set S R m the set of vectors such that any random 13
variable X X takes values x 1,..., x m with probabilities p 1,..., p m correspondingly, for some x = (x 1,..., x m ) T S. Then X being closed convex set is equivalent to S being a closed convex set. Let us say that random variable X takes values from x = (x 1,..., x m ) R m with probabilities p = (p 1,..., p m ) Π m if X is disctretely distributed over m atoms and takes value x i with probability p i for i = 1,..., m. Corollary 9. Let X be a set of random variables such that X X X takes values from x S with probabilities p, where S R m is a convex set, p Π m +. Then for the problem inf X X p x (X), there are two cases: 1. If inf x S max i x i is attained for some x S, and x = min x S max i x i = max i x i, then min X X p x (X) = 0 with optimal solution X taking values from x. 2. Problem inf X X p x (X) can be reformulated as a following problem: inf y C pt [y + e] +, where C = cl cone(s xe) is a closed convex cone, e = (1,..., 1) T R m. Proof. Let us apply Proposition 11 to the specic case of nite probability space. If we consider the Case 2, then problem inf X X p x (X) can be reformulated inf y C pt [y + e] +, where C = cl cone(s xe) is a closed convex cone, since S is convex. Corollary 10. Let X be a set of random variables such that X X X takes values from x S with probabilities p, where S = {x Ax b} R m, p Π m +. Then for the problem inf X X p x (X), there are two cases: 1. If x = min x S max i x i, then min X X p x (X) = 0 with optimal solution X taking values from x such that max i x i = x. 2. Problem inf X X p x (X) can be reformulated as an LP: inf p T z (9) s.t. z y + e, (10) Ay a(b xae) 0, (11) z 0, a 0. (12) Proof. Corollary 9 implies that for the Case 2 problem inf X X p x (X) can be reformulated as inf y C p T [y + e] +, with C = cl cone(s xe). Note that S xe = {x A(x + xe) b} = {x Ax b xae}. 14
Note also that cone(s xe) = {ax Ax b xae, a > 0} {0} = {y Ay a(b xae), a > 0} {0}. Therefore, cl cone(s xe) = {y Ay a(b xae), a 0} {0} = {y Ay a(b xae), a 0}. Finally, introducing z = [y + e] +, we obtain reformulation (9)(12). Consider random real-valued function f(w; X), where w R k and X T = (X 1,..., X n ) is a random vector of dimension n. It is assumed here that variables X 1,..., X n can be observed, but can not be controlled. It is also assumed that value f(w; X) can be controlled by the vector w W R k. Proposition 12. Let f(w; X) be a convex function of w. Then p x (f(w; X)) is a quasiconvex function of w. Proof. Convexity of f implies f(w M ; X) λf(w 1 ; X) + (1 λ)f(w 2 ; X), for w M = λw 1 + (1 λ)w 2. Then, using monotonicity of p x (X), see Proposition 7, p x (f(w M ; X)) p x (λf(w 1 ; X) + (1 λ)f(w 2 ; X). p x (X) is a quasi-convex function of X, see Proposition 4. Quasi-convexity of a function p x (X) is equivalent to p x (λx 1 + (1 λ)x 2 ) max{ p x (X 1 ), p x (X 2 )}. Then, Therefore, p x (λf(w 1 ; X) + (1 λ)f(w 2 ; X)) max{ p x (f(w 1 ; X)), p x (f(w 2 ; X))}. p x (f(w M ; X)) max{ p x (f(w 1 ; X)), p x (f(w 2 ; X))}, i.e., p x (f(w; X)) is a quasi-convex function of w. Proposition 13. Let X be a random vector and let f(w; X) be a convex positive-homogeneous function of w. Assume that convergence of {w i } implies L 1 -convergence of {f(w i ; X)}. Let W be a convex set. Then for the problem inf w W p x (f(w; X)) there are two possible cases: 1. If inf w W sup f(w; X) is attained for some w W, and x = min w W sup f(w; X) = sup f(w ; X), then min w W p x (f(w; X)) = 0 with optimal solution w. 2. Problem inf w W p x (f(w; X)) can be reformulated as a convex programming problem: inf E[ f(v; X) + 1] +, v V where v T = (v 1,..., v k+1 ) R k+1, f(v; X) = f((v1,..., v k ) T ; X) v k+1, and V = cl cone(w {x}) is a closed convex cone. 15
Proof. In the Case 2 we can reformulate problem inf w W p x (f(w; X)) as follows: inf p x(f(w; X)) = inf E[a(f(w; X) x) + w W a 0,w W 1]+. Denote v T = (v 1,..., v k+1 ) R k+1 and f(v; X) = f((v 1,..., v k ) T ; X) v k+1. Then, inf p x(f(w; X)) = inf E[a f(v; X) + 1] +. w W a 0,v W {x} Note that f(v; X) is also a convex positive-homogeneous function of v. Then, inf p x(f(w; X)) = inf E[ f(av; X) + 1] +. w W a 0,v W {x} Note that since W is a convex set, then W {x} is also a convex set. Therefore, a 0, v W {x} av cone(w {x}). Note that feasible region can be extended to V = cl cone(w {x}): since convergence of {w i } implies L 1 -convergence of f(w; X), then convergence of v i v implies L 1 - convergence of f(v i ; X) L1 f(v; X), therefore, E[ f(v i ; X) + 1] + E[ f(v; X) + 1] +. Finally, inf p x(f(w; X)) = inf E[ f(v; X) + 1] +. w W v V Corollary 11. Let X = (X 1,..., X n, 1) T be a random vector, with the last component being a constant 1, and E X i < for i = 1,..., n. Let W R n+1 be a convex set. Then for the problem inf w W p x (w T X) there are two possible cases as follows: 1. If inf w W sup w T X is attained for some w W, and x = min w W sup w T X = sup(w ) T X, then min w W p x (w T X) = 0 with optimal solution w. 2. Problem inf w W p x (w T X) can be reformulated as the following convex programming problem: inf v V E[vT X + 1] +, where V = cl cone(w xe n+1 ) is a closed convex cone, where e n+1 = (0,..., 0, 1) T R n+1. Proof. Let us show that this corollary follows from Proposition 13. First, f(w; X) = n i=1 w ix i + w n+1 is convex and positive-homogeneous w.r.t. w. Second, suppose that w j w. Then E (w j ) T X w T X w j n+1 w n+1 + n w j i w i E X i 0, i=1 since E X i <. Therefore, convergence of w implies L 1 -convergence of f(w; X). Note that in this particular case of function f there is no need to introduce a new parameter. It is sucient to shift feasible region for w n+1 by x: W = W xen+1. Further change of variables v = aw, and setting V = cl cone( W ), as it is done in Proposition 13, nalizes the proof. 16
Corollary 12. Let X = (X 1,..., X n, 1) T be a random vector, with the last component being a constant 1, and E X i < for i = 1,..., n. Let W = {w Aw b} R n+1. Then for the problem inf w W p x (w T X) there are two possible cases: 1. If inf w W sup w T X is attained for some w W, and x = min w W sup w T X = sup(w ) T X, then min w W p x (w T X) = 0 with optimal solution w. 2. Problem inf w W p x (w T X) can be reformulated as the linear programming problem: inf E[v T X + 1] +, (13) s.t. Av T a(b xae n+1 ) 0, (14) a 0. (15) Proof. Let us prove that this corollary follows from Corollary 11. Note that Note further that W xe n+1 = {w Aw + xae n+1 b}. cone(w xe n+1 ) = {v Av + axae n+1 ab, a > 0} {0}. Finally, V = cl cone(w xe n+1 ) = {v Av a(b xae n+1 ) 0, a 0}. Corollary 13. Let X be a random vector taking values x 1,..., x m R n with probabilities p = (p 1,..., p m ) Π m +. Let f(w; X) be a convex positive-homogeneous function of w R k. Let W R k be a convex set. Then for the problem inf w W p x (f(w; X)) there are two possible cases: 1. If inf w W max j f(w; x j ) is attained for some w W, and x = min w W max j f(w; x j ) = max j f(w ; x j ), then min w W p x (f(w; X)) = 0 with optimal solution w. 2. Problem inf w W p x (f(w; X)) can be reformulated as the convex programming problem: inf v V m p j [ f(v; x j ) + 1] +, j=1 where v T = (v 1,..., v k+1 ) R k+1, f(v; X) = f((v1,..., v k ) T ; X) v k+1, and V = cl cone(w {x}) is a closed convex cone. Proof. Note that since there are nitely many scenarios for random vector X, then for w i w, due to continuity of function f w.r.t. w, we have max j f(w i ; x j ) f(w; x j ) 0. That is, convergence on w implies L 1 -convergence of f(w; X). Therefore, this corollary follows directly from Proposition 13. 5. References [1] Norton, M., and Uryasev, S. AUC and Buered AUC Maximization. University of Florida, Research Report, in preparation, 2014. [2] Pavlikov, K., and Uryasev, S. CVaR Distance between Distributions and Applications. University of Florida, Research Report, in preparation, 2014. 17
[3] Rockafellar, R. T., and Royset, J. O. Random variables, monotone relations and convex analysis. Mathematical Programming B, accepted. [4] Rockafellar, R. T., and Royset, J. O. On buered failure probability in design and optimization of structures. Reliability Engineering and System Safety 95, 5 (2010), 499 510. [5] Rockafellar, R. T., and Uryasev, S. Conditional value-at-risk for general loss distributions. Journal of Banking and Finance (2002), 14431471. [6] Rockafellar, R. T., and Uryasev, S. The Fundamental Risk Quadrangle in Risk Management, Optimization and Statistical Estimation. Surveys in Operations Research and Management Science 18 (2013). [7] Ruszczynski, A., and Vanderbei, R. J. Frontiers of Stochastically Nondominated Portfolios. Econometrica 71, 4 (2003), 12871297. [8] Vanderbei, R. J. Linear Programming: Foundations and Extensions. International series in operations research & management science. Kluwer Academic, 2001. [9] Zabarankin, M., and Uryasev, S. Statistical decision problems. Selected concepts and portfolio safeguard case studies. Springer Optimization and Its Applications 85. New York, NY: Springer. [10] Zdanovskaya, V., Pavlikov, K., and Uryasev, S. Estimation of Mixtures of Continuous Distributions: Mixtures of Normal Distributions and Applications. University of Florida, Research Report, in preparation, 2013. 18