# Privacy Tradeoffs in Predictive Analytics


**Definition 1.** We say that $R = (L, Y, \hat{x})$ is *privacy preserving* if the obfuscated output $Y$ is independent of $x_0$. Formally, for all $x \in \mathbb{R}^d$, $V \subset \mathbb{R}^{d+1}_0$, and $A \subseteq \mathcal{Y}$,

$$P_{(-1,x),V}\big(Y(r,-1,\ell) \in A\big) = P_{(+1,x),V}\big(Y(r,+1,\ell) \in A\big), \tag{8}$$

where $\ell = L(V)$ is the information disclosed from $V$, and $r \in \mathbb{R}^{|S|}$ is the vector of user ratings.

Intuitively, a learning protocol is privacy preserving if its obfuscation scheme reveals nothing about the user's private variable: the distribution of the output $Y$ does not depend statistically on $x_0$. Put differently, two users that have the same $x$ but different $x_0$ output obfuscated values that are computationally indistinguishable [17]. Computational indistinguishability is a strong privacy property, as it implies that a user's private variable is protected against *any* inference algorithm (and not just, e.g., the LSE (6)): in particular, no inference algorithm can estimate $x_0$ with probability better than 50% with access to $y$ alone.

**Accuracy.** Our second definition determines a partial ordering among learning protocols w.r.t. their accuracy, as captured by the $\ell_2$ loss of the estimation:

**Definition 2.** We say that a learning protocol $R = (L, Y, \hat{x})$ is *more accurate* than $R' = (L', Y', \hat{x}')$ if, for all $V \subset \mathbb{R}^{d+1}_0$,

$$\sup_{x_0 \in \{-1,+1\},\, x \in \mathbb{R}^d} \mathbb{E}_{(x_0,x),V}\big\{\|\hat{x}(y,V) - x\|_2^2\big\} \;\le\; \sup_{x_0 \in \{-1,+1\},\, x \in \mathbb{R}^d} \mathbb{E}_{(x_0,x),V}\big\{\|\hat{x}'(y',V) - x\|_2^2\big\},$$

where $y = Y(r, x_0, L(V))$ and $y' = Y'(r, x_0, L'(V))$. Further, we say that it is *strictly more accurate* if the above inequality holds strictly for some $V \subset \mathbb{R}^{d+1}_0$.

Note that the accuracy of $R$ is determined by the $\ell_2$ loss of the estimate $\hat{x}$ computed in a worst-case scenario among all possible extended user profiles $\bar{x} = (x_0, x)$. This metric is natural. As we discuss in Section 6.2, it relates to the so-called A-optimality criterion [6]. It also has an additional compelling motivation. Recall that $\hat{x}$ is used to estimate the rating for a new item through the inner product (3).
An estimator $\hat{x}$ minimizing the expected $\ell_2$ loss also minimizes the mean square prediction error over a new item. This further motivates this accuracy metric, given that the analyst's goal is correct rating prediction. To see this, assume that the extended user profile is estimated as $\hat{\bar{x}} = (\hat{x}_0, \hat{x})$ for some $\hat{x}_0$ (for brevity, we omit the dependence on $y$, $V$). Recall that the analyst uses this profile to predict ratings for $\bar{v} \notin V$ using $\hat{r} = \langle \bar{v}, \hat{\bar{x}} \rangle$. The quality of such a prediction is often evaluated in terms of the mean square error (MSE). By (7),

$$\text{MSE} = \mathbb{E}\{(\hat{r} - r)^2\} = \sigma^2 + \mathbb{E}\big\{\langle \bar{v}, \bar{x} - \hat{\bar{x}} \rangle^2\big\}.$$

Assuming a random item vector $\bar{v}$ with diagonal covariance, $\mathbb{E}(v_0^2) = c_0$, $\mathbb{E}(v_0 v) = 0$, $\mathbb{E}(v v^T) = cI$, we get

$$\text{MSE} = \sigma^2 + c_0\, \mathbb{E}\{(x_0 - \hat{x}_0)^2\} + c\, \mathbb{E}\big\{\|x - \hat{x}\|_2^2\big\}.$$

Observe that the first term is independent of the estimation. Under a privacy-preserving protocol, the value of $\hat{x}_0$ minimizing the second term is a constant (the prior mean of $x_0$), also independent of the estimation. The last term is precisely the $\ell_2$ loss. Hence, minimizing the mean square error of the analyst's prediction is equivalent to minimizing the $\ell_2$ loss of the estimator $\hat{x}$. This directly motivates our accuracy definition.

**Algorithm 1: Midpoint Protocol**

```
Analyst's parameters: S ⊆ [M], V = {(v_j0, v_j), j ∈ S} ⊂ ℝ^{d+1}_0
User's parameters:    x_0 ∈ {−1, +1}, r = (r_1, …, r_|S|) ∈ ℝ^|S|

DISCLOSURE  ℓ = L(V):
    ℓ_j = v_j0, for all j ∈ S
OBFUSCATION SCHEME  y = Y(r, x_0, ℓ):
    y_j = r_j − x_0 ℓ_j, for all j ∈ S
ESTIMATOR  x̂ = x̂(y, V):
    Apply the minimax estimator x̂ given by (9).
```

**Disclosure Extent.** Finally, we define a partial ordering among learning protocols w.r.t. the amount of information revealed by their disclosures.

**Definition 3.** We say that $R = (L, Y, \hat{x})$ *discloses at least as much information* as $R' = (L', Y', \hat{x}')$ if there exists a measurable mapping $\varphi : \mathcal{L} \to \mathcal{L}'$ such that $L' = \varphi \circ L$, i.e., $L'(\bar{v}) = \varphi(L(\bar{v}))$ for each $\bar{v} \in \mathbb{R}^{d+1}_0$. We say that $R$ and $R'$ *disclose the same amount of information* if $L' = \varphi \circ L$ and $L = \varphi' \circ L'$ for some $\varphi, \varphi'$.
Finally, we say that $R$ *discloses strictly more information* than $R'$ if $L' = \varphi \circ L$ for some $\varphi$, but there exists no $\varphi'$ such that $L = \varphi' \circ L'$.

The above definition is again natural. Intuitively, a disclosure $L$ carries at least as much information as $L'$ if $L'$ can be retrieved from $L$: the existence of the mapping $\varphi$ implies that the user can recover $L'(V)$ by applying $\varphi$ to the disclosure $L(V)$. Put differently, given a black box that computes $L$, the user can compute $L'$ by feeding the output of this box to $\varphi$. If this is the case, then $L$ is clearly at least as informative as $L'$.

## 5. AN OPTIMAL PROTOCOL

In this section we prove that a simple learning protocol outlined in Algorithm 1, which we refer to as the *midpoint protocol* (MP), has remarkable optimality properties. The three components $(L, Y, \hat{x})$ of MP are as follows:

1. The analyst discloses the entry $v_0$ corresponding to the private user feature $x_0$, i.e., $L\big((v_0, v)\big) \triangleq v_0$ for all $(v_0, v) \in \mathbb{R}^{d+1}_0$, and $\mathcal{L} \triangleq \mathbb{R}$.

2. The user shifts each rating $r_j$ by the contribution of her private feature. More specifically, the user reveals to the analyst the quantities $y_j = r_j - x_0 \ell_j = r_j - x_0 v_{j0}$, $j \in S$. The user's obfuscated feedback is thus $Y(r, x_0, \ell) \triangleq y$, where vector $y$'s coordinates are the above quantities, i.e., $y = (y_1, \ldots, y_{|S|})$, and $\mathcal{Y} \triangleq \mathbb{R}^{|S|}$. Note that, by (7), for every $j \in S$ the obfuscated feedback satisfies $y_j = \langle v_j, x \rangle + \varepsilon_j$, with $\varepsilon_j$ the i.i.d. zero-mean noise variables.

3. Finally, the analyst applies a minimax estimator to the obfuscated feedback. Let $\mathcal{X}$ be the set of all measurable mappings estimating $x$ given $y$ and $V$ (i.e., of the form $\hat{x} : \mathbb{R}^{|S|} \times (\mathbb{R}^{d+1}_0)^{|S|} \to \mathbb{R}^d$). The estimator $\hat{x} \in \mathcal{X}$ is

*minimax* if it minimizes the worst-case $\ell_2$ loss, i.e.:

$$\sup_{x_0 \in \{-1,+1\},\, x \in \mathbb{R}^d} \mathbb{E}_{(x_0,x),V}\big\{\|\hat{x}(y,V) - x\|_2^2\big\} = \inf_{\hat{x}' \in \mathcal{X}} \; \sup_{x_0 \in \{-1,+1\},\, x \in \mathbb{R}^d} \mathbb{E}_{(x_0,x),V}\big\{\|\hat{x}'(y,V) - x\|_2^2\big\}. \tag{9}$$

The following theorem summarizes the midpoint protocol's remarkable properties:

**Theorem 1.** Under the linear model (7):
1. MP is privacy preserving.
2. No privacy-preserving protocol is strictly more accurate than MP.
3. Any privacy-preserving protocol that does not disclose at least as much information as MP is strictly less accurate.

We prove the theorem below. Its second and third statements formally establish the optimality of the midpoint protocol. Intuitively, the second statement implies that the midpoint protocol has maximal accuracy. No privacy-preserving protocol achieves better accuracy: surprisingly, this is true even among schemes that disclose strictly more information than the midpoint protocol. As such, the second statement of the theorem implies there is no reason to disclose more than $v_{j0}$ for each item $j \in S$. The third statement implies that the midpoint protocol engages in a minimal disclosure: to achieve maximal accuracy, a learning protocol must disclose at least $v_{j0}$, $j \in S$. In fact, our proof shows that the gap between the accuracy of MP and a protocol not disclosing biases is infinite, for certain $V$.

We note that the disclosure in MP is intuitively appealing: an analyst need only disclose the gap between average ratings across the two types (e.g., males and females, conservatives and liberals, etc.) to enable protection of $x_0$. In general, the minimax estimator $\hat{x}$ depends on the distribution followed by the noise variables in (7). For arbitrary distributions, a minimax estimator that can be computed in closed form (rather than as the limit of a sequence of estimators) may not be known. General conditions for the existence of such estimators can be found, e.g., in Strasser [42].
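To make the protocol concrete, the following minimal Python sketch simulates one round of MP under the linear model (7), assuming Gaussian noise, in which case the minimax estimator reduces to least squares. All variable names and dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, S = 5, 40                       # latent dimension, number of solicited items

# Ground truth: extended user profile (x0, x), item profiles (v_j0, v_j).
x0 = -1                            # private binary feature, in {-1, +1}
x = rng.normal(size=d)             # non-private user profile
V = rng.normal(size=(S, d))        # item feature vectors v_j
b = rng.normal(size=S)             # item biases v_j0 (disclosed by the analyst)

# Linear model (7): r_j = <v_j, x> + x0 * v_j0 + eps_j.
r = V @ x + x0 * b + 0.1 * rng.normal(size=S)

# MP obfuscation: y_j = r_j - x0 * v_j0.
y = r - x0 * b

# Estimator: least squares over (y, V), as in (11).
x_hat, *_ = np.linalg.lstsq(V, y, rcond=None)

print(np.linalg.norm(x_hat - x))   # small l2 loss
```

Because $y_j = \langle v_j, x \rangle + \varepsilon_j$, the analyst's view is distributed identically for $x_0 = +1$ and $x_0 = -1$, which is exactly the privacy property asserted in Theorem 1.1.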
In the case of Gaussian noise, the minimax estimator coincides with the least squares estimator (see, e.g., Lehmann and Casella [27, Thm. 1.15, Chap. 5]), i.e.,

$$\hat{x}(y, V) = \arg\min_{x \in \mathbb{R}^d} \sum_{j \in S} \big( y_j - \langle v_j, x \rangle \big)^2. \tag{10}$$

The minimization in (10) is a linear regression, and $\hat{x}$ has the following closed form:

$$\hat{x}(y, V) = \Big( \sum_{j \in S} v_j v_j^T \Big)^{-1} \sum_{j \in S} y_j v_j. \tag{11}$$

The accuracy of this estimator can also be computed in closed form. Using (7), (11), and the definition of $y$, it can easily be shown that, for all $x \in \mathbb{R}^d$,

$$\mathbb{E}_{(x_0,x),V}\big\{\|\hat{x}(y, V) - x\|_2^2\big\} = \sigma^2 \operatorname{tr}\Big[\Big(\sum_{j \in S} v_j v_j^T\Big)^{-1}\Big], \tag{12}$$

where $\sigma^2$ is the noise variance in (7) and $\operatorname{tr}(\cdot)$ the trace.

### 5.1 Proof of Theorem 1

**Privacy.** To see that Thm. 1.1 holds, observe that, by (7), the user releases $y_j = r_j - v_{j0} x_0 = \langle v_j, x \rangle + \varepsilon_j$ for each $j \in S$. The distribution of $y_j$ thus does not depend on $x_0$, so the midpoint protocol is clearly privacy preserving.

**Maximal Accuracy.** We prove Theorem 1.2 by contradiction; in particular, we show that a protocol that is strictly more accurate can be used to construct an estimator that has lower worst-case $\ell_2$ loss than the minimax estimator. Suppose that there exists a privacy-preserving protocol $R' = (L', Y', \hat{x}')$ that is strictly more accurate than the midpoint protocol $R = (L, Y, \hat{x})$. Let $\ell = L(V)$, $\ell' = L'(V)$ be the disclosures under the two protocols, and $y = Y(r, x_0, \ell)$, $y' = Y'(r, x_0, \ell')$ the obfuscated values. Recall that $\ell_j = v_{j0}$, and $y_j = r_j - x_0 v_{j0} = \langle v_j, x \rangle + \varepsilon_j$, $j \in S$. We will use $L'$, $Y'$, and $\hat{x}'$ to construct an estimator $\hat{x}''$ that has a lower worst-case $\ell_2$ loss than the least squares estimator $\hat{x}$ over $y$ and $V$. First, apply $Y'$ to $y + \ell$, assuming that the private variable is $x_0 = +1$, using the disclosed information $\ell'$; that is, let $y'' = Y'(y + \ell, +1, \ell')$. Second, apply the estimator $\hat{x}'$ to this newly obfuscated output, i.e., compute $\hat{x}'(y'', V)$. Combining these two steps, the estimator $\hat{x}''$ is given by

$$\hat{x}''(y, V) = \hat{x}'\big( Y'\big( y + \ell, +1, L'(V) \big), V \big).$$

Under this construction, the random variables $y'$, $y''$ are identically distributed.
This is obvious if $x_0 = +1$; indeed, in this case $y'' = y'$. On the other hand, since $R'$ is privacy preserving, by (8):

$$Y'(y + \ell, +1, \ell') \stackrel{d}{=} Y'(y - \ell, -1, \ell'), \tag{13}$$

i.e., the two variables are equal in distribution. This implies that $\hat{x}''(y, V)$ is identically distributed as $\hat{x}'(y', V)$. On the other hand, $R'$ is strictly more accurate than $R$; hence, there exists a $V$ such that

$$\sup_{\bar{x}} \mathbb{E}\{\|\hat{x}(y, V) - x\|_2^2\} > \sup_{\bar{x}} \mathbb{E}\{\|\hat{x}'(y', V) - x\|_2^2\} = \sup_{\bar{x}} \mathbb{E}\{\|\hat{x}''(y, V) - x\|_2^2\},$$

a contradiction, as $\hat{x}$ is minimax.

**Minimal Disclosure.** Consider a privacy-preserving learning protocol $R' = (L', Y', \hat{x}')$ that does not disclose at least as much information as the midpoint protocol $R = (L, Y, \hat{x})$. Consider a setup where $|S| = d$, the dimension of the feature profiles. Assume also that $V$ is such that the matrix $\mathbf{V} = [v_j]_{j \in S} \in \mathbb{R}^{d \times d}$ is invertible, and denote by $\ell = L(V) \in \mathbb{R}^d$ the vector with coordinates $v_{j0}$, $j \in S$. For any $x_0 \in \{-1,+1\}$, $s \in \mathbb{R}^d$, and $\ell' \in (\mathcal{L}')^d$, let $Z_{x_0}(s, \ell') \in \mathcal{Y}'$ be a random variable with distribution given by

$$Z_{x_0}(s, \ell') \stackrel{d}{=} Y'(s + \varepsilon, x_0, \ell'),$$

where $\varepsilon \in \mathbb{R}^d$ is a vector of i.i.d. coordinates sampled from the same distribution as the noise variables $\varepsilon_j$, $j \in S$. Put differently, $Z_{x_0}(s, \ell')$ is distributed as the output of the obfuscation $Y'$ when $r - \varepsilon = \mathbf{V} x + x_0 \ell = s \in \mathbb{R}^d$, $L'(V) = \ell'$, and the gender is $x_0$. The following then holds.

**Lemma 1.** If $\mathbf{V} \in \mathbb{R}^{d \times d}$ is invertible then, for all $s \in \mathbb{R}^d$, $\ell = L(V)$, and $\ell' = L'(V)$,

$$Z_{+}(s, \ell') \stackrel{d}{=} Z_{-}(s - 2\ell, \ell').$$

*Proof.* By Eq. (13), for all $x \in \mathbb{R}^d$, $Y'(\mathbf{V} x + \ell + \varepsilon, +1, \ell') \stackrel{d}{=} Y'(\mathbf{V} x - \ell + \varepsilon, -1, \ell')$. The claim follows by taking $x = \mathbf{V}^{-1}(s - \ell)$.

As $R'$ does not disclose as much information as the midpoint protocol, by definition there is no map $\varphi$ such that $L(\bar{v}) = \varphi(L'(\bar{v}))$ for all $\bar{v} = (v_0, v) \in \mathbb{R}^{d+1}_0$. Hence, there

exist extended profiles $\bar{v}, \bar{v}' \in \mathbb{R}^{d+1}_0$ such that $v_0 \ne v_0'$ and yet $L'(\bar{v}) = L'(\bar{v}')$. As both $\bar{v} = (v_0, v)$ and $\bar{v}' = (v_0', v')$ belong to $\mathbb{R}^{d+1}_0$, the supports of $v, v'$ are non-empty. We consider the following two cases:

**Case 1.** The supports of $v, v'$ intersect, i.e., there exists a $k \in [d]$ such that $v_k \ne 0$ and $v_k' \ne 0$. In this case, consider a scenario in which $V = \{\bar{v}\} \cup \{\bar{e}_l\}_{1 \le l \le d,\, l \ne k}$, where $\bar{e}_l \in \mathbb{R}^{d+1}_0$ is a vector whose $l$-th coordinate is 1 and all other coordinates are zero. Clearly, $|S| = |V| = d$, and $\mathbf{V} = [v_i]_{i \in [d]}$ is invertible. Let $\ell' = L'(V)$. By Lemma 1, for all $s \in \mathbb{R}^d$,

$$Z_{+}(s + 2 v_0 e_1, \ell') \stackrel{d}{=} Z_{-}(s, \ell'), \tag{14}$$

where $e_1 \in \mathbb{R}^d$ is 1 at coordinate 1 and 0 everywhere else. Similarly, in a scenario in which $V' = \{\bar{v}'\} \cup \{\bar{e}_l\}_{1 \le l \le d,\, l \ne k}$, the matrix $\mathbf{V}'$ is again invertible. Crucially, $L'(V') = L'(V) = \ell'$, so again by Lemma 1:

$$Z_{+}(s + 2 v_0' e_1, \ell') \stackrel{d}{=} Z_{-}(s, \ell'), \tag{15}$$

for all $s \in \mathbb{R}^d$. Equations (14) and (15) imply that, for all $s \in \mathbb{R}^d$,

$$Z_{+}(s + \xi e_1, \ell') \stackrel{d}{=} Z_{+}(s, \ell'), \tag{16}$$

where $\xi \triangleq 2(v_0 - v_0')$. In other words, the obfuscation is periodic with respect to the direction $e_1$.

Observe that for any $\bar{x} \in \{-1,+1\} \times \mathbb{R}^d$ and any $M \in \mathbb{R}_+$, we can construct an $\bar{x}' \in \{-1,+1\} \times \mathbb{R}^d$ and a $K \in \mathbb{N}$ such that (a) $\bar{x}, \bar{x}'$ differ only at coordinate $k \in \{1, 2, \ldots, d\}$, (b) $\langle v, x' - x \rangle = K\xi$, and (c) $\|x' - x\|_2 \ge M$. To see this, let $K$ be a large enough integer such that $K |\xi / v_k| > M$. Taking $x_k' = x_k + K\xi/v_k$, and $x_l' = x_l$ for all other coordinates $l$, yields an $\bar{x}'$ that satisfies the desired properties.

Suppose that the learning protocol $R'$ is applied to $V = \{\bar{v}\} \cup \{\bar{e}_l\}_{l \ne k}$ for a user with $x_0 = +1$. Fix a large $M > 0$. For each $\bar{x}$ and $\bar{x}'$ constructed as above, by (16), the obfuscated values generated by $Y'$ have an identical distribution. Hence, irrespective of how the estimator $\hat{x}'$ is implemented, either $\mathbb{E}_{\bar{x},V}\{\|\hat{x}'(y', V) - x\|_2^2\}$ or $\mathbb{E}_{\bar{x}',V}\{\|\hat{x}'(y', V) - x'\|_2^2\}$ must be $\Omega(M^2)$, which, in turn, implies that

$$\sup_{\bar{x} \in \{\pm 1\} \times \mathbb{R}^d} \mathbb{E}_{\bar{x},V}\big\{\|\hat{x}'(y', V) - x\|_2^2\big\} = \infty.$$

Note that, since the $v_j$, $j \in S$, are linearly independent, the matrix $\sum_{j \in S} v_j v_j^T$ is positive definite and thus invertible.
Hence, in contrast to the above setting, the loss (12) of MP in the case of Gaussian noise is finite.

**Case 2.** The supports of $v, v'$ are disjoint. In this case $v, v'$ are linearly independent and, in particular, there exist $1 \le k, k' \le d$, $k \ne k'$, such that $v_k \ne 0$, $v_{k'} = 0$ while $v_k' = 0$, $v_{k'}' \ne 0$. Let $V = \{\bar{v}\} \cup \{\bar{v}'\} \cup \{\bar{e}_l\}_{1 \le l \le d,\, l \ne k,\, l \ne k'}$. Then $|V| = d$ and the matrix $\mathbf{V} = [v_j]_{j \in S}$ is again invertible. By swapping the positions of $\bar{v}$ and $\bar{v}'$ in the matrix $\mathbf{V}$, we can show, using a similar argument as in Case 1, that for all $s \in \mathbb{R}^d$,

$$Z_{+}(s + \xi(e_1 - e_2), \ell') \stackrel{d}{=} Z_{+}(s, \ell'), \tag{17}$$

where $\xi \triangleq 2(v_0 - v_0')$ and $\ell' = L'(V)$; i.e., $Z_+$ is periodic in the direction $e_1 - e_2$. Moreover, for any $\bar{x} \in \{-1,+1\} \times \mathbb{R}^d$ and any $M \in \mathbb{R}_+$, we can similarly construct an $\bar{x}' \in \{-1,+1\} \times \mathbb{R}^d$ and a $K \in \mathbb{N}$ such that (a) $\bar{x}, \bar{x}'$ differ only at coordinates $k, k' \in \{1, 2, \ldots, d\}$, (b) $\langle v, x' - x \rangle = -\langle v', x' - x \rangle = K\xi$, and (c) $\|x' - x\|_2 \ge M$: the construction adds $K\xi/v_k$ at the $k$-th coordinate and subtracts $K\xi/v_{k'}'$ from the $k'$-th coordinate, where $K > M \max(|v_k|, |v_{k'}'|)/|\xi|$. A similar argument as in Case 1 can then be used to show again that the estimator $\hat{x}'$ cannot disambiguate between $\bar{x}, \bar{x}'$ over $V$, yielding the theorem.

**Algorithm 2: Midpoint Protocol with Sub-Sampling**

```
Analyst's parameters: S ⊆ [M], V = {(v_j0, v_j), j ∈ S} ⊂ ℝ^{d+1}_0,
                      p = {(p_j^+, p_j^−), j ∈ S} ∈ ([0,1] × [0,1])^|S|
User's parameters:    x_0 ∈ {−1, +1}, S_0 ⊆ S, r = {r_j, j ∈ S_0} ∈ ℝ^|S_0|

DISCLOSURE  ℓ = L(V, p):
    ρ_j = p_j^− / p_j^+,  for all j ∈ S
    ℓ_j = (v_j0, ρ_j),    for all j ∈ S
OBFUSCATION SCHEME  S_R = S_R(S_0, x_0, ℓ),  y = Y(r_{S_R}, x_0, ℓ):
    S_R = ∅, y = ∅
    for all j ∈ S do
        if j ∈ S_0 then
            b_j ~ Bern(min(1, (ρ_j)^{x_0}))
            if b_j = 1 then
                S_R = S_R ∪ {j};  y = y ∪ {r_j − x_0 v_j0}
            end if
        end if
    end for
ESTIMATOR  x̂ = x̂((S_R, y), V):
    Solve x̂ = argmin_{x ∈ ℝ^d} Σ_{j ∈ S_R} (y_j − ⟨v_j, x⟩)²
```

## 6. EXTENSIONS

We have up until now assumed that the analyst solicits ratings for a set of items $S \subseteq [M]$, determined by the analyst before the user reveals her feedback.
In what follows, we discuss how our analysis can be extended to the case where the user only provides ratings for a subset of these items. We also discuss how the analyst should select $S$, how a user can repeatedly interact with the analyst, and, finally, how to deal with multiple binary and categorical features.

### 6.1 Partial Feedback

There are cases of interest in which a user may not be able to generate a rating for every item in $S$. This is especially true when the user needs to spend non-negligible effort to determine her preferences (examples include rating a feature-length movie, a restaurant, or a book). In these cases, it makes sense to assume that a user may readily provide ratings only for a set $S_0 \subseteq S$ (e.g., the movies she has already watched, or the restaurants she has already dined in). Our analysis up until now applies when the user rates an arbitrary set $S$ selected by the analyst. As such, it does not readily apply to this case: the set of items $S_0$ a user rates may depend on the private feature $x_0$ (e.g., some movies may be more likely to be viewed by men, or by liberals). In this case, $x_0$ would be inferable not only from the ratings she gives, but also from which items she has rated. In this section, we describe how to modify the midpoint protocol to deal with this issue. Intuitively, to ensure her privacy, rather than reporting obfuscated ratings for all items she has rated (i.e., the set $S_0$), the user can reveal ratings only for a subset $S_R$ of $S_0$. This sub-sampling of $S_0$ can be done so that $S_R$ has a distribution that is independent of $x_0$, even

though $S_0$ does not. Moreover, to ensure high estimation accuracy, the user ought to make $S_R$ as large as possible, subject to the constraint $S_R \subseteq S_0$.

**Model.** Before we present the modified protocol, we describe our assumption on how $S_0$ is generated. For each $j \in [M]$, denote by $p_j^+$, $p_j^-$ the probabilities that a user with private feature $+1$ or $-1$, respectively, has rated item $j \in [M]$. Observe that, just like the extended profiles $\bar{v}_j$, this information can be extracted from a dataset comprising ratings by non-privacy-conscious users. Let $p = [(p_j^+, p_j^-)]_{j \in S} \in ([0,1] \times [0,1])^{|S|}$ be the vector of pairs of probabilities. We assume that the privacy-conscious user has rated the items in the set $S_0 \subseteq S$, whose distribution is given by the product form³:

$$P_{\bar{x},V,p}(S_0 = A) = \prod_{j \in A} p_j^{x_0} \prod_{j \in S \setminus A} \big(1 - p_j^{x_0}\big), \quad \text{for all } A \subseteq S. \tag{18}$$

Put differently, items $j \in S$ are rated independently, each with probability $p_j^{x_0}$. Conditioned on $S_0$, we assume that the user's ratings $r_j$, $j \in S_0$, follow the linear model (7) with Gaussian noise. Note that the distribution of $S_0$ depends on $x_0$: e.g., items $j$ for which $p_j^+ > p_j^-$ are more likely to be rated when $x_0 = +1$.

**Midpoint Protocol with Sub-Sampling.** We now present a modification of the midpoint protocol, which we refer to as the midpoint protocol with sub-sampling (MPSS), summarized in Algorithm 2. First, along with the disclosure of the biases $v_{j0}$, $j \in S$, the analyst also discloses the ratios $\rho_j \triangleq p_j^-/p_j^+$, for $j \in S$. Having access to this information, the user sub-samples items from $S_0$: each item $j \in S_0$ is included in the revealed set $S_R$ independently, with probability

$$P_{\bar{x},V,p}(j \in S_R \mid j \in S_0) = \min\big(1, (\rho_j)^{x_0}\big). \tag{19}$$

Having constructed $S_R$, the user reveals ratings for $j \in S_R$ after subtracting $x_0 v_{j0}$, as in MP. Finally, the analyst estimates $x$ through a least squares estimation over the obfuscated feedback, as in MP in the case of Gaussian noise.
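As an illustration, the user-side portion of MPSS (Bernoulli sub-sampling per (19), followed by MP obfuscation) can be sketched in Python as follows; the function name and the dictionary-based interface are illustrative, not part of the protocol specification:

```python
import numpy as np

def mpss_user_side(ratings, biases, rho, x0, rng):
    """User-side MPSS: sub-sample the rated set, then obfuscate as in MP.

    ratings: {item_id: rating} for the rated set S_0
    biases:  {item_id: v_j0}, disclosed by the analyst
    rho:     {item_id: p_j^- / p_j^+}, disclosed by the analyst
    x0:      private feature, +1 or -1
    """
    revealed = {}
    for j, r_j in ratings.items():
        # Keep item j with probability min(1, rho_j ** x0), per (19).
        keep_prob = min(1.0, rho[j] ** x0)
        if rng.random() < keep_prob:
            revealed[j] = r_j - x0 * biases[j]   # MP obfuscation
    return revealed   # ratings for S_R; distribution independent of x0

rng = np.random.default_rng(1)
ratings = {0: 4.0, 1: 2.5, 2: 5.0}
biases = {0: 0.3, 1: -0.2, 2: 0.1}
rho = {0: 0.5, 1: 2.0, 2: 1.0}     # p^- / p^+ per item
print(mpss_user_side(ratings, biases, rho, +1, rng))
```

Combined with the rating probability of $S_0$, each item ends up revealed with probability $\min(p_j^+, p_j^-)$, as shown in (20) below, so the revealed set itself carries no information about $x_0$.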
To gain some intuition behind the selection of the set $S_R$, observe by (18) and (19) that, for any $j \in S$,

$$P_{\bar{x},V,p}(j \in S_R) = p_j^{x_0} \min\big(1, (p_j^-/p_j^+)^{x_0}\big) = \min(p_j^+, p_j^-). \tag{20}$$

This immediately implies that MPSS is privacy preserving: neither the distribution of $S_R$ nor that of the obfuscated ratings $y$ depends on $x_0$. In fact, it is easy to see that, since $S_R \subseteq S_0$, any privacy-preserving protocol must satisfy $P_{\bar{x},V,p}(j \in S_R) \le \min(p_j^+, p_j^-)$: indeed, if for example $p_j^+ < p_j^-$, then a user rating $j$ with probability higher than $p_j^+$ must have $x_0 = -1$ (see our technical report [20] for a formal proof of this statement). As such, MPSS reveals ratings for a set $S_R$ of maximal size, in expectation. This intuition can be used to establish the optimality of MPSS among a wide class of learning protocols, under (7) (with Gaussian noise) and (18). We can again show that it attains optimal accuracy. Moreover, it also involves a minimal disclosure: a protocol that does not reveal the ratios $\rho_j$, $j \in S$, necessarily rates strictly fewer items than MPSS, in expectation. We provide a formal proof of these statements, as well as a definition of the class of protocols we consider, in our technical report [20].

³ Here, we slightly abuse notation, e.g., denoting by $p_j^{x_0}$ the parameter $p_j^+$ when $x_0 = +1$.

### 6.2 Item Set Selection

Theorem 1 implies that the analyst cannot improve the prediction of the private variable $x_0$ through its choice of $S$ under the midpoint protocol. In fact, the same is true under any privacy-preserving learning protocol: irrespective of the analyst's choice of $S$, the obfuscated feedback $y$ will be statistically independent of $x_0$. The analyst can, however, strategically select $S$ to affect the accuracy of the estimate of the non-private profile $x$. Indeed, the analyst should attempt to select a set $S$ that maximizes the accuracy of the estimator $\hat{x}$.
In settings where the least squares estimator is minimax (e.g., when the noise is Gaussian), there are well-known techniques for addressing this problem. Eq. (12) implies that it is natural to select $S$ by solving

$$\begin{aligned} \text{Minimize:} \quad & F(S) = \operatorname{tr}\Big[\Big(\sum_{j \in S} v_j v_j^T\Big)^{-1}\Big] \\ \text{subject to:} \quad & |S| \le B, \; S \subseteq [M], \end{aligned} \tag{21}$$

where $B$ is the number of items for which the analyst solicits feedback. The optimization problem (21) is NP-hard, and has been extensively studied in the context of experimental design (see, e.g., Section 7.5 of [6]). The objective function $F$ is commonly referred to as the A-optimality criterion. Convex relaxations of (21) exist when $S$ is a multiset, i.e., when items with the same profile can be presented to the user multiple times, each generating an i.i.d. response [6]. When such repetition is not possible, constant-factor approximation algorithms can be constructed based on the fact that the reduction in $F$ is increasing and submodular (see, e.g., [16]). In particular, given any set $S' \subseteq [M]$ of items whose profiles are linearly independent, there exists a polynomial-time algorithm for maximizing $F(S') - F(S' \cup S)$ subject to $|S| \le B$ within a $1 - 1/e$ approximation factor [33].

### 6.3 Repeated Interactions

Our analysis (and, in particular, the optimality of our protocols) persists even when the user repeatedly interacts with the analyst. In particular, the user may return to the service multiple times, each time being asked to rate a different set of items $S^{(k)} \subseteq [M]$, $k \ge 1$. The selection of the set $S^{(k)}$ could be adaptive, i.e., depend on the obfuscated feedback the user has revealed up until the $(k-1)$-th interaction. For each $k$, the analyst again applies MP (or MPSS), again disclosing the same information for each $S^{(k)}$; the only difference is that the estimator $\hat{x}$ is applied to all revealed obfuscated ratings $y^{(1)}, \ldots, y^{(k)}$. This repeated interaction is still perfectly private: the joint distribution of the obfuscated outputs $y^{(k)}$ does not depend on $x_0$.
Moreover, each estimation remains maximally accurate at each interaction, while each disclosure is again minimal.

### 6.4 Categorical Features

We discuss below how to express a categorical feature as multiple binary features through binarization, and illustrate how to incorporate both cases into our analysis. The standard approach to incorporating a categorical feature $x_0 \in \{1, 2, \ldots, K\}$ in matrix factorization is through category-specific biases (see, e.g., [25]), i.e., (7) is replaced by

$$r = \langle x, v_j \rangle + b_j^{x_0} + \varepsilon_j, \quad j \in [M], \tag{22}$$

where $b_j^k \in \mathbb{R}$, $k \in [K]$, are category-dependent biases. Consider a representation of $x_0 \in [K]$ as a binary vector $\bar{x}_0 \in \{-1,+1\}^K$ whose $x_0$-th coordinate is $+1$ and all other coordinates are $-1$; i.e., the coordinate $\bar{x}_{0k}$ at $k \in [K]$ is $+1$ if $k = x_0$ and $-1$ otherwise. For $k \in [K]$, let $\bar{b}_{jk} \triangleq b_j^k/2$ and define $\mu_j \triangleq \sum_{k \in [K]} b_j^k/2$. Then, observe that (22) is equivalent to

$$r = \langle x, v_j \rangle + \sum_{k \in [K]} \bar{x}_{0k} \bar{b}_{jk} + \mu_j + \varepsilon_j = \langle x', v_j' \rangle + \sum_{k \in [K]} \bar{x}_{0k} \bar{b}_{jk} + \varepsilon_j, \quad j \in [M], \tag{23}$$

where $x' = (x, 1) \in \mathbb{R}^{d+1}$ and $v_j' = (v_j, \mu_j) \in \mathbb{R}^{d+1}$.

Hence, a categorical feature can be incorporated into our analysis as follows. First, given a dataset of ratings by non-privacy-conscious users that reveal their categorical feature $x_0 \in [K]$, the analyst binarizes this feature, constructing a vector $\bar{x}_0 \in \{-1,+1\}^K$ for each user. It then performs matrix factorization on the ratings using (23), learning vectors $v_j' \in \mathbb{R}^{d+1}$ and biases $\bar{b}_j = (\bar{b}_{jk})_{k \in [K]} \in \mathbb{R}^K$. A privacy-conscious user subsequently interacts with the analyst using the standard scheme as follows. The analyst discloses the biases $\bar{b}_j$ for each $j \in S$, and the user reveals $y_j = r_j - \sum_{k \in [K]} \bar{x}_{0k} \bar{b}_{jk}$, $j \in S$, where $\bar{x}_0$ is her binarized categorical feature. Finally, the analyst infers $x'$ through linear regression over the pairs $(y_j, v_j')$, $j \in S$.

## 7. EVALUATION

In this section we evaluate our protocols on real-world datasets. Our experiments confirm that MP and MPSS are indeed able to protect the privacy of users against inference algorithms, including non-linear ones, with little impact on prediction accuracy.

### 7.1 Experimental Setup

We study two types of datasets, sparse and dense. In sparse datasets, the set of items a user rates is often correlated with the private feature. This does not happen in dense datasets, because all users rate all (or nearly all) items.

**Datasets.** We evaluate our methods on three datasets: Politics-and-TV, Movielens, and Flixster.
Politics-and-TV (PTV) [39] is a ratings dataset that includes 5-star ratings of users for 50 TV shows and, in addition, each user's political affiliation (Democrat or Republican) and gender. To make it dense, we consider only users that rated over 40 items, resulting in 365 users, 280 of whom provide ratings for all 50 TV shows. Movielens and Flixster [21] are movie recommender systems in which users rate movies from a catalog of thousands of movies. Both the Movielens and Flixster datasets include user gender. Movielens also includes age groups; we categorize users as young adults (ages 18–35) or adults (ages 35–65). We preprocessed Movielens and Flixster to consider only users with at least 20 ratings, and items that were rated by at least 20 users. Table 3 summarizes the statistics of these three datasets.

**Methodology.** Throughout the evaluation, we seek to quantify the privacy risk to a user, as well as the impact of obfuscation on prediction accuracy. To this end, we perform 10-fold cross-validation as follows. We split the users in each dataset into 10 folds. We use 9 folds as a training set (serving the purpose of a dataset of non-privacy-conscious users in Figure 2) and 1 fold as a test set (whose users we treat as privacy-conscious). We use the training set to (a) compute extended profiles for each item by performing matrix factorization, (b) empirically estimate the probabilities $p^+, p^-$ for each item, and (c) train multiple classifiers to be used to infer the private features. We describe the details of our MF implementation and the classifiers we use below. We split the ratings of each user in the test set into two sets, by randomly selecting 70% of the ratings as the first set and the remaining 30% as the second set. We obfuscate the ratings in the first set using MP, MPSS, and several baselines, as described in detail below. The obfuscated ratings are given as input to our classifiers to infer the user's private feature.
We further estimate a user's extended profile using the LSE method described in Section 3.3, and use this profile (including both $\hat{x}$ and the inferred $\hat{x}_0$) to predict her ratings on the second set. For each obfuscation scheme and classification method, we measure the privacy risk of inference through these classifiers using the area under the curve (AUC) metric [18, 19]. Moreover, for each obfuscation scheme, we measure the prediction accuracy through the root mean square error (RMSE) of the predicted ratings. We cross-validate our results by computing the AUC and RMSE 10 times, each time with a different fold as the test set, and report average values. We note that the AUC ranges from 0.5 (perfect privacy) to 1 (no privacy).

**Matrix Factorization.** We use 20 iterations of stochastic gradient descent [25] to perform MF on each training set. For each item, the feature bias $v_{j0}$ was computed as half the distance between the average item ratings per private feature value. The remaining features $v_j$ were computed through matrix factorization. We computed the optimal regularization parameters and the dimension $d = 20$ through an additional 10-fold cross-validation.

**Privacy Risk Assessment.** We apply several standard classification methods to infer the private feature from the training ratings, namely Multinomial Naïve Bayes (NB) [45], Logistic Regression (LR), non-linear Support Vector Machines (SVM) with a Radial Basis Function (RBF) kernel, as well as the LSE (6). The input to the LR, NB, and SVM methods comprises the ratings of all items provided by the user, with zeros for movies not rated, while the LSE operates only on the ratings that the user provides. As SVM scales quadratically with the number of users, we could not execute it on our largest dataset (Flixster, cf. Table 3).

**Obfuscation Schemes.** When using MP, the obfuscated rating may not be an integer value, and may even fall outside the range of rating values expected by a recommender system.
Therefore, we consider a variation of MP that rounds the rating value to an integer in the range $[1, 5]$. Given a non-integer obfuscated rating $r$ lying between the integers $k = \lfloor r \rfloor$ and $k+1$, we round by assigning the rating $k+1$ with probability $r - k$ and the rating $k$ with probability $1 - (r - k)$, which in expectation gives the desired rating $r$. Ratings higher than 5 and lower than 1 are truncated to 5 and 1, respectively. We refer to this process as rounding, and denote the resulting obfuscation schemes by MPr, for the midpoint protocol with rounding, and MPSSr, for the midpoint protocol with sub-sampling and rounding. We also consider two alternative methods for obfuscation.

First, the item average (IA) scheme replaces a user's rating with the average rating of the item, computed from the training set. Second, the feature average (FA) scheme replaces the user's rating with the average rating provided by one of the feature classes (e.g., males and females), each chosen with probability 0.5. Finally, we evaluate each of the above obfuscation schemes, i.e., MP, MPr, IA and FA, together with sub-sampling (SS). As a baseline, we also evaluate the privacy risk and prediction accuracy when no obfuscation scheme is used (NO).

**Table 3: Statistics of the datasets used for evaluation.** The ratios represent the class skew: Females:Males (F:M) for gender, Young:Adult (Y:A) for age, and Republican:Democrat (R:D) for political affiliation.

| Dataset | Users | Items | Ratings | Private feature | Skew (all) | Skew (filtered) |
|---|---|---|---|---|---|---|
| PTV | 365 | 50 | – | Gender (F:M) | 2.7:1 | 2.7:1 |
| | | | | Politics (R:D) | 1:1.4 | 1:1.4 |
| Movielens | 6K | 3K | 1M | Gender (F:M) | 1:2.5 | 1:3 |
| | | | | Age (Y:A) | 1:1.3 | 1:1.6 |
| Flixster | 26K | – | – | Gender (F:M) | 1.7:1 | 1.5:1 |

**Figure 4: Privacy risk and prediction accuracy on PTV** ((a) Gender, (b) Politics), obtained using four classifiers (Multinomial NB, LSE, SVM (RBF), Logistic Regression) and the obfuscation schemes NO (no obfuscation), MP (midpoint protocol), r (rounding), IA (item average), FA (feature average), and SS (sub-sampling). The proposed protocol (MP) is robust to privacy attacks with hardly any loss in predictive power.

**Figure 5: Privacy risk and prediction accuracy on Movielens and Flixster** (sparse datasets; (a) Movielens – Gender, (b) Movielens – Age, (c) Flixster – Gender), obtained using the same classifiers and obfuscation schemes. The proposed protocol (MPSS) is robust to privacy attacks without harming predictive power.

### 7.2 Experimental Results

**Dense Dataset.** We begin by evaluating the obfuscation schemes on the dense PTV dataset using its two user features (gender and political affiliation), illustrated in Figures 4a and 4b, respectively. Each figure shows the privacy risk (AUC), computed using the 4 inference methods, and the prediction accuracy (RMSE) obtained when applying the different obfuscation schemes. Both figures clearly show that MP successfully mitigates the privacy risk (the AUC is around 0.5), whereas the prediction accuracy is hardly impacted (a 2% increase in RMSE). This illustrates that MP attains excellent privacy in practice, and that our modeling assumptions are reasonable: there is little correlation with the private feature after the category bias is removed. Indeed, strong correlations not captured by (7) would manifest as a failure to block inference after obfuscation, especially through the non-linear SVM classifier. This is clearly not the case, indicating that any dependence on the private feature not captured by (7) is quite weak. Adding rounding (MPr), which is essential for real-world deployment of MP, has very little effect on both the AUC and RMSE. Though IA and FA are successful in mitigating the privacy risk, they are suboptimal in terms of prediction: they severely impact prediction accuracy, increasing the RMSE by roughly 9%. Finally, since this is a dense dataset, there is little correlation between the private feature and the
