Listwise Approach to Learning to Rank - Theory and Algorithm
Fen Xia*, Institute of Automation, Chinese Academy of Sciences, Beijing, P. R. China.
Tie-Yan Liu, Microsoft Research Asia, Sigma Center, No. 49 Zhichun Road, Haidian District, Beijing, P. R. China.
Jue Wang, Wensheng Zhang, Institute of Automation, Chinese Academy of Sciences, Beijing, P. R. China.
Hang Li, Microsoft Research Asia, Sigma Center, No. 49 Zhichun Road, Haidian District, Beijing, P. R. China.

Abstract

This paper aims to conduct a study on the listwise approach to learning to rank. The listwise approach learns a ranking function by taking individual lists as instances and minimizing a loss function defined on the predicted list and the ground-truth list. Existing work on the approach mainly focused on the development of new algorithms; methods such as RankCosine and ListNet have been proposed and good performances have been observed. Unfortunately, the underlying theory was not sufficiently studied so far. To amend the problem, this paper proposes conducting theoretical analysis of learning to rank algorithms through investigations of the properties of the loss functions, including consistency, soundness, continuity, differentiability, convexity, and efficiency. A sufficient condition on consistency for ranking is given, which seems to be the first such result obtained in related research. The paper then conducts analysis on three loss functions: likelihood loss, cosine loss, and cross entropy loss. The latter two were used in RankCosine and ListNet. The use of the likelihood loss leads to the development of a new listwise method called ListMLE, whose loss function offers better properties, and also leads to better experimental results.

Appearing in Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 2008. Copyright 2008 by the author(s)/owner(s).
*The work was performed when the first author was an intern at Microsoft Research Asia.

1. Introduction

Ranking, which is to sort objects based on certain factors, is the central problem of applications such as information retrieval (IR) and information filtering. Recently machine learning technologies called learning to rank have been successfully applied to ranking, and several approaches have been proposed, including the pointwise, pairwise, and listwise approaches.

The listwise approach addresses the ranking problem in the following way. In learning, it takes ranked lists of objects (e.g., ranked lists of documents in IR) as instances and trains a ranking function through the minimization of a listwise loss function defined on the predicted list and the ground truth list. The listwise approach captures the ranking problems, particularly those in IR, in a conceptually more natural way than previous work. Several methods such as RankCosine and ListNet have been proposed. Previous experiments demonstrate that the listwise approach usually performs better than the other approaches (Cao et al., 2007) (Qin et al., 2007).

Existing work on the listwise approach mainly focused on the development of new algorithms, such as RankCosine and ListNet. However, there was no sufficient theoretical foundation laid down. Furthermore, the strength and limitation of the algorithms, and the relations between the proposed algorithms, were still
not clear. This largely prevented us from deeply understanding the approach and, more critically, from devising more advanced algorithms. In this paper, we aim to conduct an investigation on the listwise approach.

First, we give a formal definition of the listwise approach. In ranking, the input is a set of objects, the output is a permutation of the objects¹, and the model is a ranking function which maps a given input to an output. In learning, the training data is drawn i.i.d. according to an unknown but fixed joint probability distribution between input and output. Ideally we would minimize the expected 0-1 loss defined on the predicted list and the ground truth list. Practically we instead manage to minimize an empirical surrogate loss with respect to the training data.

Second, we evaluate a surrogate loss function from four aspects: (a) consistency, (b) soundness, (c) mathematical properties of continuity, differentiability, and convexity, and (d) computational efficiency in learning. We give analysis on three loss functions: likelihood loss, cosine loss, and cross entropy loss. The first one is newly proposed in this paper, and the last two were used in RankCosine and ListNet, respectively.

Third, we propose a novel method for the listwise approach, which we call ListMLE. ListMLE formalizes learning to rank as a problem of minimizing the likelihood loss function, equivalently maximizing the likelihood function of a probability model. Due to the nice properties of the loss function, ListMLE stands to be more effective than RankCosine and ListNet.

Finally, we have experimentally verified the correctness of the theoretical findings. We have also found that ListMLE can significantly outperform RankCosine and ListNet.

The rest of the paper is organized as follows. Section 2 introduces related work. Section 3 gives a formal definition of the listwise approach. Section 4 conducts theoretical analysis of listwise loss functions. Section 5 introduces the ListMLE method. Experimental results are reported in Section 6, and the conclusion and future work are given in the last section.

¹ In this paper, we use permutation and ranked list interchangeably.

2. Related Work

Existing methods for learning to rank fall into three categories. The pointwise approach (Nallapati, 2004) transforms ranking into regression or classification on single objects. The pairwise approach (Herbrich et al., 1999) (Freund et al., 1998) (Burges et al., 2005) transforms ranking into classification on object pairs. The advantage of these two approaches is that existing theories and algorithms on regression or classification can be directly applied, but the problem is that they do not model the ranking problem in a straightforward fashion. The listwise approach can overcome this drawback by tackling the ranking problem directly, as explained below.

For instance, Cao et al. (2007) proposed one of the first listwise methods, called ListNet. In ListNet, the listwise loss function is defined as the cross entropy between two parameterized probability distributions of permutations; one is obtained from the predicted result and the other from the ground truth. Qin et al. (2007) proposed another method called RankCosine. In this method, the listwise loss function is defined on the basis of the cosine similarity between two score vectors, from the predicted result and the ground truth. Experimental results show that the listwise approach usually outperforms the pointwise and pairwise approaches.
In this paper, we aim to investigate the listwise approach to learning to rank, particularly from the viewpoint of loss functions. Actually, similar investigations have been conducted for classification. For instance, in classification, consistency and soundness of loss functions are well studied. Consistency forms the basis for the success of a loss function. It is known that if a loss function is consistent, then the learned classifier can achieve the optimal Bayes error rate in the large sample limit. Many well-known loss functions such as hinge loss, exponential loss, and logistic loss are all consistent (cf., (Zhang, 2004) (Bartlett et al., 2003) (Lin, 2002)). Soundness of a loss function guarantees that the loss can represent well the targeted learning problem. That is, an incorrect prediction should receive a larger penalty than a correct prediction, and the penalty should reflect the confidence of the prediction. For example, hinge loss, exponential loss, and logistic loss are sound for classification. In contrast, square loss is sound for regression but not for classification (Hastie et al., 2001).

3. Listwise Approach

We give a formal definition of the listwise approach to learning to rank.

² In a broad sense, methods directly optimizing evaluation measures, such as SVM-MAP (Yue et al., 2007) and AdaRank (Xu & Li, 2007), can also be regarded as listwise algorithms. We will, however, limit our discussions in this paper to algorithms like ListNet and RankCosine.
Let X be the input space whose elements are sets of objects to be ranked, Y be the output space whose elements are permutations of objects, and P_{XY} be an unknown but fixed joint probability distribution of X and Y. Let h : X -> Y be a ranking function, and H be the corresponding function space (i.e., h ∈ H). Let x ∈ X and y ∈ Y, and let y(i) be the index of the object which is ranked at position i. The task is to learn a ranking function that can minimize the expected loss R(h), defined as

    R(h) = \int_{X \times Y} l(h(x), y) \, dP(x, y),    (1)

where l(h(x), y) is the 0-1 loss function such that

    l(h(x), y) = \begin{cases} 1, & h(x) \neq y \\ 0, & h(x) = y. \end{cases}    (2)

That is to say, we formalize the ranking problem as a new classification problem on permutations. If the permutation of the predicted result is the same as the ground truth, then we have zero loss; otherwise we have one loss. In real ranking applications, the loss can be cost-sensitive, i.e., depending on the positions of the incorrectly ranked objects. We will leave this as future work and focus on the 0-1 loss in this paper first. Actually, in the literature of classification, people also studied the 0-1 loss first, before they eventually moved on to the cost-sensitive case.

It is easy to see that the optimal ranking function which minimizes the expected loss, R(h_B) = inf_h R(h), is given by the Bayes rule,

    h_B(x) = \arg\max_{y \in Y} P(y \mid x).    (3)

Since P_{XY} is unknown, formula (1) cannot be directly solved and thus h_B(x) cannot be easily obtained. In practice, we are given independently and identically distributed (i.i.d.) samples S = {(x^{(i)}, y^{(i)})}_{i=1}^{m} ~ P_{XY}, and we instead try to obtain a ranking function h ∈ H that minimizes the empirical loss

    R_S(h) = \frac{1}{m} \sum_{i=1}^{m} l(h(x^{(i)}), y^{(i)}).    (4)

Note that for efficiency considerations, in practice the ranking function usually works on individual objects. It assigns a score to each object (by employing a scoring function g), sorts the objects in descending order of the scores, and finally creates the ranked list. That is to say, h(x^{(i)}) is decomposable with respect to objects. It is defined as

    h(x^{(i)}) = \mathrm{sort}(g(x^{(i)}_1), \ldots, g(x^{(i)}_{n_i})),    (5)

where x^{(i)}_j ∈ x^{(i)}, n_i denotes the number of objects in x^{(i)}, g(·) denotes the scoring function, and sort(·) denotes the sorting function. As a result, (4) becomes

    R_S(g) = \frac{1}{m} \sum_{i=1}^{m} l(\mathrm{sort}(g(x^{(i)}_1), \ldots, g(x^{(i)}_{n_i})), y^{(i)}).    (6)

Due to the nature of the sorting function and the 0-1 loss function, the empirical loss in (6) is inherently non-differentiable with respect to g, which poses a challenge to its optimization. To tackle this problem, we can introduce a surrogate loss as an approximation of (6), following a common practice in machine learning:

    R^{\phi}_S(g) = \frac{1}{m} \sum_{i=1}^{m} \phi(g(x^{(i)}), y^{(i)}),    (7)

where φ is a surrogate loss function and g(x^{(i)}) = (g(x^{(i)}_1), \ldots, g(x^{(i)}_{n_i})). For convenience in notation, in the following sections we sometimes write φ_y(g) for φ(g(x), y) and use bold symbols such as g to denote vectors, since for a given x, g(x) becomes a vector.
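To make the decomposition in (5) and the empirical surrogate risk in (7) concrete, here is a minimal Python sketch of how a scoring function g, the sorting step, and a generic surrogate loss fit together. The linear scoring function and the surrogate_loss argument are illustrative assumptions, not part of the paper.

    import numpy as np

    def rank_with_scores(g, x):
        """Eq. (5): score each object with g, then sort indices by descending score."""
        scores = np.array([g(obj) for obj in x])
        return np.argsort(-scores)           # predicted permutation (object indices)

    def empirical_surrogate_risk(g, data, surrogate_loss):
        """Eq. (7): average surrogate loss phi(g(x), y) over the training sample."""
        total = 0.0
        for x, y in data:                     # y is the ground-truth permutation of indices
            scores = np.array([g(obj) for obj in x])
            total += surrogate_loss(scores, y)
        return total / len(data)

    # Illustrative linear scoring function (an assumption made only for this sketch).
    w = np.array([0.5, -0.2])
    g = lambda obj: float(np.dot(w, obj))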
4. Theoretical Analysis

4.1. Properties of Loss Functions

We analyze the listwise approach from the viewpoint of the surrogate loss function. Specifically, we look at the following properties³ of it: (a) consistency, (b) soundness, (c) continuity, differentiability, and convexity, and (d) computational efficiency in learning.

Consistency is about whether the obtained ranking function can converge to the optimal one through the minimization of the empirical surrogate loss (7), when the training sample size goes to infinity. It is a necessary condition for a surrogate loss function to be a good one for a learning algorithm (cf., Zhang (2004)).

Soundness is about whether the loss function can indeed represent loss in ranking. For example, an incorrect ranking should receive a larger penalty than a correct ranking, and the penalty should reflect the confidence of the ranking. This property is particularly important when the size of training data is small, because it can directly affect the training results.

³ In addition, convergence rate is another issue to consider. We leave it as future work.

4.2. Consistency

We conduct analysis on learning to rank algorithms from the viewpoint of consistency. As far as we know, this is the first work discussing the consistency issue for ranking.
In the large sample limit, minimizing the empirical surrogate loss (7) amounts to minimizing the following expected surrogate loss:

    R^{\phi}(g) = E_{X,Y}\{\phi_y(g(x))\} = E_X\{Q(g(x))\},    (8)

where Q(g(x)) = \sum_{y \in Y} P(y|x)\,\phi_y(g(x)). Here we assume g(x) is chosen from a vector Borel measurable function set, whose elements can take any value from Ω ⊆ R^n. When the minimization of (8) leads to the minimization of the expected 0-1 loss (1), we say the surrogate loss function is consistent. An equivalent definition can be found in Definition 2. Actually, this equivalence relationship has been discussed in related work on the consistency of classification (Zhang, 2004).

Definition 1. We define Λ_Y as the space of all possible probability distributions on the permutation space Y, i.e., Λ_Y = {p ∈ R^{|Y|} : \sum_{y \in Y} p_y = 1, p_y ≥ 0}.

Definition 2. The loss φ_y(g) is consistent on a set Ω ⊆ R^n with respect to the ranking loss (1), if the following condition holds: ∀p ∈ Λ_Y, let y* = \arg\max_{y \in Y} p_y and let Y^c_{y*} denote the space of permutations after removing y*; then

    \inf_{g \in \Omega} Q(g) < \inf_{g \in \Omega,\ \mathrm{sort}(g) \in Y^c_{y*}} Q(g).

We next give sufficient conditions for consistency in ranking.

Definition 3. A permutation probability space Λ_Y is order preserving with respect to objects i and j, if the following condition holds: ∀y ∈ Y_{i,j} = {y ∈ Y : y^{-1}(i) < y^{-1}(j)}, where y^{-1}(i) denotes the position of object i in y, denote by σ^{-1}y the permutation which exchanges the positions of objects i and j while keeping the others unchanged; then we have p_y > p_{σ^{-1}y}.

Definition 4. The loss φ_y(g) is order sensitive on a set Ω ⊆ R^n, if φ_y(g) is a non-negative differentiable function and the following two conditions hold:

1. ∀y ∈ Y, ∀i < j, denote by σy the permutation which exchanges the object at position i and that at position j while keeping the others unchanged. If g_{y(i)} < g_{y(j)}, then φ_y(g) ≥ φ_{σy}(g), and with at least one y, the strict inequality holds.

2. If g_i = g_j, then either ∀y ∈ Y_{i,j}, ∂φ_y(g)/∂g_i ≤ ∂φ_y(g)/∂g_j, or ∀y ∈ Y_{i,j}, ∂φ_y(g)/∂g_i ≥ ∂φ_y(g)/∂g_j, and with at least one y, the strict inequality holds.

Theorem 5. Let φ_y(g) be an order sensitive loss function on Ω ⊆ R^n. For n objects, if the permutation probability space is order preserving with respect to the n−1 object pairs (j_1, j_2), (j_2, j_3), ..., (j_{n−1}, j_n), then the loss φ_y(g) is consistent with respect to (1).

Due to space limitations, we only give the proof sketch. First, we can show that if the permutation probability space is order preserving with respect to the n−1 object pairs (j_1, j_2), (j_2, j_3), ..., (j_{n−1}, j_n), then the permutation with the maximum probability is y* = (j_1, j_2, ..., j_n). Second, for an order sensitive loss function and any order preserving object pair (j_1, j_2), the vector g which minimizes Q(g) in (8) should assign a larger score to j_1 than to j_2. This can be proven by examining the change of loss due to exchanging the scores of j_1 and j_2. Given these results and Definition 2, we can prove Theorem 5 by contradiction.

Theorem 5 gives sufficient conditions for a surrogate loss function to be consistent: the permutation probability space should be order preserving, and the loss function should be order sensitive. Actually, the assumption of order preserving has already been made when we use the scoring function and sorting function for ranking. The property of order preserving has also been explicitly or implicitly used in previous work, such as Cossock and Zhang (2006).
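As a concrete illustration of Definition 3, the sketch below checks, for a small hand-written distribution over the permutations of three objects, whether the order-preserving condition holds for a given object pair. The example probabilities are arbitrary values chosen only to exercise the check, not data from the paper.

    def is_order_preserving(p, i, j):
        """Check Definition 3: for every permutation y ranking object i above j,
        p_y must exceed the probability of y with i and j swapped."""
        for y in p:
            if y.index(i) < y.index(j):            # y^{-1}(i) < y^{-1}(j)
                swapped = tuple(j if o == i else i if o == j else o for o in y)
                if not p[y] > p[swapped]:
                    return False
        return True

    # Arbitrary example distribution over permutations of objects 0, 1, 2
    # (probabilities sum to 1); it happens to favour the order 0 > 1 > 2.
    p = {(0, 1, 2): 0.40, (0, 2, 1): 0.20, (1, 0, 2): 0.15,
         (1, 2, 0): 0.05, (2, 0, 1): 0.12, (2, 1, 0): 0.08}

    print(is_order_preserving(p, 0, 1))  # True: swapping 0 and 1 always lowers probability
    print(is_order_preserving(p, 2, 1))  # False for this particular distribution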
The property of order sensitivity means that, starting from a ground truth permutation, the loss will increase if we exchange the positions of two objects in it, and the speed of increase in loss is sensitive to the positions of the objects.

4.3. Case Studies

We look at the four properties of three loss functions.

4.3.1. Likelihood Loss

We introduce a new loss function for the listwise approach, which we call the likelihood loss. The likelihood loss function is defined as

    \phi(g(x), y) = -\log P(y|x; g),    (9)

where

    P(y|x; g) = \prod_{i=1}^{n} \frac{\exp(g(x_{y(i)}))}{\sum_{k=i}^{n} \exp(g(x_{y(k)}))}.

Note that we actually define a parameterized exponential probability distribution over all the permutations given the predicted result (by the ranking function), and define the loss function as the negative log likelihood of the ground truth list. The probability distribution turns out to be a Plackett-Luce model (Marden, 1995).
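As a sketch of how the likelihood loss (9) can be evaluated, the following function computes the negative log Plackett-Luce likelihood of a ground-truth permutation given a score vector. The log-sum-exp formulation is a standard numerical-stability choice, not something prescribed by the paper.

    import numpy as np

    def likelihood_loss(scores, y):
        """Eq. (9): negative log Plackett-Luce likelihood of permutation y.

        scores: array of g(x_1), ..., g(x_n); y: ground-truth permutation,
        where y[i] is the index of the object ranked at position i."""
        s = np.asarray(scores, dtype=float)[list(y)]     # scores in ground-truth order
        loss = 0.0
        for i in range(len(s)):
            tail = s[i:]
            # -log( exp(s_i) / sum_{k >= i} exp(s_k) ), computed via log-sum-exp
            loss += np.log(np.sum(np.exp(tail - tail.max()))) + tail.max() - s[i]
        return loss

    # Example: two objects, ground truth ranks object 1 above object 0.
    print(likelihood_loss([1.0, 2.0], [1, 0]))   # small loss: scores agree with y
    print(likelihood_loss([2.0, 1.0], [1, 0]))   # larger loss: scores disagree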
The likelihood loss function has the following nice properties.

First, the likelihood loss is consistent. The following proposition shows that the likelihood loss is order sensitive; therefore, according to Theorem 5, it is consistent. Due to space limitations, we omit the proof.

Proposition 6. The likelihood loss (9) is order sensitive on Ω ⊆ R^n.

Second, the likelihood loss function is sound. For simplicity, suppose that there are two objects to be ranked (a similar argument can be made when there are more objects). The two objects receive scores g_1 and g_2 from a ranking function. Figure 1(a) shows the scores and the point g = (g_1, g_2). Suppose that the first object is ranked below the second object in the ground truth. Then the upper-left area above the line g_2 = g_1 corresponds to correct ranking, and the lower-right area to incorrect ranking. According to the definition of the likelihood loss, all the points on the line g_2 = g_1 + d have the same loss. Therefore, the likelihood loss only depends on d. Figure 1(b) shows the relation between the loss and d. We can see that the loss function decreases monotonically as d increases. It penalizes negative values of d more heavily than positive ones. This will make the learning algorithm focus more on avoiding incorrect rankings. In this regard, the loss function is a good approximation of the 0-1 loss.

Figure 1. (a) Ranking scores of the predicted result; (b) loss φ vs. d for the likelihood loss.

Third, it is easy to verify that the likelihood loss is continuous, differentiable, and convex (Boyd & Vandenberghe, 2004). Furthermore, the loss can be computed efficiently, with time complexity linear in the number of objects. With the above good properties, a learning algorithm which optimizes the likelihood loss will be powerful for creating a ranking function.
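For the two-object case discussed above, eq. (9) reduces to a function of the score difference d = g_2 − g_1 alone, namely φ(d) = log(1 + exp(−d)); the short sketch below illustrates the behaviour shown in Figure 1(b), with negative d (incorrect ranking) penalized more heavily than positive d.

    import numpy as np

    # Two-object case from Figure 1: ground truth ranks object 2 above object 1,
    # so by eq. (9) the likelihood loss depends only on d = g_2 - g_1:
    #   phi(d) = -log( exp(g_2) / (exp(g_1) + exp(g_2)) ) = log(1 + exp(-d))
    phi = lambda d: np.log1p(np.exp(-d))

    for d in [-2.0, -1.0, 0.0, 1.0, 2.0]:
        print(f"d = {d:+.1f}  loss = {phi(d):.3f}")
    # The loss decreases monotonically as d grows, and negative d costs far more
    # than the corresponding positive d.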
4.3.2. Cosine Loss

The cosine loss is the loss function used in RankCosine (Qin et al., 2007), a listwise method. It is defined on the basis of the cosine similarity between the score vector of the ground truth and that of the predicted result:

    \phi(g(x), y) = \frac{1}{2}\left(1 - \frac{\psi_y(x)^{T} g(x)}{\|\psi_y(x)\| \, \|g(x)\|}\right).    (10)

The score vector of the ground truth is produced by a mapping ψ_y(·) : R^d → R, which retains the order in a permutation, i.e., ψ_y(x_{y(1)}) > ... > ψ_y(x_{y(n)}).

First, we can prove that the cosine loss is consistent, given the following proposition. Due to space limitations, we omit the proof.

Proposition 7. The cosine loss (10) is order sensitive on Ω ⊆ R^n.

Second, the cosine loss is not very sound. Let us again consider the case of ranking two objects. Figure 2(a) shows the point g = (g_1, g_2) representing the scores of the predicted result and the point ψ representing the ground truth (which depends on the mapping function ψ). We denote the angle from point g to the line g_2 = g_1 as α, and the angle from ψ to the line g_2 = g_1 as α_ψ. We investigate the relation between the loss and the angle α. Figure 2(b) shows the cosine loss as a function of α. From this figure, we can see that the cosine loss is not a monotonically decreasing function of α. When α > α_ψ, it increases quickly, which means that it can heavily penalize correct rankings. Furthermore, the mapping function, and thus α_ψ, also affects the loss function. Specifically, the curve of the loss function shifts from left to right with different values of α_ψ. Only when α_ψ = π/2 does it become a relatively satisfactory representation of loss for the learning problem.

Figure 2. (a) Ranking scores of the predicted result and the ground truth; (b) loss φ vs. angle α for the cosine loss.

Third, it is easy to see that the cosine loss is continuous and differentiable, but not convex. It can also be computed in an efficient manner, with a time complexity linear in the number of objects.
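A minimal sketch of the cosine loss (10), assuming a simple position-based mapping ψ (here a linearly decreasing score by rank, chosen only for illustration; the paper's experiments study several mappings):

    import numpy as np

    def cosine_loss(scores, y, psi=None):
        """Eq. (10): half of one minus the cosine similarity between the ground-truth
        score vector and the predicted score vector."""
        g = np.asarray(scores, dtype=float)
        n = len(g)
        if psi is None:
            # Illustrative mapping: ground-truth score n - r for the object at rank r.
            psi = np.empty(n)
            psi[list(y)] = np.arange(n, 0, -1)
        cos_sim = np.dot(psi, g) / (np.linalg.norm(psi) * np.linalg.norm(g))
        return 0.5 * (1.0 - cos_sim)

    # Example: three objects, ground truth ranks object 2 first, then 0, then 1.
    print(cosine_loss([0.3, 0.1, 0.9], y=[2, 0, 1]))   # close to 0: orders agree
    print(cosine_loss([0.9, 0.3, 0.1], y=[2, 0, 1]))   # larger loss: orders disagree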
4.3.3. Cross Entropy Loss

The cross entropy loss is the loss function used in ListNet (Cao et al., 2007), another listwise method. The cross entropy loss function is defined as

    \phi(g(x), y) = D\big(P(\pi|x; \psi_y) \,\|\, P(\pi|x; g)\big),    (11)

where

    P(\pi|x; \psi_y) = \prod_{i=1}^{n} \frac{\exp(\psi_y(x_{\pi(i)}))}{\sum_{k=i}^{n} \exp(\psi_y(x_{\pi(k)}))}, \qquad P(\pi|x; g) = \prod_{i=1}^{n} \frac{\exp(g(x_{\pi(i)}))}{\sum_{k=i}^{n} \exp(g(x_{\pi(k)}))},

and ψ is a mapping function whose definition is similar to that in RankCosine.

First, we can prove that the cross entropy loss is consistent, given the following proposition. Due to space limitations, we omit the proof.

Proposition 8. The cross entropy loss (11) is order sensitive on Ω ⊆ R^n.

Second, the cross entropy loss is not very sound. Again, we look at the case of ranking two objects. g = (g_1, g_2) denotes the ranking scores of the predicted result, and ψ denotes the ranking scores of the ground truth (depending on the mapping function). Similar to the discussion of the likelihood loss, the cross entropy loss only depends on the quantity d. Figure 3(a) illustrates the relation between g, ψ, and d. Figure 3(b) shows the cross entropy loss as a function of d. As can be seen, the loss function achieves its minimum at the point d_ψ and then increases as d increases. That means it can heavily penalize correct rankings that have high confidence. Note that the mapping function also affects the penalization: for some mapping functions, the penalization on correct rankings can be even larger than that on incorrect rankings.

Figure 3. (a) Ranking scores of the predicted result and the ground truth; (b) loss φ vs. d for the cross entropy loss.

Third, it is easy to see that the cross entropy loss is continuous and differentiable. It is also convex, because the logarithm of a sum of exponentials is convex and the set of convex functions is closed under addition (Boyd & Vandenberghe, 2004). However, it cannot be computed in an efficient manner: its time complexity is of exponential order in the number of objects.

Table 1 gives a summary of the properties of the loss functions. All three loss functions are consistent, as well as continuous and differentiable. The likelihood loss is better than the cosine loss in terms of convexity and soundness, and better than the cross entropy loss in terms of time complexity and soundness.

Table 1. Comparison between different surrogate losses.

    Loss            Consistency   Soundness   Continuity   Differentiability   Convexity   Complexity
    Likelihood      Yes           Yes         Yes          Yes                 Yes         O(n)
    Cosine          Yes           No          Yes          Yes                 No          O(n)
    Cross entropy   Yes           No          Yes          Yes                 Yes         O(n! · n)

5. ListMLE

We propose a novel listwise method referred to as ListMLE. In learning, ListMLE employs the likelihood loss as the surrogate loss function, since it is proven to have all the nice properties of a surrogate loss. On the training data, we actually maximize the sum of the log likelihood with respect to all the training queries:

    \sum_{i=1}^{m} \log P(y^{(i)}|x^{(i)}; g).    (12)

We choose Stochastic Gradient Descent (SGD) as the algorithm for conducting the minimization. As the ranking model, we choose a linear Neural Network (parameterized by ω). Algorithm 1 shows the learning algorithm based on SGD.

Algorithm 1  The ListMLE Algorithm
    Input: training data {(x^{(1)}, y^{(1)}), ..., (x^{(m)}, y^{(m)})}
    Parameters: learning rate η, tolerance rate ε
    Initialize parameter ω
    repeat
        for i = 1 to m do
            Input (x^{(i)}, y^{(i)}) to the Neural Network and compute the gradient Δω with the current ω
            Update ω = ω − η · Δω
        end for
        Calculate the likelihood loss on the training set
    until the change of the likelihood loss is below ε
    Output: Neural Network model ω
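A compact sketch of Algorithm 1, assuming a linear scoring model g(x) = ω·x (the paper's ranking model is a linear Neural Network); the learning rate, tolerance, epoch cap, and function names are illustrative, the gradient is derived analytically from eq. (9), and the stopping check reuses the likelihood_loss helper from the earlier sketch.

    import numpy as np

    def listmle_gradient(w, X, y):
        """Gradient of the likelihood loss (9) w.r.t. w for one list,
        assuming a linear scoring function g(x_j) = w . x_j."""
        Xy = X[list(y)]                      # feature rows in ground-truth order
        s = Xy @ w                           # scores in ground-truth order
        grad_s = np.zeros_like(s)
        for i in range(len(s)):
            tail = s[i:] - s[i:].max()
            p = np.exp(tail) / np.exp(tail).sum()   # Plackett-Luce choice probabilities
            grad_s[i:] += p                  # d loss / d s_k picks up p_k for every i <= k
            grad_s[i] -= 1.0                 # minus the chosen object's own score term
        return Xy.T @ grad_s

    def train_listmle(data, dim, eta=0.01, eps=1e-5, max_epochs=1000):
        """Algorithm 1 (sketch): SGD on the likelihood loss with a linear model."""
        w = np.zeros(dim)
        prev = np.inf
        for _ in range(max_epochs):
            for X, y in data:                # one (feature matrix, permutation) pair
                w -= eta * listmle_gradient(w, X, y)
            cur = sum(likelihood_loss(X @ w, y) for X, y in data)
            if abs(prev - cur) < eps:        # stop when the training loss stabilizes
                break
            prev = cur
        return w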
6. Experimental Results

We conducted two experiments to verify the correctness of the theoretical findings. One data set is synthetic data, and the other is the LETOR benchmark data for learning to rank (Liu et al., 2007).

6.1. Experiment on Synthetic Data

We conducted an experiment using a synthetic data set, created as follows. First, we randomly sample a point according to the uniform distribution on the square area [0, 1] × [0, 1]. Then we assign to the point a score using a fixed rule of its two coordinates plus noise ε, where ε denotes a random variable normally distributed with mean zero. In total, we generate 15 points and their scores in this way, and create a permutation on the points based on their scores, which forms an instance of ranking. We repeat the process and make 100 training instances, 100 validation instances, and 100 testing instances.

We applied RankCosine, ListNet⁴, and ListMLE to the data. We tried different score mapping functions for RankCosine and ListNet, and used the five most representative ones, i.e., log(15 − r), √(15 − r), 15 − r, (15 − r)², and exp(15 − r), where r denotes the position of an object. We denote the mapping functions as log, sqrt, l, q, and exp for simplicity. The experiments were repeated 20 times with different initial values of the parameters in the Neural Network model.

Table 2 shows the means and standard deviations of the accuracies and Mean Average Precision (MAP) (Baeza-Yates & Ribeiro-Neto, 1999) of the three algorithms. The accuracy measures the proportion of correctly ranked instances, and MAP⁵ is a commonly used measure in IR. As shown in the table, ListMLE achieves the best performance among all the algorithms in terms of both accuracy and MAP, owing to the good properties of its loss function. The accuracies of RankCosine and ListNet vary according to the mapping functions. In particular, RankCosine achieves a very low accuracy when using the mapping function exp, but a much higher one when using the mapping function l. This result indicates that the performances of the cosine loss and the cross entropy loss depend on the mapping functions, while finding a suitable mapping function is not easy. Furthermore, RankCosine has a larger variance than ListMLE and ListNet. The likely explanation is that RankCosine's performance is sensitive to the initial values of the parameters due to the non-convexity of its loss function.

⁴ The top-1 version of the cross entropy loss was employed, as in the original work (Cao et al., 2007).
⁵ When calculating MAP, we treated the top-1 item as relevant and the others as irrelevant.

Table 2. The performance of the three algorithms on the synthetic data set (mean ± standard deviation).

    Algorithm         Accuracy    MAP
    ListMLE           0.9 ±       ± 0.00
    ListNet-log       ±           ± 0.00
    ListNet-sqrt      ±           ± 0.00
    ListNet-l         ±           ±
    ListNet-q         ±           ± 0.00
    ListNet-exp       0.83 ±      ±
    RankCosine-log    ±           ±
    RankCosine-sqrt   ±           ±
    RankCosine-l      ±           ± 0.00
    RankCosine-q      0.10 ±      ±
    RankCosine-exp    ±           ±
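For reference, here is a small sketch of the two evaluation measures used above, under the paper's conventions (exact-match accuracy over whole permutations, and MAP computed with only the top-1 ground-truth item treated as relevant); the function names are ours.

    import numpy as np

    def accuracy(predicted, ground_truth):
        """Proportion of instances whose predicted permutation exactly matches the ground truth."""
        return np.mean([list(p) == list(y) for p, y in zip(predicted, ground_truth)])

    def map_top1_relevant(predicted, ground_truth):
        """MAP when, per instance, only the ground truth's top-1 object is treated as relevant:
        average precision then reduces to the reciprocal rank of that object in the prediction."""
        aps = []
        for p, y in zip(predicted, ground_truth):
            relevant = y[0]                         # the single relevant object
            rank = list(p).index(relevant) + 1      # 1-based position in the predicted list
            aps.append(1.0 / rank)
        return float(np.mean(aps))

    # Tiny usage example with two ranking instances over three objects.
    pred = [[2, 0, 1], [0, 1, 2]]
    gold = [[2, 0, 1], [1, 0, 2]]
    print(accuracy(pred, gold))            # 0.5
    print(map_top1_relevant(pred, gold))   # (1/1 + 1/2) / 2 = 0.75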
6.2. Experiment on OHSUMED Data

We also conducted an experiment on OHSUMED, a benchmark data set for learning to rank provided in LETOR. There are in total 106 queries and 16,140 query-document pairs upon which relevance judgments are made. The relevance judgments are either "definitely relevant", "possibly relevant", or "not relevant". The data is in the form of feature vectors and relevance labels, with 25 features in total. We used the data split provided in LETOR to conduct five-fold cross-validation experiments. In evaluation, besides MAP, we adopted another measure commonly used in IR: Normalized Discounted Cumulative Gain (NDCG) (Jarvelin & Kekalainen, 2000).

Note that here the ground truth in the data is given as a partial ranking, while the methods need to use a total ranking (permutation) in training. To bridge the gap, for RankCosine and ListNet we adopted the methods proposed in the respective papers (Cao et al., 2007) (Qin et al., 2007). For ListMLE, we randomly selected one perfect permutation for each query from among the possible perfect permutations based on the ground truth.
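One simple way to implement the random selection of a perfect permutation described above is to sort the documents by relevance grade while breaking ties uniformly at random; the sketch below is our illustration of that step, not code from the paper.

    import random

    def random_perfect_permutation(relevance, rng=random):
        """Return one permutation (list of document indices) that is 'perfect' with
        respect to the graded relevance labels: higher grades always come first,
        and ties within a grade are broken uniformly at random."""
        indices = list(range(len(relevance)))
        rng.shuffle(indices)                                  # random tie-breaking
        return sorted(indices, key=lambda i: -relevance[i])   # stable sort keeps the shuffle within ties

    # Example: grades 2 = definitely relevant, 1 = possibly relevant, 0 = not relevant.
    print(random_perfect_permutation([0, 2, 1, 2, 0]))        # e.g. [3, 1, 2, 0, 4] or [1, 3, 2, 4, 0]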
We applied RankCosine, ListNet, and ListMLE to the data. The results reported below are those averaged over the five trials. As shown in Figure 4, ListMLE achieves the best performance among all the algorithms. In particular, on NDCG@1 it has more than 5-point gains over RankCosine, which is in second place. We also conducted t-tests on the improvements of ListMLE over the other two algorithms. The results show that the improvements are statistically significant (p-value < 0.05).

Figure 4. Ranking performance on OHSUMED data.

7. Conclusion

In this paper, we have investigated the theory and algorithms of the listwise approach to learning to rank. We have pointed out that to understand the effectiveness of a learning to rank algorithm, it is necessary to conduct theoretical analysis on its loss function. We propose investigating a loss function from the viewpoints of (a) consistency, (b) soundness, (c) continuity, differentiability, and convexity, and (d) efficiency. We have obtained some theoretical results on the consistency of ranking. We have conducted analysis on the likelihood loss, the cosine loss, and the cross entropy loss. The results indicate that the likelihood loss has better properties than the other two losses. We have then developed a new learning algorithm using the likelihood loss, called ListMLE, and demonstrated its effectiveness through experiments.

There are several directions which we can further explore. (1) We want to conduct more theoretical analysis on the properties of loss functions, for example, weaker conditions for consistency and rates of convergence. (2) We plan to study the case where a cost-sensitive loss function is used instead of the 0-1 loss function in defining the expected loss. (3) We plan to investigate other surrogate loss functions with the tools we have developed in this paper.

References

Baeza-Yates, R., & Ribeiro-Neto, B. (Eds.). (1999). Modern information retrieval. Addison Wesley.

Bartlett, P. L., Jordan, M. I., & McAuliffe, J. D. (2003). Convexity, classification, and risk bounds (Technical Report 638). Statistics Department, University of California, Berkeley.

Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge University Press.

Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., & Hullender, G. (2005). Learning to rank using gradient descent. Proceedings of ICML 2005.

Cao, Z., Qin, T., Liu, T.-Y., Tsai, M.-F., & Li, H. (2007). Learning to rank: From pairwise approach to listwise approach. Proceedings of the 24th International Conference on Machine Learning. Corvallis, OR.

Cossock, D., & Zhang, T. (2006). Subset ranking using regression. COLT.

Freund, Y., Iyer, R., Schapire, R. E., & Singer, Y. (1998). An efficient boosting algorithm for combining preferences. Proceedings of ICML.

Hastie, T., Tibshirani, R., & Friedman, J. H. (2001). The elements of statistical learning: Data mining, inference and prediction. Springer.

Herbrich, R., Graepel, T., & Obermayer, K. (1999). Support vector learning for ordinal regression. Proceedings of ICANN.

Jarvelin, K., & Kekalainen, J. (2000). IR evaluation methods for retrieving highly relevant documents. Proceedings of SIGIR.

Lin, Y. (2002). Support vector machines and the Bayes rule in classification. Data Mining and Knowledge Discovery.

Liu, T.-Y., Qin, T., Xu, J., Xiong, W.-Y., & Li, H. (2007). LETOR: Benchmark dataset for research on learning to rank for information retrieval. Proceedings of SIGIR.

Marden, J. I. (Ed.). (1995). Analyzing and modeling rank data. London: Chapman and Hall.

Nallapati, R. (2004). Discriminative models for information retrieval. Proceedings of SIGIR.

Qin, T., Zhang, X.-D., Tsai, M.-F., Wang, D.-S., Liu, T.-Y., & Li, H. (2007).
Query-level loss functions for information retrieval. Information Processing and Management.

Xu, J., & Li, H. (2007). AdaRank: A boosting algorithm for information retrieval. Proceedings of SIGIR.

Yue, Y., Finley, T., Radlinski, F., & Joachims, T. (2007). A support vector method for optimizing average precision. Proceedings of SIGIR.

Zhang, T. (2004). Statistical analysis of some multi-category large margin classification methods. Journal of Machine Learning Research, 5.