Budget-optimal Crowdsourcing using Low-rank Matrix Approximations

David R. Karger, Sewoong Oh, and Devavrat Shah
Department of EECS, Massachusetts Institute of Technology
Email: {karger, swoh, devavrat}@mit.edu

Abstract

Crowdsourcing systems, in which numerous tasks are electronically distributed to numerous "information pieceworkers", have emerged as an effective paradigm for human-powered solving of large-scale problems in domains such as image classification, data entry, optical character recognition, recommendation, and proofreading. Because these low-paid workers can be unreliable, nearly all crowdsourcers must devise schemes to increase confidence in their answers, typically by assigning each task multiple times and combining the answers in some way such as majority voting. In this paper, we consider a model of such crowdsourcing tasks and pose the problem of minimizing the total price (i.e., the number of task assignments) that must be paid to achieve a target overall reliability. We give a new algorithm for deciding which tasks to assign to which workers and for inferring correct answers from the workers' answers. We show that our algorithm, based on low-rank matrix approximation, significantly outperforms majority voting and, in fact, is order-optimal through comparison to an oracle that knows the reliability of every worker.

I. INTRODUCTION

Background. Crowdsourcing systems have emerged as an effective paradigm for human-powered problem solving and are now in widespread use for large-scale data-processing tasks such as image classification, video annotation, form data entry, optical character recognition, translation, recommendation, and proofreading. Crowdsourcing systems such as Amazon Mechanical Turk (footnote 1) establish a market where a "taskmaster" can submit batches of small tasks to be completed for a small fee by any worker choosing to pick them up.
For example, a worker may be able to earn a few cents by indicating which images from a set of 30 are suitable for children (one of the benefits of crowdsourcing is its applicability to such highly subjective questions). Because the tasks are tedious and the pay is low, errors are common even among workers who make an effort. At the extreme, some workers are "spammers", submitting arbitrary answers independent of the question in order to collect their fee. Thus, all crowdsourcers need strategies to ensure the reliability of answers. Because the worker crowd is large, anonymous, and transient, it is generally difficult to build up a trust relationship with particular workers (footnote 2). It is also difficult to condition payment on correct answers, as the correct answer may never truly be known and delaying payment can annoy workers and make it harder to recruit them to your task. Instead, most crowdsourcers resort to redundancy, giving each task to multiple workers, paying them all irrespective of their answers, and aggregating the results by some method such as majority voting.

Footnote 1: http://www.mturk.com
Footnote 2: For certain high-value tasks, crowdsourcers can use entrance exams to "prequalify" workers and block spammers, but this increases the cost of the task and still provides no guarantee that the workers will try hard after qualification.

For such systems there is a natural core optimization problem to be solved. Assuming the taskmaster wishes to achieve a certain reliability in their answers, how can she do so at minimum cost (which is equivalent to asking how she can do so while asking the fewest possible questions)? Several characteristics of crowdsourcing systems make this problem interesting. Workers are neither persistent nor identifiable; each batch of tasks will be solved by a worker who may be completely new and who you may never see again. Thus one cannot identify and reuse particularly reliable workers.
Nonetheless, by comparing one worker's answer to others' on the same question, it is possible to draw conclusions about a worker's reliability, which can be used to weight their answers to other questions in their batch. However, batches must be of manageable size, obeying limits on the number of tasks that can be given to a single worker. Another interesting aspect of this problem is the choice of task assignments. Unlike many inference problems, which make inferences based on a fixed set of signals, we can choose which signals to measure by deciding which questions to ask of which workers. This makes designing a crowdsourcing system challenging in that we are required to determine how to allocate the tasks as well as design an algorithm to infer the correct answers once the workers submit their answers.

In the remainder of this introduction, we will define a formal model that captures these aspects of the problem. Then, we will describe how to allocate tasks and infer the correct answers. We subsequently describe how these two procedures can be integrated into a single Budget-optimal Crowdsourcing algorithm. We show that this algorithm is order-optimal: for a given target error rate, it spends only a constant factor times the minimum necessary to achieve that error rate.

Setup. We model a set of $m$ tasks $\{t_i\}_{i \in [m]}$ as each being associated with an unobserved "correct" solution $s_i \in \{\pm 1\}$. Here and after, we use $[N]$ to denote the set of the first $N$ integers. In the image categorization example stated earlier, the tasks correspond to labeling images as suitable for children ($+1$) or not ($-1$). These tasks are assigned to $n$ workers from the crowd. We use $\{w_j\}_{j \in [n]}$ to denote this
set of $n$ workers. When a task is assigned to a worker, we get a possibly inaccurate answer from the worker. We use $A_{ij} \in \{\pm 1\}$ to denote the answer if task $t_i$ is assigned to worker $w_j$. Some workers are diligent whereas other workers might be spammers. We choose a simple model to capture the presence of spammers, which we call the spammer-hammer model. Under this model, we assume that each worker is either a hammer or a spammer. A hammer always gives the correct answer to all the questions and a spammer always gives random answers. Each worker $w_j$ is labeled by a reliability parameter $p_j \in \{1/2, 1\}$, such that $p_j = 1$ if $w_j$ is a hammer and $p_j = 1/2$ if $w_j$ is a spammer. According to the spammer-hammer model, if task $t_i$ is assigned to worker $w_j$ then

$$A_{ij} = \begin{cases} s_i & \text{with probability } p_j, \\ -s_i & \text{with probability } 1 - p_j, \end{cases} \tag{1}$$

and $A_{ij} = 0$ if $t_i$ is not assigned to $w_j$. The random variable $A_{ij}$ is independent of any other event given $p_j$. (Throughout this paper, we use boldface characters to denote random variables and random matrices unless it is clear from the context.) We further assume that each worker's reliability is independent and identically distributed, such that each worker is a hammer with probability $q$ and a spammer with probability $1 - q$:

$$p_j = \begin{cases} 1 & \text{with probability } q, \\ 1/2 & \text{with probability } 1 - q. \end{cases}$$

It is quite realistic to assume the existence of such a prior distribution for the $p_j$'s. In particular, it is met if we simply randomize the order in which we upload our task batches, since this will have the effect of randomizing which workers perform which batches, yielding a distribution that meets our requirements. On the other hand, it is not realistic to assume that we know what the prior is. To execute our inference algorithm, we do not require knowledge of the hammer probability $q$. However, $q$ is necessary in deciding how many times a task should be replicated to achieve a certain reliability, and we discuss a simple way to overcome this limitation in Section II-D.
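As a small illustration of the spammer-hammer model, the following sketch samples worker reliabilities and answers. The function names, the seed, and the parameter values are illustrative assumptions and are not part of the paper.

```python
import random

def sample_reliabilities(n, q, rng):
    """Draw p_j for n workers: 1.0 (hammer) with probability q, else 0.5 (spammer)."""
    return [1.0 if rng.random() < q else 0.5 for _ in range(n)]

def sample_answer(s_i, p_j, rng):
    """Answer A_ij as in equation (1): s_i with probability p_j, otherwise -s_i."""
    return s_i if rng.random() < p_j else -s_i

rng = random.Random(0)                      # fixed seed, an arbitrary choice
p = sample_reliabilities(n=5, q=0.3, rng=rng)
# Answers of the five workers to a single task whose correct solution is +1.
answers = [sample_answer(+1, p_j, rng) for p_j in p]
```

A hammer ($p_j = 1$) always returns $s_i$ in this sketch, while a spammer ($p_j = 1/2$) returns $\pm s_i$ with equal probability.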
Under this crowdsourcing model, a taskmaster first decides which tasks should be assigned to which workers, and then infers the correct solutions $\{s_i\}_{i \in [m]}$ once all the answers $\{A_{ij}\}$ are submitted. We assume a one-shot model in which all questions are asked simultaneously and then an estimation is performed after all the answers are obtained. In particular, we do not allow allocating tasks adaptively based on the answers received thus far. Then, assigning tasks to workers amounts to designing a bipartite graph $G(\{t_i\}_{i \in [m]} \cup \{w_j\}_{j \in [n]}, E)$ with $m$ task nodes and $n$ worker nodes. Each edge $(i,j) \in E$ indicates that task $t_i$ was assigned to worker $w_j$. With a slight abuse of notation, we use a matrix $A \in \{0, 1, -1\}^{m \times n}$ to denote the randomly weighted adjacency matrix of the graph $G$: edge $(i,j) \in E$ is weighted with the submitted answer $A_{ij} \in \{+1, -1\}$, and $A_{ij} = 0$ if $(i,j) \notin E$. We shall use $C_1$, $C_2$, etc. to denote general constants. Whenever we say a property $\mathcal{A}$ holds with high probability (w.h.p.), we mean that there exists a function $f(m,n)$ such that $P(\mathcal{A}) \geq 1 - f(m,n)$ and $\lim_{m,n \to \infty} f(m,n) = 0$.

Prior Work. A naive approach to aggregating information from multiple workers is to use majority voting. Majority voting simply follows what the majority of workers agree on. When we have many spammers in the crowd, majority voting is error-prone since it gives the same weight to all the answers, regardless of whether they are from a spammer or a diligent worker. We will show in Section II-C that majority voting is provably sub-optimal and can be significantly improved upon. To fully exploit redundancy, we need to infer the reliability of the workers simultaneously while inferring the solutions of the tasks. Dawid and Skene [DS79] proposed an iterative algorithm for inferring the solutions and reliability of workers, based on expectation maximization (EM) [DLR77].
EM is a heuristic inference algorithm that iterates the following two steps: given the workers' answers to the tasks, it estimates the reliability of the workers; and given the estimated reliability (error probabilities) of the workers, it estimates the solutions of the tasks. Due to the particular simplicity of the EM algorithm, it has been widely applied in classification problems where the training data is annotated by low-cost noisy "labelers" [JG03], [RYZ+10]. In [WRW+09] and [WBBP10], this EM approach has been applied to more complicated probabilistic models for image labeling tasks. However, the performance of these approaches has only been evaluated empirically, and there is no analysis that proves performance guarantees. In particular, EM algorithms require an initial starting point which is typically randomly guessed. The algorithm is highly sensitive to this initialization, making it difficult to predict the quality of the resulting estimate.

Contributions. In this work, we provide a rigorous treatment of designing a crowdsourcing system with the aim of minimizing the budget needed to complete the tasks with a certain reliability. We provide both an optimal graph construction (random regular bipartite graph) and an optimal algorithm for inference (low-rank approximation) on that graph. As the main result, we show that our algorithm performs as well as the best possible algorithm, up to a constant factor. The surprise lies in the fact that the optimality of our algorithm is established by comparing it with the best algorithm, one that is free to choose any graph, regular or irregular, and performs optimal estimation based on the information provided by an oracle about the reliability of workers. In the process of establishing these results, we obtain a generalization of a celebrated result of Friedman, Kahn and Szemerédi [FKS89] on the spectrum of sparse random graphs. Previous approaches focus on developing inference algorithms assuming that a graph is already given.
None of the prior work on crowdsourcing provides any systematic treatment of the graph construction. We are the first to
study both aspects of crowdsourcing together and, more importantly, establish optimality.

II. MAIN RESULTS

In the crowdsourcing model introduced, we are interested in designing algorithms for two related problems: (i) how should the tasks be assigned to workers, i.e. the selection of the bipartite graph $G$; and (ii) given the responses from the workers, how should one estimate the correct answers. In what follows, we first address these two problems. For (i), we propose to utilize random regular bipartite graphs and for (ii), we propose to utilize a certain estimation procedure based on low-rank matrix approximation. We subsequently describe how we can integrate these two procedures into a single Budget-optimal Crowdsourcing system to achieve optimal performance.

A. Graph Generation

Assigning tasks to workers amounts to designing a bipartite graph $G$. Given $m$ tasks to complete, the taskmaster first makes a choice of the left degree $l$ (how many workers to assign to each task) and the right degree $r$ (how many tasks to assign to each worker). The number of required workers $n$ is then determined such that the total number of edges is consistent, that is $ml = nr$. To generate an $(l,r)$-regular bipartite graph we use a random graph generation scheme known as the configuration model in the random graph literature [RU08], [Bol01]. In principle, one could use an arbitrary bipartite graph $G$ for task allocation. However, as we shall show in Section II-C, random regular graphs are sufficient to achieve order-optimal performance.

B. Inference Algorithm

Given $G$, let $A = [A_{ij}] \in \{0, 1, -1\}^{m \times n}$ denote the answers provided by the workers. In this section, we shall introduce a low-rank approximation based algorithm that takes $A$ as input and produces an estimate for the unobserved solution vector $s = [s_i] \in \{-1, 1\}^m$. We then provide the performance guarantee of this algorithm. Surprisingly, as we will show in a later section, this simple inference algorithm can be used as a subroutine to achieve optimal performance.

Low-rank Approximation
Input: $A$.
Output: Estimation $\hat{s}(A)$.
1: Compute a pair of left and right singular vectors $(u, v)$ of $A$ corresponding to the top singular value;
2: If $\sum_{j: v_j \geq 0} v_j^2 < 1/2$, then output $\hat{s}(A) = \mathrm{sign}(-u)$;
3: Otherwise, output $\hat{s}(A) = \mathrm{sign}(u)$;

In any case, the leading singular vector of $A$ is not uniquely determined. Both $(u, v)$ and $(-u, -v)$ are valid pairs of left and right singular vectors. To resolve this issue, our strategy is to choose the pair $(u, v)$ if $v$ has more mass on the positive orthant than $-v$, that is $\sum_{j: v_j \geq 0} v_j^2 \geq \sum_{j: v_j < 0} v_j^2$. Let $(u, v)$ be the pair of singular vectors after we resolve the ambiguity in the sign. The $j$-th entry of $v$ represents our belief on how reliable worker $j$ is, and our estimate is a weighted sum of the submitted answers weighted by the workers' reliabilities:

$$\hat{s}_i = \mathrm{sign}\Big( \sum_{j \in \partial i} A_{ij} v_j \Big), \tag{2}$$

where $\partial i \subseteq [n]$ denotes the set of workers that are assigned task $i$. This follows from the fact that $Av = \sigma_1 u$, so that $u_i$ is a positive multiple of $\sum_{j \in \partial i} A_{ij} v_j$. In Section III-A, we give in detail the intuition behind why the top left singular vector of $A$ reveals the structure of the underlying unobserved answers. With this estimate, we can show the following bound on the average number of errors, defined as the normalized Hamming distance:

$$d(s, \hat{s}) \equiv \frac{1}{m} \sum_{i=1}^{m} I(s_i \neq \hat{s}_i), \tag{3}$$

where $I(\cdot)$ denotes the indicator function. The proof of this result is given in Section III-A.

Theorem II.1. For fixed $l$ and $r$ which are independent of $m$, assume that $m$ tasks are assigned to $n = ml/r$ workers under the spammer-hammer model according to a random $(l,r)$-regular graph drawn from the configuration model. Then, for any $s \in \{\pm 1\}^m$, with probability $1 - m^{-\Omega(\sqrt{l})}$, the low-rank approximation algorithm achieves

$$d(s, \hat{s}(A)) \leq \frac{C(\rho)}{lq}, \tag{4}$$

where $q$ is the probability that a randomly chosen worker is a hammer and $C(\rho)$ is a constant that only depends on $\rho \equiv l/r$.

Remarks about Theorem II.1. Now a few remarks are in order. First, observe that this result is non-asymptotic and provides a concrete bound for any $m$.
Second, the constant $C(\rho)$ depends continuously on $\rho$ and is uniformly bounded over any closed interval $[\alpha, \beta]$ for $0 < \alpha \leq \beta < \infty$ (with the bound dependent on $\alpha$, $\beta$). To achieve optimal performance, we shall use the algorithm with $l = r = \Theta(1/q)$. Therefore, $\rho = 1$ and hence the constant $C(\rho) = C(1)$ can be treated as a universal constant independent of other problem parameters. Third, the choice of $l$ (and $r$) depends on $q$ to achieve average error less than $1/2$ (specifically, $l$ must scale as $1/q$). Such an instance of the algorithm (with $l = r = \Theta(1/q)$) will be used to design the optimal crowdsourcing system. Thus, our system design requires (approximate) knowledge of $q$ to decide on the number of task replications $l$ and thereby achieve optimal performance. But beyond that, neither the graph selection nor the inference algorithm requires the knowledge of $q$. Section II-D suggests a simple procedure that overcomes even this need to know $q$.

Run-time of Low-rank Approximation. We next show that $O(ml \log(m) / \log(lq))$ operations are sufficient to ensure that the performance guarantee in Theorem II.1 is achieved.

This is comparable to simple majority voting, which requires $\Omega(ml)$ operations. As an implication of Lemma III.3, which is established as part of the proof of Theorem II.1, we obtain that when the bound (4) is non-trivial (i.e. $lq > C(\rho)$), there is a strict separation between the first singular value of $A$ and the rest of the singular values. Precisely, $\sigma_2(A)/\sigma_1(A) < C_1(\rho)/\sqrt{lq}$ with high probability, where $\sigma_i(A)$ is the $i$-th largest singular value of $A$. This implies that the leading singular vector pair $(u, v)$ is unique up to a sign. We will be interested in the regime where $A$ is sparse, that is $l, r = \Theta(1)$ and $\rho = \Theta(1)$. In this regime, the leading singular value and singular vectors of the matrix $A$ can be computed efficiently, for instance, using power iteration [Ber92]. Each iteration requires $O(ml)$ operations. When the second singular value is smaller than the first by a factor $C_1(\rho)/(lq)^{1/2} < 1$, power iteration converges exponentially at a rate determined by this factor. Therefore, for $\rho = \Theta(1)$ (in our optimal algorithm, $\rho = 1$), the Low-rank Approximation takes $O(\log(m)/\log(lq))$ iterations to ensure the error bound mentioned. We summarize this in the form of the following lemma, which is proved in Section III-A.2.

Lemma II.2. Under the hypothesis of Theorem II.1, the total computational cost required to achieve the bound in Theorem II.1 is $O(ml \log(m)/\log(lq))$.

C. Optimality

As a taskmaster, the natural core optimization problem of our concern is how to achieve a certain reliability in our answers with minimum cost. Since we pay an equal amount for all the task assignments, the cost is proportional to the total number of edges of the graph $G$. Here we compute the total budget sufficient to achieve a target error rate and show that this is within a constant factor of the budget necessary to achieve the given target error rate using any possible graph and any possible inference algorithm. The order-optimality is established with respect to all algorithms that operate in one-shot, i.e.
all task assignments are done simultaneously, and then an estimation is performed after all the answers are obtained. Formally, consider a scenario where there are $m$ tasks to complete and a target accuracy $\epsilon \in (0, 1/2)$. To measure accuracy, we use the average probability of error per task. We will show that $\Omega((1/q)\log(1/\epsilon))$ assignments per task are necessary to achieve the target error rate:

$$\frac{1}{m} \sum_{i \in [m]} P(s_i \neq \hat{s}_i) \leq \epsilon.$$

To design a system that can achieve the target error rate using a number of assignments close to this fundamental limit, we utilize the vanilla version of the graph assignment and inference algorithm described thus far to design a budget-optimal crowdsourcing system. The main idea is to obtain $T$ independent estimates using $T$ independent graphs and $T$ independent sets of workers. Then, we can aggregate these independent estimates to achieve optimal performance.

Budget-optimal Crowdsourcing
Input: $m$ tasks, task degree $l$, number of groups $T$.
Output: estimate vector $\hat{s}$.
1: For all $t \in [T]$ do
   Recruit $m$ workers;
   Generate an independent $(l, l)$-regular graph $G^{(t)}$ from the configuration model;
   Assign tasks to workers according to $G^{(t)}$;
   Run the Low-rank Approximation algorithm;
   Set $\hat{s}^{(t)}$ as the estimation vector thus obtained;
2: Output $\hat{s}_i = \mathrm{sign}\big( \sum_{t \in [T]} \hat{s}^{(t)}_i \big)$.

Notice that this algorithm does not violate the one-shot scenario, since we can generate a single large graph $G = \cup_t G^{(t)}$ in advance. In particular, no information from one group is used in generating the graph for another group. We state the following optimality result ($x \simeq y$ indicates that $x$ scales as $y$, i.e. $x = \Theta(y)$).

Theorem II.3. Under the one-shot scenario, the following statements are true.

(a) Let $\Delta_{\rm LB}$ be the minimum cost per task necessary to achieve a target accuracy $\epsilon \in (0, 1/2)$ using any graph and any possible algorithm. Then

$$\Delta_{\rm LB} \simeq \frac{1}{q} \log\Big(\frac{1}{\epsilon}\Big). \tag{5}$$

(b) Let $\Delta_{\rm Lowrank}$ be the minimum cost per task sufficient to achieve a target accuracy $\epsilon$ using the Budget-optimal Crowdsourcing system. Then there exists a universal constant $M$ such that for all $m \geq M$,

$$\Delta_{\rm Lowrank} \simeq \frac{1}{q} \log\Big(\frac{1}{\epsilon}\Big). \tag{6}$$

(c) Let $\Delta_{\rm Majority}$ be the minimum cost per task necessary to achieve a target accuracy $\epsilon$ using the majority voting scheme on any graph. Then

$$\Delta_{\rm Majority} \simeq \frac{1}{q^2} \log\Big(\frac{1}{\epsilon}\Big). \tag{7}$$

The above (cf. (5) and (6)) establishes the optimality of our algorithm. It is indeed surprising that regular graphs are sufficient to achieve this optimality. Further, the scaling of majority voting is $(1/q^2)\log(1/\epsilon)$, which is significantly worse than the optimal scaling of $(1/q)\log(1/\epsilon)$ achieved by our algorithm. Finally, we emphasize that the low-rank approximation algorithm is quite efficient: it requires $O((1/q)\log(m)\log(1/\epsilon))$ operations per task to achieve the target accuracy $\epsilon$. It takes $O((1/q^2)\log(1/\epsilon))$ operations for simple majority voting to achieve the same reliability.

D. Discussions

It is worth pointing out some limitations and variations of our results, and interesting research directions:

Knowledge of $q$. We assumed the knowledge of $q$ in selecting the degree $l = \Theta(1/q)$ in the design of the graph in the Budget-optimal Crowdsourcing system. Here is a simple
way to overcome this limitation at the loss of only an additional constant factor, i.e. the scaling of the cost per task still remains $\Theta((1/q)\log(1/\epsilon))$. To that end, consider an incremental design in which at iteration $k$ the system is designed assuming $q = 2^{-k}$ for $k \geq 1$. At iteration $k$, we design two replicas of the system as per the Budget-optimal Crowdsourcing for $q = 2^{-k}$. Now compare the estimates obtained by these two replicas for all $m$ tasks. If they agree on at least $m(1 - 2\epsilon)$ tasks, then we stop and declare that as the final answer. Otherwise, we increase $k$ to $k + 1$ and repeat. Note that by our optimality result, it follows that if $2^{-k}$ is less than the actual $q$ then the iteration must stop with high probability. Because the cost of iteration $k$ grows geometrically in $k$, the total is dominated by the final iteration. Therefore, the total cost paid is $\Theta((1/q)\log(1/\epsilon))$ with high probability. Thus, even lack of knowledge of $q$ does not affect the optimality of our algorithm.

More general models. For simplicity we assumed the spammer-hammer model, where $p_j \in \{1/2, 1\}$. A natural generalization of this is to allow any $p_j \in [0, 1]$. We expect that our algorithm and analysis can be strengthened to prove a bound on the error rate in this more general case. An underlying assumption in our model is that the error probability of a worker does not depend on the particular task and all the tasks share an equal level of difficulty. Another underlying assumption in our model is that the workers are unbiased: the error probability of a worker does not depend on whether the correct answer is $+1$ or $-1$. There is a more general model investigated in [WRW+09], which relaxes both of these assumptions. In formula, worker $j$'s response to a binary task $i$ can be modeled as $A_{ij} = \mathrm{sign}(Z_{i,j})$, where $Z_{i,j} \sim \mathcal{N}(s_i + \alpha_j, r_i + \beta_j)$. Here, $s_i \in \{+1, -1\}$ represents the correct binary answer of task $i$, $r_i$ represents the level of difficulty of task $i$, $\alpha_j$ represents the bias of worker $j$, and $\beta_j$ represents the reliability of worker $j$. Most of the crowdsourcing models introduced so far can be reduced to a special case of this model.
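To make the Gaussian response model concrete, here is a minimal sampling sketch. It assumes that the second argument of $\mathcal{N}(\cdot, \cdot)$ is a variance and that a tie $Z_{i,j} = 0$ is resolved as $+1$; neither convention is stated in the text, and the function name is hypothetical.

```python
import math
import random

def sample_gaussian_answer(s_i, r_i, alpha_j, beta_j, rng):
    """Return A_ij = sign(Z_ij) with Z_ij ~ N(s_i + alpha_j, r_i + beta_j).

    Assumption: r_i + beta_j is a (non-negative) variance, so the standard
    deviation passed to rng.gauss is its square root.
    """
    z = rng.gauss(s_i + alpha_j, math.sqrt(r_i + beta_j))
    return 1 if z >= 0 else -1          # tie at z == 0 mapped to +1 (assumption)

rng = random.Random(0)                   # fixed seed, an arbitrary choice
# An unbiased worker with zero variance on a task of zero difficulty
# reproduces the correct answer, like a hammer in the spammer-hammer model.
hammer_like = [sample_gaussian_answer(s, 0.0, 0.0, 0.0, rng) for s in (1, -1, 1)]
```

In this reading, a spammer corresponds to the limit of very large $\beta_j$, where the sign of $Z_{i,j}$ becomes nearly independent of $s_i$.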
For example, the first crowdsourcing model introduced by Dawid and Skene [DS79] is equivalent to the above Gaussian model with $r_i = 0$ for all $i \in [m]$. The model we study in this paper is equivalent to the above Gaussian model with $\alpha_j = 0$ for all $j \in [n]$ and $r_i = 0$ for all $i \in [m]$. It is desirable to characterize how our algorithm works under this more general model.

III. INTUITION AND PROOFS

In this section, we present intuitions behind the low-rank approximation algorithm and provide a proof of Theorem II.1. We then provide a proof of optimality of our algorithm.

A. Low-rank approximation

A low-rank model is often used to capture the important aspects of datasets in matrix form. The data analysis technique using low-rank approximations of data matrices is referred to as principal component analysis (PCA) [Jol86]. PCA often reveals hidden structures in the data and has been successfully applied to problems including latent semantic indexing and spectral clustering. A rank-1 approximation of our data matrix $A$ can be easily computed using singular value decomposition (SVD). Let the singular value decomposition of $A$ be

$$A = \sum_{i=1}^{\min\{m,n\}} u_i \sigma_i v_i^T,$$

where $u_i \in \mathbb{R}^m$ and $v_i \in \mathbb{R}^n$ are the $i$-th left and right singular vectors, and $\sigma_i \in \mathbb{R}$ is the $i$-th singular value. Here and after, $(\cdot)^T$ denotes the transpose of a matrix or a vector. For simplicity, we use $u = u_1$ for the first left singular vector and $v = v_1$ for the first right singular vector. Singular values are typically assumed to be sorted in non-increasing order, satisfying $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$. Then, the optimal rank-1 approximation is given by a rank-1 projector $P_1(\cdot): \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$ such that

$$P_1(A) = \sigma_1 u v^T. \tag{8}$$

It is a well-known fact that $P_1(A)$ minimizes the mean squared error. In formula,

$$P_1(A) = \arg\min_{X: \mathrm{rank}(X) \leq 1} \sum_{i,j} (A_{ij} - X_{ij})^2.$$

In the problem of estimating the solutions of tasks via crowdsourcing, it turns out that PCA provides good estimates. Consider an ideal case where $G$ is a complete graph, and all the workers are hammers and provide the correct answers.
Hence, there is no randomness in this example. Then, $A = s 1_n^T$, where $s \in \{\pm 1\}^m$ is the vector of correct solutions and $1_n \in \mathbb{R}^n$ is the all-ones vector. It is simple to see that $A$ is a rank-1 matrix, since it is the outer product of two vectors. Therefore, $u$ is equal to the correct solution $s$ up to a scaling and sign (both $(1/\sqrt{m})s$ and $-(1/\sqrt{m})s$ are valid singular vectors).

Now, consider a case where all the workers are hammers but the graph is an $(l,r)$-regular graph. In this case, it is also true that $u$ is equal to the correct solution $s$ up to a scaling. Here is a sketch of the proof of this claim. Start with an all-ones task vector $s = 1_m$. Then, $1_m$ is an eigenvector of $AA^T$ as $AA^T 1_m = lr 1_m$. It follows from the Perron-Frobenius theorem [HJ90] that $(1/\sqrt{m}) 1_m$ is a left singular vector corresponding to the largest singular value of $A$. In other words, $u = (1/\sqrt{m}) 1_m$. Now, if $s$ is not the all-ones vector, then $s = S 1_m$, where $S$ is a diagonal matrix with $S_{ii} = s_i$. Note that $S$ is also an orthogonal matrix such that $SS^T = S^T S = I$, where $I$ is the $m$-dimensional identity matrix. Recall that for any orthogonal matrix $Q$, $Qu$ is the first left singular vector of $QA$. Then, since we know that $S$ is an orthogonal matrix and the first left singular vector of $SA$ is $(1/\sqrt{m}) 1_m$, it follows that the first singular vector of $A$ is $u = (1/\sqrt{m}) S^T 1_m = (1/\sqrt{m}) s$. This proves the claim that $u$ is equal to $s$ up to a scaling. One important implication of the above argument is that the accuracy of this estimate does not depend on a particular choice of $s \in \{\pm 1\}^m$.
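The claim for the all-hammer case can be checked numerically. The sketch below uses a fixed circulant $(l, l)$-regular assignment instead of the configuration model, purely so that the check is deterministic; the helper names and the particular vector $s$ are illustrative assumptions. It verifies that $AA^T s = lr\, s$ when every answer on an edge equals $s_i$.

```python
def regular_assignment(m, l):
    """Task i is given to workers i, i+1, ..., i+l-1 (mod m): an (l, l)-regular graph."""
    return [[(i + k) % m for k in range(l)] for i in range(m)]

def apply_A_transpose(x, nbrs, s, n):
    """y = A^T x, where A_ij = s_i on every edge (all workers are hammers)."""
    y = [0] * n
    for i, workers in enumerate(nbrs):
        for j in workers:
            y[j] += s[i] * x[i]
    return y

def apply_A(y, nbrs, s):
    """x = A y for the same all-hammer answer matrix."""
    return [s[i] * sum(y[j] for j in workers) for i, workers in enumerate(nbrs)]

m, l = 12, 3                                   # here n = m and r = l
s = [1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1]
nbrs = regular_assignment(m, l)
AAt_s = apply_A(apply_A_transpose(s, nbrs, s, m), nbrs, s)
is_eigenvector = AAt_s == [l * l * s_i for s_i in s]   # A A^T s = l r s
```

Since all arithmetic is on integers, the comparison is exact; the same identity holds for any $s \in \{\pm 1\}^m$, matching the observation that the accuracy does not depend on the choice of $s$.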

In the presence of spammers, the problem is more complicated since some workers can make errors. We get a random matrix $A$ where the randomness comes from the construction of the graph $G$ and the submitted answers on the edges of the graph. In expectation, a low-rank approximation of the random matrix $A$ provides a good estimate. Recall that $p = [p_j] \in \{1/2, 1\}^n$. Since $E[A_{ij}] = (l/n) s_i (2p_j - 1)$, the expectation $E[A] = (l/n) s (2p - 1_n)^T$ is a rank-1 matrix whose first left singular vector is equal to $s$ up to a scaling. We can consider the random matrix $A$ as a perturbation of a rank-1 matrix: $A = E[A] + Z$, where the perturbation $Z$ will be small in an appropriate sense if the number of spammers is small and the degree $l$ is large. Building on this intuition, we next make this statement rigorous and provide a proof of the performance guarantee for the Low-rank Approximation algorithm.

1) Proof of Theorem II.1: Let $u = (u_1, \ldots, u_m)^T$ be a left singular vector corresponding to the leading singular value of $A$. Ideally, we want to track each entry $u_i$ for most realizations of the random matrix $A$, which is difficult. Instead, we track the Euclidean distance between two vectors: $\|(1/\sqrt{m}) s - u\|$. The following lemma upper bounds the Hamming distance by the Euclidean distance.

Lemma III.1. For any $s \in \{\pm 1\}^m$ and the Hamming distance $d(\cdot)$ defined in (3),

$$d(s, \mathrm{sign}(u)) \leq \Big\| \frac{1}{\sqrt{m}} s - u \Big\|^2. \tag{9}$$

Proof. The above lemma follows from a series of inequalities:

$$\frac{1}{m} \sum_i I(s_i \neq \mathrm{sign}(u_i)) \leq \frac{1}{m} \sum_i I(s_i u_i \leq 0) \leq \frac{1}{m} \sum_i (s_i - \sqrt{m}\, u_i)^2.$$

To upper bound the Euclidean distance, we apply the next proposition to two rank-1 matrices: $P_1(A)$ and $E[A|p]$. For the proof of this proposition, we refer to the proof of a more general statement for general low-rank matrices in [KMO10, Remark 6.3].

Proposition III.2. For two rank-1 matrices with singular value decompositions $M = x \sigma y^T$ and $M' = x' \sigma' (y')^T$, we have

$$\min\{ \|x + x'\|, \|x - x'\| \} \leq \frac{\sqrt{2}}{\sigma} \|M - M'\|_F,$$

where $\|x\| = \sqrt{\sum_i x_i^2}$ denotes the Euclidean norm and $\|X\|_F = \sqrt{\sum_{i,j} (X_{ij})^2}$ denotes the Frobenius norm. By symmetry this bound also holds with $\sigma'$ in the denominator.
For a given (random) quality vector $p = (p_1, \ldots, p_n)^T$ and any solution vector $s \in \{\pm 1\}^m$, $E[A|p] = (l/n) s (2p - 1_n)^T$ is a rank-1 matrix. Here, $1_n$ denotes the $n$-dimensional all-ones vector. This matrix has a left singular vector $(1/\sqrt{m}) s$ and a singular value $\sigma = \sqrt{lr/n}\, \|2p - 1_n\|$. Recall that $p_j = 1$ with probability $q$ and $1/2$ with probability $1 - q$, and the $p_j$'s are independent of one another. Since $\sigma^2$ is a sum of i.i.d. bounded random variables, we can apply Hoeffding's inequality to show that $\sigma \geq \sqrt{lrq/2}$ with probability $1 - e^{-\Omega(nq^2)}$. $P_1(A)$ is a rank-1 projection defined in (8). Notice that we have two choices for the left singular vector. Both $u$ and $-u$ are valid singular vectors of $P_1(A)$ and we do not know a priori which one is closer to $(1/\sqrt{m}) s$. For now, let us assume that $u$ is the one closer to the correct solution, such that $\|(1/\sqrt{m}) s - u\| \leq \|(1/\sqrt{m}) s + u\|$. Later in this section, we will explain how we can identify $u$ with high probability of success. Applying Proposition III.2 to $P_1(A)$ and $E[A|p]$, we get that, with high probability,

$$\Big\| \frac{1}{\sqrt{m}} s - u \Big\| \leq \frac{2}{\sqrt{lrq}} \big\| E[A|p] - P_1(A) \big\|_F. \tag{10}$$

For any matrix $X$ of rank at most 2, $\|X\|_F \leq \sqrt{2}\, \|X\|_2$, where $\|X\|_2 \equiv \max_{\|x\|, \|y\| \leq 1} x^T X y$ denotes the operator norm. Therefore, by the triangle inequality,

$$\big\| E[A|p] - P_1(A) \big\|_F \leq \sqrt{2}\, \big\| E[A|p] - P_1(A) \big\|_2 \leq \sqrt{2}\, \big\| E[A|p] - A \big\|_2 + \sqrt{2}\, \big\| A - P_1(A) \big\|_2 \leq 2\sqrt{2}\, \big\| E[A|p] - A \big\|_2, \tag{11}$$

where in the last inequality we used the fact that $P_1(A)$ is the minimizer of $\|A - X\|_2$ among all matrices $X$ of rank one, whence $\|A - P_1(A)\|_2 \leq \|A - E[A|p]\|_2$. The following key technical lemma provides a bound on the operator norm of the difference between the random matrix $A$ and its (conditional) expectation. This lemma generalizes a celebrated bound on the second largest eigenvalue of $d$-regular random graphs by Friedman, Kahn and Szemerédi [FKS89], [FO05], [KMO10]. The proof of this lemma is omitted here due to space limitations.

Lemma III.3. Assume that an $(l,r)$-regular random bipartite graph $G$ with $m$ left nodes and $n = \Theta(m)$ right nodes is generated according to the configuration model. $A$ is the weighted adjacency matrix of $G$ with random weight $A_{ij}$ assigned to each edge $(i,j) \in E$.
With probability $1 - m^{-\Omega(\sqrt{l})}$,

$$\| A - E[A] \|_2 \leq C'(\rho)\, A_{\max}\, (lr)^{1/4}, \tag{12}$$

where $|A_{ij}| \leq A_{\max}$ almost surely and $C'(\rho)$ is a constant that only depends on $\rho \equiv m/n$.

Under our model, $A_{\max} = 1$ since $A_{ij} \in \{\pm 1\}$. We then apply this lemma to each realization of $p$ and substitute this bound in (11). Together with (10) and (9), this finishes the proof of Theorem II.1.

Now, we are left to prove that between $u$ and $-u$ we can choose the one closer to $(1/\sqrt{m}) s$. Given a rank-1 matrix $P_1(A)$, there are two possible pairs of left and right normalized singular vectors: $(u, v)$ and $(-u, -v)$. Let $P_+(\cdot): \mathbb{R}^n \to \mathbb{R}^n$ denote the projection onto the positive orthant such that $P_+(v)_i = I(v_i \geq 0)\, v_i$. Our strategy is to choose $u$ to be our estimate if $\|P_+(v)\|^2 \geq 1/2$ (and
$-u$ otherwise). We claim that with high probability the pair $(u, v)$ chosen according to our strategy satisfies

$$\Big\| \frac{1}{\sqrt{m}} s - u \Big\| \leq \Big\| \frac{1}{\sqrt{m}} s + u \Big\|. \tag{13}$$

Assume that the pair $(u, v)$ is the one satisfying the above inequality. Denote the singular vectors of $E[A|p]$ by $x = (1/\sqrt{m}) s$ and $y = (1/\|2p - 1_n\|)(2p - 1_n)$, and its singular value by $\sigma = \|E[A|p]\|_2$. Let $\sigma' = \|P_1(A)\|_2$. Then, by Proposition III.2 and the triangle inequality,

$$\| y - v \| = \Big\| \frac{1}{\sigma} E[A|p]^T x - \frac{1}{\sigma'} P_1(A)^T u \Big\| \leq \frac{1}{\sigma} \big\| E[A|p]^T (x - u) \big\| + \frac{1}{\sigma} \big\| (E[A|p] - P_1(A))^T u \big\| + \Big| \frac{1}{\sigma} - \frac{1}{\sigma'} \Big| \big\| P_1(A)^T u \big\| \leq \frac{C_1}{(lrq^2)^{1/4}}.$$

The first term is upper bounded by $(1/\sigma) \|E[A|p]^T (x - u)\| \leq \|x - u\|$, which is again upper bounded by $C_2/(lrq^2)^{1/4}$ using (10). The second term is upper bounded by $(1/\sigma) \|(E[A|p] - P_1(A))^T u\| \leq (1/\sigma) \|E[A|p] - P_1(A)\|_2$, which is again upper bounded by $C_3/(lrq^2)^{1/4}$ using (12) and $\sigma \geq (1/2)\sqrt{lrq}$. The third term is upper bounded by $|1/\sigma - 1/\sigma'|\, \|P_1(A)^T u\| \leq |\sigma - \sigma'|/\sigma$, which is again upper bounded by $C_4/(lrq^2)^{1/4}$ using the triangle inequality: $(1/\sigma) \big| \|E[A|p]\|_2 - \|P_1(A)\|_2 \big| \leq (1/\sigma) \|E[A|p] - P_1(A)\|_2$. This implies that

$$\|P_+(v)\| \geq \|y\| - \|y - P_+(v)\| \geq 1 - \|y - v\| \geq 1 - \frac{C_1}{(lrq^2)^{1/4}},$$

where the second inequality follows from the fact that $p_j \geq 1/2$, which implies that all entries of $y$ are non-negative. Notice that we can increase the constant $C(\rho)$ in the bound (4) of the main theorem such that we only need to restrict our attention to $(lrq^2)^{1/4} > 4C_1$. This proves that the pair $(u, v)$ chosen according to our strategy satisfies (13), which is all we need in order to prove Theorem II.1.

2) Proof of Lemma II.2: Power iteration is a simple procedure for computing the singular vector of a matrix $A$ corresponding to the largest singular value. Power iteration is especially efficient when $A$ is sparse, since each iteration only applies the matrix twice. We denote the $i$-th largest singular value by $\sigma_i(A)$, such that $\sigma_1(A) \geq \sigma_2(A) \geq \cdots \geq 0$. Let $u$ be the left singular vector corresponding to the largest singular value. Then, our estimate of $u$ after $k$ iterations is

$$\tilde{x}^{(k)} = A A^T x^{(k-1)}, \qquad x^{(k)} = \frac{1}{\|\tilde{x}^{(k)}\|}\, \tilde{x}^{(k)},$$

with a random initialization $x^{(0)}$.
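The power iteration above, combined with the sign rule of the Low-rank Approximation algorithm, can be sketched end to end as follows. This is a minimal illustration under stated assumptions rather than a reference implementation: the function names, the iteration count of 100, the seeds, and the choice to let parallel edges of the configuration-model multigraph add up in $A$ are all assumptions made here.

```python
import math
import random

def configuration_model(m, l, n, r, rng):
    """Random (l, r)-regular bipartite multigraph: match task and worker half-edges."""
    assert m * l == n * r
    task_stubs = [i for i in range(m) for _ in range(l)]
    worker_stubs = [j for j in range(n) for _ in range(r)]
    rng.shuffle(worker_stubs)
    return list(zip(task_stubs, worker_stubs))    # edges (i, j); repeats are possible

def low_rank_estimate(edges, answers, m, n, rng, iters=100):
    """Estimate s by power iteration on A A^T, then resolve the sign using v."""
    def mul_At(x):                                # y = A^T x
        y = [0.0] * n
        for (i, j), a in zip(edges, answers):
            y[j] += a * x[i]
        return y
    def mul_A(y):                                 # x = A y
        x = [0.0] * m
        for (i, j), a in zip(edges, answers):
            x[i] += a * y[j]
        return x
    x = [rng.gauss(0.0, 1.0) for _ in range(m)]   # random initialization x^(0)
    for _ in range(iters):
        x = mul_A(mul_At(x))                      # x~ = A A^T x
        norm = math.sqrt(sum(c * c for c in x))
        x = [c / norm for c in x]                 # renormalize to unit length
    v = mul_At(x)                                 # proportional to the right singular vector
    pos = sum(c * c for c in v if c >= 0)
    neg = sum(c * c for c in v if c < 0)
    flip = -1.0 if pos < neg else 1.0             # choose between (u, v) and (-u, -v)
    return [1 if flip * c >= 0 else -1 for c in x]

def run_trial(m, l, q, seed):
    """Simulate the spammer-hammer model with n = m workers and r = l; return d(s, s_hat)."""
    rng = random.Random(seed)
    s = [rng.choice((-1, 1)) for _ in range(m)]
    p = [1.0 if rng.random() < q else 0.5 for _ in range(m)]
    edges = configuration_model(m, l, m, l, rng)
    answers = [s[i] if rng.random() < p[j] else -s[i] for i, j in edges]
    s_hat = low_rank_estimate(edges, answers, m, m, rng)
    return sum(a != b for a, b in zip(s, s_hat)) / m
```

Each iteration touches every edge twice, which matches the $O(ml)$ cost per iteration; for example, `run_trial(200, 10, 0.8, seed=1)` simulates 200 tasks with 10 answers each and returns the fraction of tasks estimated incorrectly.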
The convergence of the sequence $x^{(k)}$ to $u$ is not sensitive to a particular initialization, and we can initialize each entry of $x^{(0)}$, for example, as an i.i.d. Gaussian random variable. Let $P_u(x) = (u^T x)u$ denote the projection onto the subspace spanned by $u$, and $P_{u^\perp}(x) = x - (u^T x)u$ be the projection onto the complement subspace. Then, by the singular value decomposition of $A$, it follows that $\|P_u(\tilde{x}^{(k)})\| = \sigma_1(A)^2 \|P_u(x^{(k-1)})\|$ and $\|P_{u^\perp}(\tilde{x}^{(k)})\| \le \sigma_2(A)^2 \|P_{u^\perp}(x^{(k-1)})\|$. Then,
$$\frac{\|P_{u^\perp}(x^{(k)})\|}{\|P_u(x^{(k)})\|} \le \Big( \frac{\sigma_2(A)}{\sigma_1(A)} \Big)^2 \frac{\|P_{u^\perp}(x^{(k-1)})\|}{\|P_u(x^{(k-1)})\|}. \tag{14}$$
Since $P_{u^\perp}(x^{(k)})$ is the error in our estimate, this implies that the error in our estimation of the vector $u$ decays exponentially given that $\sigma_2(A)$ is strictly smaller than $\sigma_1(A)$.

Under the crowdsourcing model, we can even show a stronger result than just a strict inequality. Namely,
$$\frac{\sigma_2(A)}{\sigma_1(A)} \le \frac{C_1}{(lrq^2)^{1/4}}, \tag{15}$$
with high probability. From the proof of Theorem II.1, recall that $E[A]$ is a rank-1 matrix with $\|E[A]\|_2 \ge (1/2)\sqrt{lrq}$ with high probability. By Lemma III.3, we know that $\|A - E[A]\|_2 \le C_2 (lr)^{1/4}$. Now applying Weyl's inequality [HJ90, Theorem 4.3.1], it immediately follows that $\sigma_1(A) \ge (1/2)\sqrt{lrq} - C_2 (lr)^{1/4} \ge C_3 \sqrt{lrq}$ and $\sigma_2(A) \le C_2 (lr)^{1/4}$. Here we used the fact that in the regime where Theorem II.1 is non-trivial, we can assume that $lrq^2 \ge C(\rho)$. Weyl's inequality applied to non-symmetric matrices states that for two matrices $M$ and $M'$ of the same dimensions, $\sigma_k(M) - \sigma_1(M') \le \sigma_k(M + M') \le \sigma_k(M) + \sigma_1(M')$.

Substituting (15) in (14), we get that
$$\frac{\|P_{u^\perp}(x^{(k)})\|}{\|P_u(x^{(k)})\|} \le \Big( \frac{C_1}{(lrq^2)^{1/4}} \Big)^{2k} m^2, \tag{16}$$
with high probability. Here we used the concentration of measure result on Gaussian random variables: $\|P_{u^\perp}(x^{(0)})\| / \|P_u(x^{(0)})\| \le m^2$ with high probability.

Now, for the error bound in Theorem II.1 to hold after a finite number $k$ of iterations, all we need is for our estimate $x^{(k)}$ to satisfy $\|x^{(k)} - (1/\|s\|)s\| \le C_5/(lrq^2)^{1/4}$, where $\|s\| = \sqrt{m}$. By the triangle inequality, $\|x^{(k)} - (1/\|s\|)s\| \le \|(1/\|s\|)s - u\| + \|x^{(k)} - u\|$. We already proved in Section III-A.1 that the first term satisfies a bound of this form.
For the second term, notice that, with the sign of $u$ chosen so that $u^T x^{(k)} \ge 0$, $\|x^{(k)} - u\|^2 = 2(1 - u^T x^{(k)}) \le 2\|P_{u^\perp}(x^{(k)})\|^2$. By (16), it follows that $k = O(\log(m)/\log(lrq^2))$ is sufficient to guarantee that the error bound in Theorem II.1 holds with high probability.

B. Optimality under One-Shot: Proof of Theorem II.3

In order to compute the fundamental limit on the cost-accuracy tradeoff, we need a lower bound on the achievable accuracy using any possible scheme. Let us consider an oracle estimator that makes the optimal decision based on information provided by an oracle. Further, assume that the oracle gives information on which workers are hammers and which are spammers. This estimator only makes mistakes on tasks whose neighbors are all spammers, in which case the estimator flips a fair coin to make a decision. If we denote the degree of node $i$ by $l_i$, the error rate $P(\hat{s}_i \ne s_i)$ is $(1/2)(1 - q)^{l_i}$. Note that no algorithm with only information on $\{A_{ij}\}$ can produce a more accurate estimate than the oracle estimator. Therefore, for any estimate $\hat{s}$ that is a function of $\{A_{ij}\}_{(i,j) \in E}$, we have the following minimax bound:
$$\inf_{\hat{s}} \sup_{s \in \{\pm 1\}^m} \frac{1}{m} \sum_{i \in [m]} P(s_i \ne \hat{s}_i) \ge \frac{1}{m} \sum_{i \in [m]} \frac{1}{2} (1 - q)^{l_i} \ge \frac{1}{2} (1 - q)^{|E|/m},$$
where the second inequality follows by convexity and $|E| = \sum_i l_i$ is the total number of edges. This implies that in order to achieve a target accuracy $\epsilon$ using any possible algorithm, it is necessary to have $|E| \gtrsim (m/q) \log(1/\epsilon)$. This gives $\Delta_{\mathrm{LB}} \sim (1/q) \log(1/\epsilon)$. Let us emphasize that the necessary condition on the budget in (5) is completely general and holds for any graph, regular or irregular.

Next, consider the proposed Budget-optimal Crowdsourcing algorithm. To achieve the optimal performance, we run $T$ independent estimation procedures to get $T$ independent estimates of each $s_i$. Let $\hat{s}_i^{(t)}$ denote the estimate of $s_i$ produced by the $t$-th group of workers for $t \in [T]$. Set $l \sim 1/q$ with a large enough constant such that $(1/m) \sum_{i \in [m]} I(s_i \ne \hat{s}_i^{(t)}) \le 1/8$ with probability $1 - m^{-\Omega(\sqrt{l})}$ by Theorem II.1. By symmetry of the graph selection procedure, the estimation accuracy does not depend on the particular choice of task $i$. Then, $P(s_i \ne \hat{s}_i^{(t)}) \le 1/8 + m^{-\Omega(\sqrt{l})}$. Then for $m \ge e^{\Omega(1/\sqrt{l})}$, $P(s_i \ne \hat{s}_i^{(t)}) \le 1/4$. Note that $e^{\Omega(1/\sqrt{l})} = O(1)$. Further, $\{\hat{s}_i^{(t)}\}_{t \in [T]}$ are independent given $s_i$. Our final estimate after $T$ runs is $\hat{s}_i = \mathrm{sign}\big( \sum_{t=1}^{T} \hat{s}_i^{(t)} \big)$. By a concentration of measure result, the error rate is upper bounded by $P(\hat{s}_i \ne s_i) \le e^{-(1/8)T}$. To achieve accuracy $\epsilon$ we need to have $T \gtrsim \log(1/\epsilon)$. The minimum cost per task sufficient to achieve a target accuracy $\epsilon$ scales as $lT$, which gives $\Delta_{\mathrm{Lowrank}} \sim (1/q) \log(1/\epsilon)$.

Now, consider a naive majority voting algorithm. Majority voting simply follows what the majority of workers agree on. In formula, $\hat{s}_i = \mathrm{sign}\big( \sum_{j \in \partial i} A_{ij} \big)$, where $\partial i$ denotes the neighborhood of node $i$ in the graph.
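The majority-voting rule can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `majority_vote`, the convention that an entry of 0 marks a worker-task pair that was not assigned, and the seeded default generator are assumptions of this example rather than anything specified in the paper.

```python
import random


def majority_vote(A, rng=None):
    """Estimate each task's answer as the sign of the sum of its row.

    A[i][j] is +1 or -1 if worker j answered task i, and 0 otherwise
    (the 0 convention is an assumption of this sketch).
    """
    rng = rng or random.Random(0)
    estimates = []
    for row in A:
        total = sum(row)  # sum of answers over the neighborhood of task i
        if total > 0:
            estimates.append(1)
        elif total < 0:
            estimates.append(-1)
        else:
            # Tie: flip a fair coin
            estimates.append(rng.choice([1, -1]))
    return estimates
```

Every answer enters the sum with the same weight, so a row dominated by spammers is decided by their arbitrary answers.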
Majority voting makes a random choice when there is a tie. When we have many spammers in the crowd, majority voting is prone to make mistakes since it gives the same weight to both the estimates provided by spammers and those of hammers. This limitation is captured in the following lower bound.

Lemma III.4. There exists a numerical constant $C_2$ such that the error rate achieved using the majority voting scheme is lower bounded by $P(\hat{s}_i \ne s_i) \ge e^{-C_2 (l_i q^2 + 1)}$, where $l_i$ is the degree of node $i$.

The proof of this lemma is skipped due to space limitations. Now from convexity it follows that
$$\frac{1}{m} \sum_{i \in [m]} P(s_i \ne \hat{s}_i) \ge e^{-C_2 ((1/m)|E| q^2 + 1)}.$$
Then, with the majority voting scheme, the minimum cost per task necessary to achieve a target accuracy $\epsilon$ is $\Delta_{\mathrm{Majority}} \gtrsim (1/q^2) \log(1/\epsilon)$.

REFERENCES

[Ber92] M. W. Berry, Large scale sparse singular value computations, International Journal of Supercomputer Applications 6 (1992), 13–49.
[Bol01] B. Bollobás, Random Graphs, Cambridge University Press, January 2001.
[DLR77] A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society. Series B (Methodological) 39 (1977), no. 1, 1–38.
[DS79] A. P. Dawid and A. M. Skene, Maximum likelihood estimation of observer error-rates using the EM algorithm, Journal of the Royal Statistical Society. Series C (Applied Statistics) 28 (1979), no. 1, 20–28.
[FKS89] J. Friedman, J. Kahn, and E. Szemerédi, On the second eigenvalue in random regular graphs, Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing (Seattle, Washington, USA), ACM, May 1989, pp. 587–598.
[FO05] U. Feige and E. Ofek, Spectral techniques applied to sparse random graphs, Random Struct. Algorithms 27 (2005), no. 2, 251–275.
[HJ90] R. A. Horn and C. R. Johnson, Matrix analysis, Cambridge University Press, 1990.
[JG03] R. Jin and Z. Ghahramani, Learning with multiple labels, Advances in Neural Information Processing Systems (2003), 921–928.
[Jol86] I. T. Jolliffe, Principal component analysis, Springer-Verlag, 1986.
[KMO10] R. H. Keshavan, A. Montanari, and S. Oh, Matrix completion from a few entries, IEEE Trans. Inform. Theory 56 (2010), no. 6, 2980–2998.
[RU08] T. Richardson and R. Urbanke, Modern Coding Theory, Cambridge University Press, March 2008.
[RYZ+10] V. C. Raykar, S. Yu, L. H. Zhao, G. H. Valadez, C. Florin, L. Bogoni, and L. Moy, Learning from crowds, J. Mach. Learn. Res. 99 (2010), 1297–1322.
[WBBP10] P. Welinder, S. Branson, S. Belongie, and P. Perona, The Multidimensional Wisdom of Crowds, 2424–2432.
[WRW+09] J. Whitehill, P. Ruvolo, T. Wu, J. Bergsma, and J. Movellan, Whose vote should count more: Optimal integration of labels from labelers of unknown expertise, Advances in Neural Information Processing Systems 22 (2009), 2035–2043.