CDAS: A Crowdsourcing Data Analytics System


Xuan Liu, Meiyu Lu, Beng Chin Ooi, Yanyan Shen, Sai Wu, Meihui Zhang
School of Computing, National University of Singapore, Singapore
College of Computer Science, Zhejiang University, Hangzhou, P.R. China
{liuxuan, lumeiyu, ooibc, shenyanyan, mhzhang}@comp.nus.edu.sg, wusai@zju.edu.cn

ABSTRACT

Some complex problems, such as image tagging and natural language processing, are very challenging for computers, and even state-of-the-art technology is not yet able to provide satisfactory accuracy. Therefore, rather than relying solely on developing new and better algorithms to handle such tasks, we look to the crowdsourcing solution, employing human participation to make good the shortfall in current technology. Crowdsourcing is a good supplement to many computer tasks. A complex job may be divided into computer-oriented tasks and human-oriented tasks, which are then assigned to machines and humans respectively.

To leverage the power of crowdsourcing, we design and implement a Crowdsourcing Data Analytics System, CDAS. CDAS is a framework designed to support the deployment of various crowdsourcing applications. The core part of CDAS is a quality-sensitive answering model, which guides the crowdsourcing engine to process and monitor the human tasks. In this paper, we introduce the principles of our quality-sensitive model. To satisfy the user-required accuracy, the model guides the crowdsourcing query engine in the design and processing of the corresponding crowdsourcing jobs. It provides an estimated accuracy for each generated result based on the human workers' historical performances. When verifying the quality of the result, the model employs an online strategy to reduce waiting time.

To show the effectiveness of the model, we implement and deploy two analytics jobs on CDAS, a Twitter sentiment analytics job and an image tagging job. We use real Twitter and Flickr data as our queries respectively. We compare our approaches with state-of-the-art classification and image annotation techniques. The results show that the human-assisted methods can indeed achieve a much higher accuracy.
By embedding the quality-sensitive model into the crowdsourcing query engine, we effectively reduce the processing cost while maintaining the required query answer quality.

1. INTRODUCTION

Crowdsourcing is widely adopted in Web 2.0 sites. For example, Wikipedia benefits from thousands of subscribers, who continually write and edit articles for the site. Another example is Yahoo! Answers, where users submit and answer questions. In Web 2.0 sites, most of the contents are created by individual users, not service providers. Crowdsourcing is the driving force of these web sites. To facilitate the development of crowdsourcing applications, Amazon provides the Mechanical Turk (AMT) platform. Computer programmers can exploit AMT's API to publish jobs for human workers, who are good at some complex jobs, such as image tagging and natural language processing. The collective intelligence helps solve many computationally difficult tasks, thereby improving the quality of output and users' experience. Figure 1 illustrates the idea of using crowdsourcing techniques to divide up jobs. CrowdDB [6], HumanGS [9] and CrowdSearch [3] are recent examples of applications on Amazon's AMT crowdsourcing platform.

Figure 1: Crowdsourcing Application

Crowdsourcing relies on human workers to complete a job, but humans are prone to errors, which can make the results of crowdsourcing arbitrarily bad. The reason is two-fold. First, to obtain rewards, a malicious worker can submit random answers to all questions. This can significantly degrade the quality of the results. Second, for a complex job, the worker may lack the required knowledge for handling it. As a result, an incorrect answer may be provided.

To address the above problems, in AMT, a job is split into many HITs (Human Intelligence Tasks) and each HIT is assigned to multiple workers so that replicated answers are obtained. If conflicting answers are observed, the system will compare the answers of different workers and determine the correct one. For example, in CrowdDB [6], the voting strategy is adopted. The replication strategy, however, does not fully solve the answer diversity problem.
Suppose we want the precision of our image tags to be 95% and the cost per worker per HIT is $0.0. If we assign each HIT to too many workers, we will have to pay a high cost. On the other hand, if few workers provide tags, we will not have enough clues to infer the correct tags. Given an expected accuracy, we therefore need an adaptive query engine that guarantees high accuracy with high probability and incurs as little cost as possible.

In this paper, we propose a quality-sensitive answering model for crowdsourcing systems, which is designed to significantly improve the quality of query results and effectively reduce the processing cost at the same time. This model is the core of our proposed Crowdsourcing Data Analytics System (CDAS). CDAS exploits the crowd intelligence to improve the performance of different data analytics jobs, such as image tagging and sentiment analysis. CDAS transforms the analytics jobs into human jobs and computer jobs, which are then processed by different modules. The human jobs are handled by the crowdsourcing engine, which adopts a two-phase processing strategy. The quality-sensitive answering model is correspondingly split into two sub-models, a prediction model and a verification model. The sub-models are applied to different phases, respectively.

In the first phase, the engine employs the prediction model to estimate how many workers are required to achieve a specific accuracy. The model generates its estimation by collecting the distribution of all workers' historical performances. Based on the model's result, the engine creates and submits the HIT to the crowdsourcing platform. In the second phase, the engine obtains the answers from the human workers and refines them, as different workers may return different results for the same question.

To verify the answers from different human workers, the voting strategy is used in CrowdDB to select the correct one. In the simplest case, each HIT is sent to $n$ workers ($n$ is odd). A result is assumed to be correct and accepted if no fewer than $\lceil n/2 \rceil$ workers return it. The voting strategy is simple, but is not very effective in the crowdsourcing scenario. Suppose we have a set of product reviews and want to know the opinion of each review. We set the score to either positive, negative or neutral. If 30% of the workers vote positive, 30% of the workers vote negative and the remaining workers vote neutral, the voting strategy cannot decide which answer is more trustworthy.
Moreover, even if more than 50% of the workers vote negative, we cannot accept the answer directly, because some malicious workers may collude to produce a false answer.

To improve the accuracy of the crowdsourcing results, CDAS adopts a probabilistic approach. First, a verification model is employed to replace the voting strategy. It relies on workers' past performances (i.e., the workers' accuracies for historical queries) and combines vote distribution and workers' performances. Intuitively, the system is more likely to accept the answers provided by a worker with a good accuracy. A random sampling approach is designed to estimate the workers' accuracies in each job. By applying the probability-based verification model, we can significantly improve the result quality. Second, instead of waiting for all the results, the adaptive query engine provides an approximate result with confidence and refines it gradually as more answers are returned. This technique has been designed based on our observation that in AMT, workers finish their jobs asynchronously. Therefore, it is important to offer the option of an approximate answer that is gradually improved as more results are available, instead of letting the user wait for the completion of the query. This strategy is similar in philosophy to traditional online query processing and serves to improve users' experience.

To evaluate our model and the performance of CDAS, we implement two practical crowdsourcing jobs, a Twitter sentiment analytics (TSA) job and an image tagging (IT) job. In the TSA job, we submit a set of movie titles as our queries and try to find the opinions of Twitter users. In the IT job, we use the images of Flickr as the queries and ask the human workers to choose the correct tags. We will show the effectiveness of our crowdsourcing engine based on the quality-sensitive answering model in the experimental section.

Figure 2: CDAS Architecture

The remainder of the paper is organized as follows. In Section 2, we present the architecture of CDAS, and introduce the applications implemented over CDAS.
In Section 3, we introduce our prediction model for estimating a proper number of workers for each job. To improve the result accuracy, a probability-based verification model is proposed in Section 4, which can be extended to support online processing. We evaluate the performance of our models in CDAS in Section 5, and discuss some related work in Section 6. We conclude the paper in Section 7.

2. OVERVIEW

In this section, we introduce the architecture of our Crowdsourcing Data Analytics System, CDAS, and discuss how to implement applications on top of CDAS.

2.1 Architecture of CDAS

CDAS is a system that exploits crowdsourcing techniques to improve the performance of data analytics jobs. The core difference between CDAS and conventional analytics systems lies in the processing mechanism. CDAS employs human workers to assist the analytics tasks, while other systems rely solely on computer systems to answer the queries.

Figure 2 shows the architecture of CDAS. CDAS consists of three major components: job manager, crowdsourcing engine and program executor. The job manager accepts the submitted analytics jobs and transforms them into a processing plan, which describes how the other two components (crowdsourcing engine and program executor) should collaborate for the job. In particular, the job manager partitions the job into two parts, one for the computers and one for the human workers. For example, in human-assisted image search, the human workers are responsible for providing the tags for each image, while the image classification and index construction are handled by the computer programs. In most cases, the two parts interact with each other during processing. The program executor summarizes the results of the crowdsourcing engine, and the engine may change its job schedule due to the requests of the program executor.

The crowdsourcing engine processes human jobs in two phases.

1. In the first phase, the engine generates a query template for the specific type of human jobs. The query template follows the format of the crowdsourcing platform, such as AMT, and should be easily understood by human workers.
The engine then translates each job from the job manager into a set of crowdsourcing tasks and publishes them to the crowdsourcing platform. To reduce the crowdsourcing cost, the engine employs a prediction model, which estimates the number of required human workers for a specific task based on the distribution of workers' performance.

Figure 3: Query Template

2. In the second phase, the human workers' answers are returned to the crowdsourcing engine, which combines the results and removes the ambiguity. A verification model is developed to select the correct answer based on the probability estimation.

Sometimes, the human tasks need to disclose some sensitive data to the public. We design a privacy manager inside the engine to address the problem. The privacy manager may adaptively change the formats of the generated questions for human workers. It may also reject some workers for a specific task.

The performance of the crowdsourcing engine is determined by the two models, the prediction model and the verification model. We shall introduce the two models in the following sections and discuss the implementation of two practical applications, a Twitter sentiment analytics (TSA) job and an image tagging (IT) job, to validate the performance of our models.

2.2 Deploying Applications on CDAS

In this section, we use the TSA job as a running example to show how to deploy an application on CDAS. The TSA job is typically processed using machine learning and information retrieval techniques [][]. However, as shown in the experimental section, CDAS can achieve a much higher accuracy than some of these traditional approaches for the TSA job. In the TSA job, the query is formally defined as follows.

DEFINITION 1. Query in TSA
The query in TSA follows the format of $(S, C, R, t, w)$, where $S$ is a set of keywords, $C$ denotes the required accuracy, $R$ is the domain of answers, $t$ is the timestamp of the query and $w$ is the time window of the query.

For example, suppose the user wants to know the public opinions for iPhone4S from Oct-4-0 to Oct-3-0; the corresponding query can be expressed as:

Q({iPhone4S, iphone 4S}, 95%, {Best Ever, Good, Not Satisfied}, Oct-4-0, 0).

The answer to the query consists of two parts. The first part is the percentage of each opinion and the second part comprises the reasons.
For the above query, one possible answer is that most people perceive iPhone4S as a good product thanks to the features of Siri and iOS 5, while a smaller but significant number of people are not satisfied with its display and battery performance.

Table 1: Users' Opinion on iPhone4S

Opinions        Percentages   Reasons
Best Ever       60%           Siri, iOS 5, Performance
Good            10%           Siri, 1080P
Not Satisfied   30%           iPhone4, Display, Battery

The query definition of TSA is registered in the job manager, which then generates the corresponding processing plan. The program executor is responsible for retrieving the Twitter stream and checking whether the query keyword ($S$ = iPhone4S in the above example) exists in a tweet. The candidate tweets are fed to the crowdsourcing engine, which will generate a query template as shown in Figure 3.

When the crowdsourcing engine collects enough tweets in its buffer, it starts to generate the HIT (Human Intelligence Task). In particular, it creates an HTML section (bounded by <div> and </div>) for each tweet using the query's template. For all the tweets in the buffer, we concatenate their HTML sections to form our HIT description. Therefore, one HIT in the TSA job contains questions for multiple tweets about the same product, movie, person or event. The HIT is then published to AMT for processing.

Algorithm 1 summarizes the two-phase query processing in the crowdsourcing engine (note that Algorithm 1 describes the general query processing strategy, not just for the TSA job). In the preprocessing, the engine generates a HIT job for the tweets using the query template (lines 1-6). In the first phase, it applies the prediction model to estimate the number of workers required to satisfy the predefined accuracy (line 7). In the second phase, it submits the HIT to AMT and waits for the answers (lines 8-10). The verification model is used to select the correct answers. In line 7, Q.C denotes the accuracy requirement specified by query Q.

Algorithm 1 queryProcessing(ArrayList<Tweet> buffer, Query Q)
   1: HtmlDesc H = new HtmlDesc()
   2: for i = 0 to buffer.size-1 do
   3:     Tweet t = buffer.get(i)
   4:     HtmlSection hs = new HtmlSection(Q.template(), t)
   5:     H.concatenate(hs)
   6: HIT task = new HIT(H)
   7: int n = predictWorkerNumber(Q.C)
   8: submit(task, n)
   9: while not all answers received do
  10:     verifyAnswer()

In Algorithm 1, the whole procedure of query processing is directed by the two models, which are the focus of this paper and will be presented in the following sections.

3. PREDICTION MODEL

3.1 Economic Model in AMT

The prediction model is designed to ensure high-quality answers and to reduce cost. It is highly related to how the crowdsourcing platform charges the requesters. Therefore, we first briefly introduce the economic model of AMT.

In AMT, a HIT is published and broadcast to all candidate workers. Any candidate worker can accept the task. Thus, if $n$ answers for a HIT are required, from the point of view of CDAS, there will be $n$ random workers providing the answers. AMT charges CDAS for each HIT using the following rules:

1. Every worker is paid a fixed amount of money $m_c$.
2. CDAS pays a fixed amount of money $m_s$ per worker to the AMT system for each HIT.

Therefore, we spend $(m_c + m_s)n$ for each HIT. Take query $Q(S, C, R, t, w)$ in TSA as an example: if we get $K$ available tweets

for each time unit, the cost of processing $Q$ is $(m_c + m_s)nKw$. In our prediction model, the number of workers $n$ is correlated to the required accuracy $C$. We use function $g$ to denote the relationship between $C$ and $n$. Consequently, the query cost can be represented as $(m_c + m_s)wK\,g(C)$. Before we present the technical details, we summarize the notations used in the paper in Table 2.

Table 2: Table of Notations

$U$                 the set of workers
$u_i$               the i-th worker
$n$                 the number of workers
$P_{n/2}$           the probability that at least $\lceil n/2 \rceil$ workers provide the correct answer
$A$                 the set of accuracy of workers
$a_i$               the accuracy of worker $u_i$
$\mu$               the mean value of worker accuracy
$f(u_i)$            the answer provided by worker $u_i$
$\Omega$            the observation of distribution of answers
$P(r \mid \Omega)$  the probability of answer $r$ being correct under the observation $\Omega$
$m$                 the number of all possible answers
$c_i$               the confidence of worker $u_i$
$\rho(r_i)$         the confidence of answer $r_i$

3.2 Voting-based Prediction

Given $n$ ($n$ is odd) answers from workers $U = \{u_1, u_2, ..., u_n\}$, the voting strategy accepts an answer if at least $\lceil n/2 \rceil$ workers return the same answer. While the voting strategy guarantees that no other answer has more votes for being the correct answer, it does not address the problem of how to select $n$. To address the above problem, we propose a voting-based prediction model. Given an accuracy requirement, the prediction model estimates the number of workers required. That is, the goal of the prediction model is to derive the function $g$ for each query. We prove in Section 4 that the model can also produce a bound for our probability-based verification approach.

3.2.1 A Conservative Estimation

We compute the probability that at least $\lceil n/2 \rceil$ workers provide the correct answer. We use $P_{n/2}$ to denote the probability. Suppose the accuracies of all workers are $A = \{a_1, a_2, ..., a_n\}$, where the accuracy means the probability of a worker providing a correct answer. By the definition of $P_{n/2}$, we have the following equation:

$$P_{n/2} = \sum_{U' \subseteq U,\ |U'| \ge \lceil n/2 \rceil} \Big( \prod_{u_i \in U'} a_i \prod_{u_j \in U \setminus U'} (1 - a_j) \Big)$$

$U'$ denotes a subset of user set $U$ with size no smaller than $\lceil n/2 \rceil$. The above equation enumerates all the possible cases in which the correct answer can be obtained by voting.
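
The subset enumeration in the equation above can be sketched directly in code. This is a minimal illustration, not part of CDAS: the function name `prob_majority_correct` is hypothetical, workers are assumed to answer independently, and the brute-force loop is exponential in the number of workers, so it only suits small HITs.

```python
from itertools import combinations
from math import prod


def prob_majority_correct(accuracies):
    """Probability that at least ceil(n/2) of n independent workers are correct.

    `accuracies` holds each worker's probability of answering correctly.
    Every subset U' of "correct" workers with |U'| >= ceil(n/2) contributes
    prod(a_i for i in U') * prod(1 - a_j for j not in U').
    """
    n = len(accuracies)
    need = (n + 1) // 2  # ceil(n/2)
    workers = range(n)
    total = 0.0
    for size in range(need, n + 1):
        for correct in combinations(workers, size):
            chosen = set(correct)
            total += prod(
                accuracies[i] if i in chosen else 1.0 - accuracies[i]
                for i in workers
            )
    return total


if __name__ == "__main__":
    # Three workers, each correct with probability 0.8 (illustrative values).
    print(prob_majority_correct([0.8, 0.8, 0.8]))
```

With identical accuracies the sum collapses to a binomial tail, which is why the expectation over randomly drawn workers takes a closed form.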
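The workers of a HIT can be considered as $n$ random workers from AMT. Let $\mu$ denote the mean value of the workers' accuracy. We have the following theorem to compute the expectation of the probability that at least $\lceil n/2 \rceil$ workers return the correct answer:

THEOREM 1. If workers answer the queries independently,

$$E[P_{n/2}] = \sum_{k=\lceil n/2 \rceil}^{n} \binom{n}{k} \mu^{k} (1-\mu)^{n-k}$$

PROOF. As all workers are randomly picked, $a_i$ and $a_j$ are independent for any $i \ne j$. Similarly, $a_i$ and $1 - a_j$ are also independent. Thus,

$$E[P_{n/2}] = E\Big[\sum_{U' \subseteq U,\ |U'| \ge \lceil n/2 \rceil} \Big( \prod_{u_i \in U'} a_i \prod_{u_j \in U \setminus U'} (1 - a_j) \Big)\Big] = \sum_{U' \subseteq U,\ |U'| \ge \lceil n/2 \rceil} \Big( \prod_{u_i \in U'} E[a_i] \prod_{u_j \in U \setminus U'} E[1 - a_j] \Big)$$

We have $E[a_i] = \mu$ and $E[1 - a_i] = 1 - \mu$. Therefore, $E[P_{n/2}]$ can be computed as:

$$E[P_{n/2}] = \sum_{U' \subseteq U,\ |U'| \ge \lceil n/2 \rceil} \Big( \prod_{u_i \in U'} \mu \prod_{u_j \in U \setminus U'} (1 - \mu) \Big) = \sum_{k=\lceil n/2 \rceil}^{n} \binom{n}{k} \mu^{k} (1-\mu)^{n-k}$$

For a given query, we require $E[P_{n/2}]$ to be no less than a given accuracy $C$, i.e., $E[P_{n/2}] \ge C$. Furthermore, we derive a lower bound of $E[P_{n/2}]$ that can be easily computed as follows.

THEOREM 2. $E[P_{n/2}] \ge 1 - e^{-2n(\mu - \frac{1}{2})^2}$

PROOF. By Chernoff Bound,

$$\sum_{k=0}^{\lfloor n/2 \rfloor} \binom{n}{k} \mu^{k} (1-\mu)^{n-k} \le e^{-2n(\mu - \frac{1}{2})^2}$$

Moreover, for any odd $n$, we have

$$\sum_{k=0}^{\lfloor n/2 \rfloor} \binom{n}{k} \mu^{k} (1-\mu)^{n-k} + \sum_{k=\lceil n/2 \rceil}^{n} \binom{n}{k} \mu^{k} (1-\mu)^{n-k} = 1$$

Therefore,

$$E[P_{n/2}] = 1 - \sum_{k=0}^{\lfloor n/2 \rfloor} \binom{n}{k} \mu^{k} (1-\mu)^{n-k} \ge 1 - e^{-2n(\mu - \frac{1}{2})^2}$$

By requiring $1 - e^{-2n(\mu - \frac{1}{2})^2} \ge C$, we guarantee that $E[P_{n/2}] \ge C$ (i.e., the expected accuracy of the query result is no less than $C$). Consequently, we obtain a sufficient condition for the quality of the crowdsourcing query engine:

THEOREM 3. Given required accuracy $C$ and the mean value of workers' accuracy $\mu$, choosing $n \ge \frac{-\ln(1-C)}{2(\mu - \frac{1}{2})^2}$ workers ensures that the expected accuracy of the crowdsourcing result is no less than $C$.

Note that $n$ is an odd integer, so the minimum value of $n$ is $2\left\lceil \frac{-\ln(1-C)}{4(\mu - \frac{1}{2})^2} \right\rceil + 1$.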
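
As a rough numerical check of Theorems 1 to 3, the sketch below evaluates the binomial tail from Theorem 1 and the odd worker count implied by Theorem 3. The function names are hypothetical, and the code assumes a mean accuracy strictly above 0.5, where the exponential bound is meaningful.

```python
import math


def expected_majority_prob(n, mu):
    """Theorem 1: sum over k >= ceil(n/2) of C(n, k) * mu^k * (1 - mu)^(n - k)."""
    lo = (n + 1) // 2  # ceil(n/2)
    return sum(
        math.comb(n, k) * mu ** k * (1.0 - mu) ** (n - k)
        for k in range(lo, n + 1)
    )


def conservative_workers(C, mu):
    """Theorem 3: odd n = 2 * ceil(-ln(1 - C) / (4 * (mu - 1/2)^2)) + 1."""
    if not (0.0 < C < 1.0) or not (0.5 < mu <= 1.0):
        raise ValueError("expects 0 < C < 1 and 0.5 < mu <= 1")
    return 2 * math.ceil(-math.log(1.0 - C) / (4.0 * (mu - 0.5) ** 2)) + 1


if __name__ == "__main__":
    C, mu = 0.95, 0.7  # illustrative values, not taken from the experiments
    n = conservative_workers(C, mu)
    print(n, expected_majority_prob(n, mu))
```

For C = 0.95 and a mean accuracy of 0.7 the bound asks for 39 workers, and the exact expectation at that n is well above 0.95, which illustrates how much slack the sufficient condition can carry.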

3.2.2 Optimization with Binary Search

Setting $n$ to $2\left\lceil \frac{-\ln(1-C)}{4(\mu - \frac{1}{2})^2} \right\rceil + 1$ ensures the expected accuracy of results. However, it is well known that the Chernoff Bound provides a tight estimation only for a large enough $n$. In some HITs, only a few workers participate in processing. Therefore, Theorem 3 generates a conservative estimation that may cause too many workers to be involved. To address this problem, we use Theorem 3 as an upper bound and apply a binary search algorithm (on odd numbers) to find a tighter estimation, i.e., the minimum odd $n$ that satisfies $E[P_{n/2}] \ge C$.

Algorithm 2 binarySearch(double C)   // C is the required accuracy
   1: int s = 1, e = 2⌈−ln(1−C) / (4(μ − 1/2)^2)⌉ + 1
   2: while s < e do
   3:     int n = 2⌊(s + e) / 4⌋ + 1
   4:     double E = computeExpectedProb(n)
   5:     if E ≥ C then
   6:         e = n
   7:     else
   8:         s = n + 2
   9: return e

Algorithm 3 computeExpectedProb(int x)
   1: double E = 0, δ = μ^x
   2: for int i = x to ⌈x/2⌉ do
   3:     E = E + δ
   4:     δ = δ · ((1 − μ) · i) / (μ · (x − i + 1))
   5: return E

Algorithm 2 shows the idea of binary search. We initialize the domain of $n$ to be $[1,\ 2\lceil \frac{-\ln(1-C)}{4(\mu - \frac{1}{2})^2} \rceil + 1]$ (line 1). At each step, we compute the expected accuracy of using $n$ workers (line 4), until we reach the minimum $n$ that satisfies the accuracy requirement. Algorithm 3 illustrates the process of computing the expected accuracy. Its correctness is based on the fact that $\binom{n}{k-1} / \binom{n}{k} = k / (n - k + 1)$. Obviously, the time complexity of Algorithm 3 is $O(n)$. Therefore, we can get a tighter bound on the number of workers required using Algorithm 2 in $O(n \log n)$ time.

3.3 Sampling-based Accuracy Estimation

In the previous two prediction models, we rely on the statistics of the workers' accuracy distribution. However, not all crowdsourcing platforms provide such information, due to privacy issues. Even if some platforms provide certain statistics, they cannot be directly used as workers' accuracy. For example, the AMT system records the approval rate of each worker. The approval rate shows the percentage of answers approved by the requester. However, we have observed that the approval rate is not consistent with the accuracy of the worker in CDAS. There are two main reasons. First, the worker's accuracy may vary widely across jobs.
Second, some requesters set automatic approval for all answers without verification. The difference between approval rate and accuracy is studied through experiments.

To resolve the above problem, we design a sampling-based approach. Specifically, for a registered query, we randomly embed questions whose ground truth is known beforehand. These questions are used as our testing samples to estimate the workers' accuracy. Here we use the TSA application to illustrate the sampling method. As mentioned previously, each HIT contains the questions of $B$ tweets. To get unbiased results, we randomly inject $\alpha B$ samples into a HIT. In other words, each HIT has $\alpha B$ testing samples and $(1 - \alpha)B$ new tweets. In our current implementation, $\alpha$ and $B$ are set to 0. and 00, respectively. We evaluate the effect of the sampling rate $\alpha$ in our experiments, and the results confirm that even a low sampling rate can produce an acceptable estimation.

In the sampling process, CDAS collects the accuracy of participating workers. Algorithm 4 shows the procedure. After the sampling, the statistics are used in both the prediction model and the verification model.

Algorithm 4 doSampling(HIT H)
   1: WorkerSet U = H.getWorkers()
   2: Double[] rate = new Double[U.size]
   3: while H.nextQuestion() ≠ null do
   4:     Question q = H.getNextQuestion()
   5:     if q is a testing sample then
   6:         for i = 0 to U.size-1 do
   7:             Worker u = U.get(i)
   8:             if u.getAnswer(q) = q.groundTruth then
   9:                 rate[i] = rate[i] + 1/(αB)

4. VERIFICATION MODEL

In the voting-based verification, if more than half of the workers return the same answer, the query engine will accept it as the correct answer. Although our prediction model tries to guarantee that at least half of the workers submit the correct answer, the voting-based verification occasionally fails to provide an answer. For a specific question, different workers may provide different answers, and in some cases, no answer gets an agreement above 50%.
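
The failure case just described is easy to reproduce. The following sketch uses a hypothetical helper, `half_voting`, that accepts an answer only when more than half of the workers agree and returns `None` otherwise; the vote lists are made up for illustration.

```python
from collections import Counter


def half_voting(answers):
    """Return the answer chosen by more than half of the workers, else None."""
    if not answers:
        return None
    answer, votes = Counter(answers).most_common(1)[0]
    return answer if 2 * votes > len(answers) else None


if __name__ == "__main__":
    # A three-way split: the top answer has 4 of 10 votes, so nothing is accepted.
    split = ["pos"] * 3 + ["neg"] * 3 + ["neu"] * 4
    print(half_voting(split))
    # A clear majority is accepted.
    print(half_voting(["neg", "neg", "pos"]))
```

Note that the rule treats every vote as equally reliable, which is the second limitation the section goes on to address.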
Moreover, the voting strategy assumes that all the workers provide the correct answer with the same probability, which is not true, as the accuracy of different workers varies a lot and the workers with higher accuracy are more trustworthy. In this section, we propose a probability-based verification method to determine the best answer.

4.1 Probability-based Verification

Probability-based verification tries to evaluate the quality of answers through workers' historical performances (i.e., accuracy). In particular, given the probability distribution of workers' performances, we apply Bayes' theorem to estimate the accuracy of each result. We adopt and extend the approach proposed in data fusion [4] for integrating conflicting results in CDAS.

Suppose a HIT is answered by $n$ workers $\{u_1, u_2, ..., u_n\}$ with accuracy $\{a_1, a_2, ..., a_n\}$. We define function $f(u_i)$ to represent the answer provided by worker $u_i$. Based on Bayesian analysis, the probability of a specific answer $\bar{r} \in R$ being the correct answer, given the observation of the answer distribution $\Omega$ (i.e., the answers provided by the $n$ workers), can be computed as:

$$P(\bar{r} \mid \Omega) = \frac{P(\Omega \mid \bar{r}) P(\bar{r})}{P(\Omega)} = \frac{P(\Omega \mid \bar{r}) P(\bar{r})}{\sum_{r_i \in R} P(\Omega \mid r_i) P(r_i)}$$

Suppose the size of the answer domain is $|R| = m$. Without a priori knowledge, each answer $r_i \in R$ appears with equal probability $\frac{1}{m}$. Then the above equation can be transformed into:

$$P(\bar{r} \mid \Omega) = \frac{P(\Omega \mid \bar{r})}{\sum_{r_i \in R} P(\Omega \mid r_i)} \qquad (1)$$

Let $\bar{r}$ be the correct answer. The probability of worker $u_j$ providing the correct answer is $a_j$ (i.e., accuracy). Without any a priori knowledge, each incorrect answer provided by $u_j$ appears with equal probability $\frac{1 - a_j}{m - 1}$. Therefore, $P(\Omega \mid \bar{r})$ can be computed as:

$$P(\Omega \mid \bar{r}) = \prod_{f(u_j) = \bar{r}} a_j \prod_{f(u_j) \ne \bar{r}} \frac{1 - a_j}{m - 1} \qquad (2)$$

Combining Equations 1 and 2, we have

$$P(\bar{r} \mid \Omega) = \frac{\prod_{f(u_j) = \bar{r}} a_j \prod_{f(u_j) \ne \bar{r}} \frac{1 - a_j}{m - 1}}{\sum_{r_i \in R} \Big( \prod_{f(u_j) = r_i} a_j \prod_{f(u_j) \ne r_i} \frac{1 - a_j}{m - 1} \Big)} = \frac{\prod_{f(u_j) = \bar{r}} \frac{(m-1) a_j}{1 - a_j}}{\sum_{r_i \in R} \Big( \prod_{f(u_j) = r_i} \frac{(m-1) a_j}{1 - a_j} \Big)} \qquad (3)$$

For ease of illustration, we define the Worker Confidence for an answer as follows.

DEFINITION 2. Worker Confidence
Let $a_j$ be the accuracy of worker $u_j$. The confidence $c_j$ of worker $u_j$ is defined as:

$$c_j = \ln \frac{(m-1) a_j}{1 - a_j} = \ln(m - 1) + \ln \frac{a_j}{1 - a_j}$$

From the above definition, we can see that high-accuracy workers will get large confidence values. This is consistent with the intuition that workers with higher accuracy are more trustworthy. Based on the definition of worker confidence and Equation 3, we define the Answer Confidence as below.

DEFINITION 3. Answer Confidence
The confidence of an answer $\bar{r}$ equals the probability of $\bar{r}$ being the correct answer:

$$\rho(\bar{r}) = P(\bar{r} \mid \Omega) = \frac{e^{\sum_{f(u_j) = \bar{r}} c_j}}{\sum_{r_i \in R} e^{\sum_{f(u_j) = r_i} c_j}} \qquad (4)$$

In CDAS, the answer with the highest confidence is accepted as the final result. In fact, the confidence of an answer represents a variant of voting, where $e^{c_j}$ is used as the weight for worker $u_j$. The worker with a higher confidence gets more weight. To speed up the computation of $P(\bar{r} \mid \Omega)$, we cache the value $\ln \frac{a_j}{1 - a_j}$ for each known worker.

We can prove that using Theorem 1 to estimate the number of workers required also produces a quality bound for our probability-based verification approach.

THEOREM 4. If $E[P_{n/2}] \ge C$ and $\bar{r}$ is the correct answer, our probability-based verification model returns $\bar{r}$ as the result with a probability no less than $C$.

PROOF. Based on Theorem 1,

$$E[P_{n/2}] = \sum_{k=\lceil n/2 \rceil}^{n} \binom{n}{k} \mu^{k} (1-\mu)^{n-k} \ge C$$

Namely, the expected number of workers who provide the correct answer is larger than $\frac{n}{2}$ with a probability larger than $C$.
The confidences of all workers are independent and identically distributed (i.i.d.), because the accuracies of the workers are i.i.d. Let $E_c$ denote the mean value of workers' confidences. As a result, the total number of expected votes for answer $\bar{r}$ is

$$E\Big[\sum_{f(u_j) = \bar{r}} c_j\Big] = \sum_{f(u_j) = \bar{r}} E[c_j] = E_c \cdot |\{u_j \mid f(u_j) = \bar{r}\}| > \frac{n}{2} E_c$$

Note that in Equation 4, all answers share the same denominator. The value of $P(\bar{r} \mid \Omega)$ is proportional to $e^{\sum_{f(u_j) = \bar{r}} c_j}$. Thus, $\bar{r}$ is the answer with the largest expected confidence and is returned as the result in expectation. Otherwise, suppose another answer $r'$ has a larger expected probability than $\bar{r}$, i.e.,

$$E[P(r' \mid \Omega)] > E[P(\bar{r} \mid \Omega)]$$

Therefore,

$$E\Big[\sum_{f(u_j) = r'} c_j\Big] > E\Big[\sum_{f(u_j) = \bar{r}} c_j\Big] > \frac{n}{2} E_c$$

We will have

$$E\Big[\sum_{f(u_j) = r'} c_j\Big] + E\Big[\sum_{f(u_j) = \bar{r}} c_j\Big] > n E_c$$

In fact, the sum of workers' confidences is equal to the sum of confidences for every answer:

$$n E_c = E\Big[\sum_{r_i \in R} \Big(\sum_{f(u_j) = r_i} c_j\Big)\Big]$$

This results in a contradiction: the sum of confidences of $r'$ and $\bar{r}$ exceeds the sum of all confidences. Therefore, our probability-based verification model returns $\bar{r}$ as the result with a probability no less than $C$.

The only unknown parameter in Equation 4 is $m$, the size of $R$. We can simply set $m = |R|$. However, in our experimental study, we have found that not all answers in $R$ are picked by the workers. For example, if a question asks a worker to rank a product based on some tweets and the score ranges from 0 to 00, the scores will follow a very skewed distribution. Some low-probability answers are never selected, but they do reduce the weight of a correct answer. Thus, we need to select a good $m$ to prune the noise.

After a HIT completes, the crowdsourcing engine gets $m'$ distinct answers for a specific question from $n$ workers ($m' \le n$). In this observation, we select $m'$ distinct answers among $m$ possible ones. The probability of this selection can be computed as ( ). Suppose this is not a very rare observation and the probability of this observation is larger than $\epsilon$ (e.g., we prune the low-probability noise). The following lemma provides a lower bound for $m$.

LEMMA 1. $m$ > H ( )(ɛ), where $H_{m'} = \sum_{i=1}^{m'} \frac{1}{i}$ is the $m'$-th Harmonic number.

PROOF. ( ) ɛ < ( ) ( +) ( ) ( )( ) ( ) ( ) ( )( ) ( ) ( ( ( i ))) i ( H ) Derived from the above equation, we have Therefore (ɛ) < H > H (ɛ) H ( )(ɛ)

For large values, the above lower bound is too loose. Instead, we propose a tighter lower bound for $m$:

LEMMA 2. > ɛ e

PROOF. From Lemma 1, we have Therefore, Obviously, i ɛ < By setting l l ɛ< i ( i ) i i i l i i l i i >l > l ɛ, we get a tighter bound: > ɛ e

THEOREM 5. > max{ H ( )(ɛ) , ɛ e }

PROOF. Directly from Lemma 1 and 2.

In our verification, we set $\epsilon$ to 0.05 based on Fisher's exact test [5], which is widely adopted in practice. We then use Theorem 5 to estimate the value of $m$.

We now give an example in TSA to show the benefit of applying our probability-based verification model. Table 3 shows the example. Five workers with different accuracies provide three different answers, namely Positive, Neutral and Negative. The results of the three verification models are shown in Table 4. Both the Half-Voting model and the Majority-Voting model choose Positive as the result, since three workers out of five provide the answer Positive. However, our verification model can correctly choose Negative as the result, because the worker answering Negative has a much higher accuracy. As a result, our verification model gets more accurate answers than the other two voting-based models.

Table 3: An Example of Workers' Answers

Movie Title   Green Lantern
Tweet         Oh. My. GOD. Green Lantern movie is terrible. Like, Lost In Space movie terrible.
Worker ID     w1    w2    w3    w4    w5
Accuracy
Answer        pos   pos   neu   neg   pos

Table 4: Results of Verification Models

                  pos   neu   neg   Answer
Half-Voting       3     1     1     pos
Majority-Voting   3     1     1     pos
Verification                        neg

4.2 Online Processing

The workers submit their answers asynchronously in AMT, and CDAS has to wait for a sufficient number of answers to be submitted. As a consequence, query response time in CDAS (and other crowdsourcing systems, for that matter) is expected to be longer than that of non-crowdsourcing systems. To alleviate this problem and also to improve users' experience, we adopt online processing techniques in CDAS.
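
Online processing re-evaluates the answer confidence of Equation 4 each time a new answer arrives, so it may help to see that computation spelled out. The sketch below is an illustration only: `worker_confidence` and `answer_confidences` are hypothetical names, the accuracies in the usage example are invented rather than taken from Table 3, and accuracies are assumed to lie strictly between 0 and 1.

```python
import math


def worker_confidence(accuracy, m):
    """Definition 2: c_j = ln((m - 1) * a_j / (1 - a_j))."""
    return math.log((m - 1) * accuracy / (1.0 - accuracy))


def answer_confidences(votes, m):
    """Equation 4 over a list of (answer, worker_accuracy) pairs.

    Answers in the domain that nobody chose have an empty sum of
    confidences, so each contributes e^0 = 1 to the denominator.
    """
    scores = {}
    for answer, accuracy in votes:
        scores[answer] = scores.get(answer, 0.0) + worker_confidence(accuracy, m)
    unseen = m - len(scores)
    # Shift by the largest exponent for numerical stability.
    peak = max(max(scores.values()), 0.0)
    denom = sum(math.exp(s - peak) for s in scores.values())
    denom += unseen * math.exp(-peak)
    return {r: math.exp(s - peak) / denom for r, s in scores.items()}


if __name__ == "__main__":
    # Invented accuracies: three mediocre workers say "pos", one strong worker says "neg".
    votes = [("pos", 0.55), ("pos", 0.6), ("neu", 0.5), ("neg", 0.95), ("pos", 0.6)]
    conf = answer_confidences(votes, m=3)
    print(max(conf, key=conf.get), conf)
```

With these made-up accuracies the single high-accuracy worker outweighs the three-vote majority, which is the same qualitative effect as in the Green Lantern example.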
Instead of waiting for all workers to complete their tasks, CDAS provides an approximate result based on the answers received so far. As we have previously discussed, uncertainty and approximation cannot be avoided in crowdsourcing systems, which makes online processing a perfect fit for the query processing in CDAS. To resolve the uncertainty, we extend the techniques of data fusion [4][4] to estimate the answer's confidence. However, the same approach cannot be directly applied to the online processing in CDAS: in crowdsourcing systems, the human workers compete for the tasks, and CDAS does not have the profile (i.e., accuracy) of a specific user until he/she returns the answer. In our case, the accuracy of the answer provided by an unseen worker can only be estimated by the distribution of all workers' accuracies.

4.2.1 Finding the Correct Answer Online

We apply Equation 4 to continuously update the probability of each received answer. Suppose a HIT is assigned to $n$ workers and the query engine receives answers from $n'$ ($n' < n$) workers. Unlike Equation 4, in this case, we only receive a partial observation $\Omega'$ of the answer distribution. For the remaining $n - n'$ workers, we have no idea about what answers they may provide. Let $s$ denote a possible answer set by the remaining workers, and let $S$ represent all the possible $s$. Let $A' = \{a_{n'+1}, a_{n'+2}, ..., a_n\}$ be the accuracies of the remaining workers. As we do not know the identities of the remaining workers, we consider all the possibilities. We use $\mathcal{A}'$ to represent all the possible permutations of $A'$. The confidence of an answer $r$ being the correct one can be estimated as the expected probability $P(r \mid \Omega', s)$ over $S$ and $\mathcal{A}'$, i.e.,

$$\rho(r) = E_{s \in S,\ A' \in \mathcal{A}'}[P(r \mid \Omega', s)]$$

The following theorem shows that Equation 4 can be applied to compute $\rho(r)$.

THEOREM 6. Assume that workers process the query independently and the answers are submitted in a random order. Then

$$\rho(r) = P(r \mid \Omega')$$

PROOF. Based on the assumption, we have:

$$\rho(r) = E_{s \in S,\ A' \in \mathcal{A}'}[P(r \mid \Omega', s)] = E_{A' \in \mathcal{A}'}[E_{s \in S}[P(r \mid \Omega', s)]]$$

In fact, the answer set $s$ of the remaining workers does not affect the computation of the above equation. As shown in [4],

$$E_{s \in S}[P(r \mid \Omega', s)] = P(r \mid \Omega')$$

The computation of $\rho(r)$ can be further simplified as:

$$\rho(r) = E_{A' \in \mathcal{A}'}[E_{s \in S}[P(r \mid \Omega', s)]] = E_{A' \in \mathcal{A}'}[P(r \mid \Omega')] = P(r \mid \Omega')$$

Figure 4: Reviews for Kung Fu Panda

Theorem 6 shows that the confidence of a partial result can also be computed by Equation 4. Therefore, we select the answer with maximal confidence as our correct answer.

4.2.2 Early Termination

When the current answers are good enough, we can terminate the HIT to reduce cost. The major challenge of early termination is how to measure the quality of the current results. Intuitively, we can stop accepting answers from new workers as soon as we are sure that the current result will not be changed by the answers we choose to forgo. In particular, let $r_1$ and $r_2$ be the best and second best answers based on their confidence, respectively. We have $P(r_1 \mid \Omega') > P(r_2 \mid \Omega')$. Let $(u_1, u_2, ..., u_n)$ be the set of workers. Suppose $n'$ workers have submitted answers and $n - n'$ answers remain unfilled. Assume an answer set $s' = \{f(u_i) = r_2 \mid n' + 1 \le i \le n\}$. Using techniques similar to those in [4], we can prove the following for the minimal possible value of $P(r_1 \mid \Omega)$ and the maximal possible value of $P(r_2 \mid \Omega)$:

$$\min P(r_1 \mid \Omega) = P(r_1 \mid \Omega', s') \qquad (5)$$

$$\max P(r_2 \mid \Omega) = P(r_2 \mid \Omega', s') \qquad (6)$$

Note that $\min P(r_1 \mid \Omega)$ and $\max P(r_2 \mid \Omega)$ are related to the random variables $a_{n'+1}, a_{n'+2}, ..., a_n$. In our algorithm, we use the expected values of $\min P(r_1 \mid \Omega)$ and $\max P(r_2 \mid \Omega)$, namely, $E_{A' \in \mathcal{A}'}[\min P(r_1 \mid \Omega)]$ and $E_{A' \in \mathcal{A}'}[\max P(r_2 \mid \Omega)]$. However, it is difficult to compute the expected values directly. Therefore, in practice, we use approximate values of $E_{A' \in \mathcal{A}'}[\min P(r_1 \mid \Omega)]$ and $E_{A' \in \mathcal{A}'}[\max P(r_2 \mid \Omega)]$. We assume every remaining worker has the same accuracy $E[a_i]$ and use it in Equations 5 and 6. Empirical results show that the approximations work well in practice.
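
The approximation of Equations 5 and 6 can be sketched as follows. This is a hedged illustration, not the CDAS implementation: `posterior` and `termination_bounds` are hypothetical names, every outstanding worker is assumed to vote for the current runner-up with the mean accuracy, and that mean is assumed high enough that such a vote strengthens the runner-up.

```python
import math


def posterior(votes, m, also=()):
    """Equation 4 over (answer, accuracy) pairs; unseen answers weigh e^0 = 1."""
    log_w = {r: 0.0 for r in also}
    for answer, acc in votes:
        log_w[answer] = log_w.get(answer, 0.0) + math.log((m - 1) * acc / (1.0 - acc))
    weights = {r: math.exp(s) for r, s in log_w.items()}
    denom = sum(weights.values()) + (m - len(weights))
    return {r: w / denom for r, w in weights.items()}


def termination_bounds(votes, m, n, mean_acc):
    """Approximate min P(r1 | Omega) and max P(r2 | Omega) from Equations 5 and 6.

    votes    : answers received so far, as (answer, accuracy) pairs
    m        : size of the answer domain
    n        : number of workers the HIT was assigned to
    mean_acc : mean worker accuracy, standing in for each unseen worker
    """
    current = posterior(votes, m)
    ranked = sorted(current, key=current.get, reverse=True)
    best = ranked[0]
    # If only one answer has been seen, the runner-up is some unseen answer.
    second = ranked[1] if len(ranked) > 1 else "<unseen answer>"
    worst_case = list(votes) + [(second, mean_acc)] * (n - len(votes))
    final = posterior(worst_case, m, also=(best, second))
    return best, final[best], second, final[second]


if __name__ == "__main__":
    received = [("neg", 0.9), ("neg", 0.9), ("pos", 0.6)]  # illustrative values
    best, lo, second, hi = termination_bounds(received, m=3, n=5, mean_acc=0.7)
    print(best, lo, second, hi, lo > hi)
```

When the lower bound for the leader still exceeds the upper bound for the runner-up, the two outstanding answers cannot change the ranking under this approximation, so waiting for them would only add cost.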
We propose three different strategies as the termination condition:

MinMax: $E_{A \in \mathcal{A}}[\min P(r_1 \mid \Omega)] > E_{A \in \mathcal{A}}[\max P(r_2 \mid \Omega)]$
MinExp: $E_{A \in \mathcal{A}}[\min P(r_1 \mid \Omega)] > P(r_2 \mid \Omega')$
ExpMax: $P(r_1 \mid \Omega') > E_{A \in \mathcal{A}}[\max P(r_2 \mid \Omega)]$

MinMax guarantees that the answer output by our system is stable when the termination condition is achieved. However, it is too conservative. MinExp and ExpMax can terminate the processing much earlier, but may lead to low-quality results. We study the effect of the three strategies in our experiments.

Algorithm 5 onlineProcessing(Question q)
 1: Set answer = new Set()
 2: Map<Answer, float> result = new Map()
 3: while not all answers are returned do
 4:     Answer A = getNextAnswer(q)
 5:     answer.add(A)
 6:     Set distinctAnswer = getDistinctAnswer(answer)
 7:     for i = 0 to distinctAnswer.size-1 do
 8:         Answer A = distinctAnswer.get(i)
 9:         float confidence = computeConfidence(A)
10:         result.put(A, confidence)
11:     if canTerminate(result) then
12:         break
13: return result

Algorithm 5 outlines the online processing strategy adopted in CDAS. The query engine continuously updates the confidence of each answer (lines 3-13) until the termination condition is satisfied. We apply Equation 4 to estimate the confidence of each answer (line 9) and apply one of the three termination strategies to decide whether to stop the processing (line 11).

4.3 Result Presentation

In the onlineProcessing algorithm (Algorithm 5), if there is an answer that meets the termination condition, online processing stops and CDAS accepts the answer. Otherwise, if none of the answers is good enough, CDAS updates the confidence of each answer according to Equation 4. We take queries in TSA as an example to illustrate the result presentation. Given a list of tweets $t_1, t_2, \ldots, t_N$, let function $h_{t_i}(r)$ return the score of answer $r$ for tweet $t_i$. $h_{t_i}(r)$ is defined as follows:

$$h_{t_i}(r) = \begin{cases} 1 & \text{if } r \text{ is accepted for } t_i \\ 0 & \text{if another answer is accepted} \\ \rho_{t_i}(r) & \text{if none of the answers is accepted} \end{cases}$$
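As a minimal sketch, the three termination conditions of Section 4.2.2 and the per-tweet score $h_{t_i}(r)$ defined above can be written as two small functions. This is an illustration under stated assumptions, not the CDAS implementation: the function names, the string-valued strategy argument, and the dictionary of confidences are hypothetical, and the four probabilities are assumed to have been computed elsewhere (for example via Equation 4 and the approximations of Equations 5 and 6).

```python
from typing import Dict, Optional


def can_terminate(strategy: str, p_best: float, p_second: float,
                  exp_min_best: float, exp_max_second: float) -> bool:
    """Evaluate one of the three termination conditions.

    p_best         -- P(r1 | Omega'), current confidence of the best answer
    p_second       -- P(r2 | Omega'), current confidence of the runner-up
    exp_min_best   -- (approximate) E[min P(r1 | Omega)]
    exp_max_second -- (approximate) E[max P(r2 | Omega)]
    """
    if strategy == "MinMax":
        return exp_min_best > exp_max_second
    if strategy == "MinExp":
        return exp_min_best > p_second
    if strategy == "ExpMax":
        return p_best > exp_max_second
    raise ValueError(f"unknown strategy: {strategy}")


def tweet_score(answer: str, accepted: Optional[str],
                confidence: Dict[str, float]) -> float:
    """Score h of `answer` for one tweet.

    accepted   -- the answer accepted for this tweet, or None if no answer
                  has met the termination condition yet
    confidence -- current confidence rho of each answer (hypothetical input)
    """
    if accepted is None:
        # No answer accepted yet: fall back to the current confidence.
        return confidence.get(answer, 0.0)
    return 1.0 if accepted == answer else 0.0
```

With the illustrative inputs `p_best=0.7`, `p_second=0.2`, `exp_min_best=0.4`, `exp_max_second=0.5`, MinMax keeps waiting while MinExp and ExpMax both stop, which is in line with the description of MinMax as the most conservative strategy.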

The percentage of answer $r$ is then computed as $\frac{1}{N}\sum_{i=1}^{N} h_{t_i}(r)$. Moreover, we generate a set of keywords as reasons for each answer $r$. These keywords are the most frequent keywords submitted by the workers who provided the answer $r$. The results are updated as new tweets are streamed into TSA.

Figure 4 shows the online processing interface of TSA for the review results of Kung Fu Panda. It summarizes Twitter users' opinions into three categories. The time window of the query is set to minutes and, in the elapsed time (4 minutes), 0 tweets are fed to TSA, among which 70% of tweets say Kung Fu Panda is a good movie. TSA updates the result as new tweets arrive. Users can click an answer to expand the view, and TSA will list the corresponding tweets for that answer. The tweets are sorted by timestamp from the newest to the oldest. The user can also check the progress of the currently running HIT.

5. PERFORMANCE EVALUATION

To evaluate the effectiveness of the quality-sensitive answering model in CDAS, we developed two crowdsourcing applications, a Twitter sentiment analytics (TSA) job and an image tagging (IT) job. We present comprehensive experimental results for TSA and, due to space constraints, provide only the comparison with an online image tagging toolkit for the IT application. The results of the other experiments on IT exhibit trends similar to those of TSA.

By default, our approach applies the probability-based verification model (denoted as Verification) to select the best answer. For comparison, the Half-Voting and Majority-Voting models are used as two alternative verification approaches. Suppose $n$ ($n$ is odd) workers are employed for a particular task. In the Half-Voting model, the answer $r_i$ is accepted only if no less than $\lceil n/2 \rceil$ workers return it as their answer. In the Majority-Voting model, let $v(r_i)$ denote the votes for answer $r_i$. The answer $r_i$ is accepted if, for any other answer $r_j$, $v(r_i) > v(r_j)$.

5.1 Application 1: TSA

We deploy TSA on AMT and use 100 movie titles as our queries. The selected titles are the most recent movies listed in IMDB (Internet Movie Database).
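For reference, the two voting baselines defined earlier in this section (Half-Voting and Majority-Voting) can be sketched as follows. This is a minimal illustration, assuming answers are plain hashable labels; the function names and the convention of returning `None` when no answer is accepted are hypothetical choices, not part of CDAS.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def half_voting(answers: Sequence[Hashable]) -> Optional[Hashable]:
    """Accept an answer only if more than half of the workers returned it.

    For an odd number of workers n this is the same as requiring at
    least ceil(n / 2) votes. Returns None when no answer qualifies.
    """
    if not answers:
        return None
    answer, votes = Counter(answers).most_common(1)[0]
    return answer if 2 * votes > len(answers) else None


def majority_voting(answers: Sequence[Hashable]) -> Optional[Hashable]:
    """Accept the answer with strictly more votes than every other answer.

    Returns None on a tie for first place (or when there are no answers).
    """
    ranked = Counter(answers).most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the two most frequent answers
    return ranked[0][0]
```

Both functions can return `None`, which corresponds to the situation where a voting model produces no answer for a review; with five votes split 2/1/1/1, for example, Majority-Voting accepts the leading answer while Half-Voting accepts none.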
The query follows the format of Q({movie name}, accuracy requirement, {Positive, Neutral, Negative}, Oct- -0, 1 day). Namely, the queries are processed against one-day tweets. For each HIT, 30 workers are employed to perform the review categorization task. We manually check each of the reviews to generate our ground truth.

5.1.1 Crowdsourcing vs. SVM Algorithm

We first show the advantages of crowdsourcing techniques over computer programs. We compare the results of TSA with LIBSVM. To build an automatic classification model using LIBSVM, tweet reviews about five movies are selected as the test data, and tweets about the remaining 95 movies are used as training data. After a stream of tweets passes the filters of TSA, we also send it to LIBSVM and collect the corresponding results. We then compare the results against our ground truth. In TSA, we vary the number of workers from 1 to 5. Figure 5 shows the accuracies of both systems for five movies, each with 00 tweet reviews. In most cases, TSA achieves a higher accuracy than LIBSVM, even if only one worker is employed. This indicates that humans are much better at natural language understanding than machines. For such tasks, if highly accurate results are required, crowdsourcing is a promising approach.

cjlin/libsvm/

5.1.2 Accuracy Analysis

In TSA, we first apply Theorem to estimate the number of workers required. This is a conservative estimation. To reduce cost, binary search is used to refine the estimation. Figure 6 compares the conservative estimation with the refined estimation generated by the binary search. We vary the user required accuracy from 0.65 to 0.99 and find that the refined estimation is less than half of the conservative estimation. In the remaining experiments, we use the refined estimation to determine the number of workers required for each HIT.

We next present the accuracy of the three verification models, namely Half-Voting, Majority-Voting and our proposed probability-based Verification model. Figure 7 shows that when the number of workers increases, we can get a higher accuracy.
Among the three verification models, our probability-based approach achieves a much higher accuracy than the other two. When 9 workers are employed, the probability-based model improves the accuracy to . This verifies the benefit of considering workers' historical performance.

We proceed to investigate the effectiveness of the three verification models with respect to a user required accuracy. Figure 8 shows the result. When the requester specifies a required accuracy, TSA estimates the number of workers needed to achieve that accuracy. The real accuracy is computed by comparing the workers' answers with the ground truth. The red line in the figure denotes the user required accuracy. We observe that the probability-based verification model always provides a satisfactory result, while the results of the other two models are below the required accuracy in most cases.

We can observe that the accuracy of the Half-Voting model is worse than our estimation. There are two reasons. First, the estimated number of workers is tied to the users' mean accuracy. The mean accuracy used in the prediction model is an overall accuracy, collected across various questions. However, for some difficult questions, workers' accuracies could be much lower. As a result, the number of workers needed in the voting models is more than the estimated number. For example, the following tweet about the movie The Last Airbender expresses a positive opinion, whereas most workers classify it into the negative category because of the word "sucks":

My nephew just said that Avatar: The Last Airbender sucks... I'm disowning him.

The second reason can be explained based on the results of Figure 9 and Figure 10. Figure 9 shows the percentage of tweets with no answers in the two voting-based models. In some cases, the Half-Voting and Majority-Voting models fail to provide a result because none of the answers is discriminative (all answers get no more than half of the votes, or more than one answer gets the same number of votes). When the number of workers increases, Majority-Voting can resolve ties more easily.
However, for the Half-Voting strategy, there are still about 5% of the tweets that cannot obtain answers with more than half of the votes. In Figure 10, when we vary the number of tweet reviews, we observe that the percentage of no-answer reviews is fairly stable. This phenomenon indicates that the reviews with non-discriminative answers are almost uniformly distributed among all reviews.

5.1.3 Online Processing

One advantage of our crowdsourcing engine is its ability to support online processing. It can provide an approximate result without waiting for all the workers to finish their jobs. Specifically, TSA will generate an initial result as soon as the first answer is returned. Then it will gradually refine the results as more answers arrive until the termination condition is satisfied. This allows us to terminate a HIT and cap the processing cost (see footnote 3).

Figure 5: Crowdsourcing vs. SVM Algorithm (x-axis: movie title, for District 9, Social Network, Thor, Green Lantern, Roommate; y-axis: accuracy; series: LIBSVM, TSA 1 worker, TSA 3 workers, TSA 5 workers)
Figure 6: Number of Workers Required (x-axis: user required accuracy; y-axis: # of workers needed; series: Conservative, Binary)
Figure 7: Accuracy Comparison wrt. Number of Workers (x-axis: # of workers; y-axis: real accuracy; series: Majority-Voting, Half-Voting, Verification)
Figure 8: Accuracy Comparison wrt. User Required Accuracy (x-axis: user required accuracy; y-axis: real accuracy; series: Majority-Voting, Half-Voting, Verification)
Figure 9: Percentage of No-Answer Reviews wrt. Number of Workers (x-axis: # of workers; y-axis: no answer ratio; series: Majority-Voting, Half-Voting)
Figure 10: Percentage of No-Answer Reviews wrt. Number of Reviews (x-axis: # of reviews; y-axis: no answer ratio; series: Majority-Voting, Half-Voting)

One interesting observation in our experiments is that the accuracy of the approximate result varies significantly for different answer arriving sequences. Figure 11 shows the accuracy of the same HIT under four different answer sequences. The red line is the user-required accuracy. Sequence 4 results in a low starting accuracy because the first two workers of sequence 4 provide incorrect answers. Therefore, in online processing, we must update the confidence of the current result dynamically based on the answers received, as early termination may potentially degrade the accuracy.

We evaluate the three termination strategies discussed in Section 4.2.2. Figure 12 shows the effect of early termination on the number of workers. The red line denotes the estimated number of workers via our refined prediction model. The MinMax strategy generates the most conservative estimation, but it still reduces the number of workers by 0%. The ExpMax strategy is the most aggressive one, which can save more than 50% of workers.
In Figure 13, we show the accuracies of the different termination strategies. The x-axis is the accuracy requirement specified by the user and the y-axis is the real accuracy measured against the ground truth. We can see that the MinMax and ExpMax strategies satisfy the user required accuracy (denoted by the red line) in all cases, while MinExp fails to meet the requirement at a few points. In view of the need to reduce the number of workers while maintaining good accuracy, we propose to adopt the ExpMax termination strategy.

Footnote 3: In AMT, we can cancel a HIT when we detect that the answers are good enough. By doing so, we do not need to pay workers who have not yet submitted their answers.

5.1.4 Effect of Sampling

TSA verifies the answers using the probability-based verification model, which relies on workers' historical performance. The AMT system records an approval rate for each worker, which implies his accuracy in general. However, the workers' approval rates are not public due to privacy concerns. To collect the statistics, we publish 500 HITs requiring workers to fill in their approval rate. We also compute the workers' accuracies in answering TSA queries. We observe that the distribution of their approval rate in AMT is very different from that of their real accuracy in TSA, as shown in Figure 14. The reasons are two-fold. On one hand, there are various types of tasks in AMT and it is natural that people cannot be experts in all domains. On the other hand, some requesters set automatic approval for all workers without checking the answers. This results in a high average approval rate in AMT.

Therefore, we adopt a sampling approach to estimate workers' accuracy. Given $n$ workers, we compute their accuracies $A^j = \{a_1^j, a_2^j, \ldots, a_n^j\}$ under a sampling rate $j\%$. We vary the sampling rate and plot the mean accuracy $\mu_j$ and average absolute error $err_j$ in Figure 15, where $\mu_j$ and $err_j$ are defined as follows:

$$\mu_j = \frac{1}{n}\sum_{i=1}^{n} a_i^j, \qquad err_j = \frac{1}{n}\sum_{i=1}^{n} \left| a_i^j - a_i^{100} \right|$$

As shown, both mean accuracy and average error are stable when the sampling rate is higher than 0%. More precisely, mean accuracy remains nearly constant and average error approaches 0.
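The two statistics defined above, the mean accuracy $\mu_j$ and the average absolute error $err_j$, can be sketched in a few lines. This is a minimal illustration only: the function names are hypothetical, and the accuracy values in the usage note below are made-up numbers, not measurements from the experiments.

```python
from typing import Sequence


def mean_accuracy(sampled: Sequence[float]) -> float:
    """mu_j: mean of the workers' accuracies estimated at sampling rate j%."""
    if not sampled:
        raise ValueError("need at least one worker")
    return sum(sampled) / len(sampled)


def average_abs_error(sampled: Sequence[float], full: Sequence[float]) -> float:
    """err_j: mean absolute gap between each worker's accuracy at sampling
    rate j% and the same worker's accuracy at 100% sampling."""
    if not sampled or len(sampled) != len(full):
        raise ValueError("both lists must be non-empty and of equal length")
    return sum(abs(a_j - a_100) for a_j, a_100 in zip(sampled, full)) / len(sampled)
```

For example, with illustrative sampled accuracies `[0.8, 0.6, 0.7]` and full-sample accuracies `[0.75, 0.65, 0.7]` for the same three workers, the mean accuracy is 0.7 and the average absolute error is about 0.033. The two lists must be aligned by worker, since $err_j$ compares each worker with himself.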
We also study the effect of sampling rate on accuracy in our verification model. Figure 16 plots the result. We vary the sampling rate from 5%, 0% to 0% and compare the result to the 100%-sampling accuracy. The red line represents the user required accuracy. We can see that the verification has a better accuracy with a higher sampling rate. When the user required accuracy is lower than 0.75, all sampling rates are satisfactory. The result meets all of the user required accuracy only with a sampling rate no less than 0%. Moreover, the accuracy under 0% sampling rate has only a small gap compared to that under 100% sampling. We use 0% sampling rate in all of our verification experiments.

Figure 11: Effect of Answer Arriving Sequence (x-axis: # of answers arrived; y-axis: real accuracy; series: Sequence 1 to Sequence 4)
Figure 12: Effect of Early Termination on Worker Number (x-axis: user required accuracy; y-axis: # of workers needed; series: MinExp, MinMax, ExpMax)
Figure 13: Effect of Early Termination on Accuracy (x-axis: user required accuracy; y-axis: real accuracy; series: MinExp, MinMax, ExpMax)
Figure 14: Worker Accuracy vs. Approval Rate (y-axis: percentage of workers; series: Real accuracy, Approval rate)
Figure 15: Effect of Sampling Rate on Worker Accuracy (x-axis: sampling rate (%); series: Mean accuracy, Average error)
Figure 16: Effect of Sampling Rate on Verification Accuracy (x-axis: user required accuracy; y-axis: real accuracy; one series per sampling rate)

5.2 Application 2: IT

In this experiment, we evaluate our model in the context of an image tagging application. We use 00 Flickr images as our queries. For each image, we give a set of candidate tags and let 30 workers choose the related ones. The candidate tags include Flickr tags and some embedded noise tags.

Again, we first show the advantages of crowdsourcing over automated applications in dealing with the image tagging task. We compare our result with ALIPR. ALIPR [13] is an automatic image annotation system which applies a 2-D Hidden Markov Model and clustering techniques. The accuracy comparison result is shown in Figure 17. We use 5 groups of images. Each group contains the top 0 Flickr images returned by a tag. The figure clearly shows the accuracy gap between ALIPR and the crowdsourcing approach. ALIPR achieves its best accuracy of 30% on tag sun and has only .6% accuracy on tag apple, whereas our crowdsourcing system can reach more than 80% even with only one worker employed.

We next study the effectiveness of our model.
Recall that our model first estimates the number of workers for a specified accuracy requirement and then applies a probability-based model to verify the result. Figure 18 shows the accuracy achieved with respect to the user required accuracy. As before, the red line denotes the user required accuracy. It can be seen from the figure that our model can always satisfy the user's requirement.

Figure 17: Crowdsourcing vs. ALIPR (x-axis: subject, for apple, bride, flying, sun, twilight; y-axis: accuracy; series: ALIPR, 1 worker, 3 workers, 5 workers)
Figure 18: Accuracy Obtained wrt. User Required Accuracy (x-axis: user required accuracy; y-axis: real accuracy)

6. RELATED WORK

The emergence of Web 2.0 systems has significantly increased the applicability and usefulness of crowdsourcing techniques. A complex job can be split into many small tasks and assigned to different online workers. Amazon's AMT and CrowdFlower are popular crowdsourcing platforms. Studies show that users exhibit different behaviors in such micro-task markets [11]. A good incentive model is required in task design [10].

Recently, crowdsourcing has been adopted in software development. Instead of answering all requests with computer algorithms, some human-expert tasks are published on crowdsourcing platforms for human workers to process. Typical tasks include image annotation [21][18], information retrieval [1][8] and natural language processing [3][][17]. These are tasks that even state-of-the-art technologies cannot accomplish with satisfactory accuracy, but could be easily and correctly done by humans.

Crowdsourcing techniques have also been introduced into database design. Qurk [16][15] and CrowdDB [6] are two examples of databases with crowdsourcing support. In these database systems, queries are partially answered by the AMT platform. Our system, CDAS, adopts a similar design. On top of the crowdsourcing database, new query languages, such as hQuery [20], have been proposed, which allow users to exploit the power of crowdsourcing. Other database applications, such as graph search [19], can be enhanced with crowdsourcing techniques as well.

One main obstacle that prevents enterprise-wide deployment of crowdsourcing-based applications is quality control. Human workers' behaviors are unpredictable, and hence, their answers may be arbitrarily bad. To encourage them to provide high-quality answers, monetary rewards are required. Munro et al. [17] showed how to design a good incentive model to optimize workers' participation and contributions. Ipeirotis et al. [9] presented a scheme to rank the qualities of workers, while Ghosh et al. [7] tried to accurately identify abusive content. Unlike previous efforts, in this paper we have designed a feasible model that balances monetary cost and accuracy, and proposed a crowdsourcing query engine with quality control.

One of the main challenges of our query engine is how to integrate the conflicting results of human workers. A similar problem has been well studied in data fusion systems, for example [4][14]. We extended the models proposed in [4][14] to select and verify the crowdsourcing results in CDAS.

7. CONCLUSION

Crowdsourcing techniques allow application developers to harness the natural expertise of human workers to perform complex tasks that are very challenging for computers. However, as humans are prone to errors, there is no guarantee for the results of crowdsourcing. In this paper, we introduced the quality-sensitive answering model in our Crowdsourcing Data Analytics System, CDAS. The model guides the query engine to generate proper query plans based on the accuracy requirement. It consists of two sub-models, the prediction model and the verification model. The prediction model estimates the number of workers required for a specific task, while the verification model selects the best answer from all returned ones. To improve users' experience, when verifying the results, our model embraces online processing techniques to update answers gradually. By adopting the models, CDAS can provide high-quality results for different crowdsourcing jobs. In this paper, we have implemented a Twitter sentiment analytics job and an image tagging job on CDAS. We used real Twitter data and Flickr data as our queries. Amazon Mechanical Turk was employed as our crowdsourcing platform.
The results show that our proposed model can provide high-quality answers while keeping the total cost low.

8. ACKNOWLEDGEMENT

The work of this paper was in part supported by Singapore MDA grant R

9. REFERENCES

[1] O. Alonso, D. E. Rose, and B. Stewart. Crowdsourcing for relevance evaluation. In SIGIR Forum, 4():9 5, 2008.
[2] J. Bollen, A. Pepe, and H. Mao. Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena. In CoRR, abs/09.583, 2009.
[3] C. Callison-Burch and M. Dredze. Creating speech and language data with Amazon's Mechanical Turk. In NAACL HLT Workshop, 2010.
[4] X. L. Dong, L. Berti-Equille, and D. Srivastava. Integrating conflicting data: The role of source dependence. In PVLDB, ():550 56, 2009.
[5] R. Fisher. Statistical methods for research workers. Oliver and Boyd, 1954.
[6] M. J. Franklin, D. Kossmann, T. Kraska, S. Ramesh, and R. Xin. CrowdDB: answering queries with crowdsourcing. In SIGMOD, pages 6 7, 2011.
[7] A. Ghosh, S. Kale, and P. McAfee. Who moderates the moderators?: crowdsourcing abuse detection in user-generated content. In EC, pages 67 76, 2011.
[8] C. Grady and M. Lease. Crowdsourcing document relevance assessment with Mechanical Turk. In NAACL HLT Workshop, pages 7 79, 2010.
[9] P. G. Ipeirotis, F. Provost, and J. Wang. Quality management on Amazon Mechanical Turk. In SIGKDD Workshop, pages 64 67, 2010.
[10] G. Kazai, J. Kamps, M. Koolen, and N. Milic-Frayling. Crowdsourcing for book search evaluation: impact of HIT design on comparative system ranking. In SIGIR, pages 05 4, 2011.
[11] A. Kittur, E. H. Chi, and B. Suh. Crowdsourcing user studies with Mechanical Turk. In SIGCHI, 2008.
[12] J. Ledlie, B. Odero, E. Minkov, I. Kiss, and J. Polifroni. Crowd translator: on building localized speech recognizers through micropayments. In SIGOPS Oper. Syst. Rev., 43(4):84 89, 2010.
[13] J. Li and J. Z. Wang. Real-time computerized annotation of pictures. In IEEE Trans. Pattern Anal. Mach. Intell., 30(6):985 00, June 2008.
[14] X. Liu, X. L. Dong, B. C. Ooi, and D. Srivastava. Online data fusion. In PVLDB, 4():93 943, 2011.
[15] A. Marcus, E. Wu, D. R. Karger, S. Madden, and R. C. Miller. Demonstration of Qurk: a query processor for human operators. In SIGMOD, pages 35 38, 2011.
[16] A. Marcus, E. Wu, S. Madden, and R. C. Miller. Crowdsourced databases: Query processing with people. In CIDR, pages 4, 2011.
[17] R. Munro, S. Bethard, V. Kuperman, V. T. Lai, R. Melnick, C. Potts, T. Schnoebelen, and H. Tily. Crowdsourcing and language studies: the new generation of linguistic data. In NAACL HLT Workshop, pages 30, 2010.
[18] S. Nowak and S. Rüger. How reliable are annotations via crowdsourcing: a study about inter-annotator agreement for multi-label image annotation. In MIR, 2010.
[19] A. Parameswaran, A. D. Sarma, H. Garcia-Molina, N. Polyzotis, and J. Widom. Human-assisted graph search: it's okay to ask questions. In PVLDB, 4(5):67 78, 2011.
[20] A. G. Parameswaran and N. Polyzotis. Answering queries using humans, algorithms and databases. In CIDR, pages 60 66, 2011.
[21] C. Rashtchian, P. Young, M. Hodosh, and J. Hockenmaier. Collecting image annotations using Amazon's Mechanical Turk. In NAACL HLT Workshop, pages 39 47, 2010.
[22] R. V. Wanzeele, K. Verbeeck, A. Vorstermans, T. Tourwe, and E. Tsiporkova. Extracting emotions out of twitter's microblogs. In BNAIC, pages 304 3, 2011.
[23] T. Yan, V. Kumar, and D. Ganesan. CrowdSearch: exploiting crowds for accurate real-time image search on mobile phones. In MobiSys, pages 77 90, 2010.