CDAS: A Crowdsourcing Data Analytics System


Xuan Liu, Meiyu Lu, Beng Chin Ooi, Yanyan Shen, Sai Wu, Meihui Zhang
School of Computing, National University of Singapore, Singapore
College of Computer Science, Zhejiang University, Hangzhou, P.R. China
{liuxuan, lumeiyu, ooibc, shenyanyan, mhzhang}@comp.nus.edu.sg, wusai@zju.edu.cn

ABSTRACT

Some complex problems, such as image tagging and natural language processing, are very challenging for computers, where even state-of-the-art technology is not yet able to provide satisfactory accuracy. Therefore, rather than relying solely on developing new and better algorithms to handle such tasks, we look to the crowdsourcing solution, employing human participation to make good the shortfall in current technology. Crowdsourcing is a good supplement to many computer tasks. A complex job may be divided into computer-oriented tasks and human-oriented tasks, which are then assigned to machines and humans respectively. To leverage the power of crowdsourcing, we design and implement a Crowdsourcing Data Analytics System, CDAS. CDAS is a framework designed to support the deployment of various crowdsourcing applications. The core part of CDAS is a quality-sensitive answering model, which guides the crowdsourcing engine to process and monitor the human tasks. In this paper, we introduce the principles of our quality-sensitive model. To satisfy the user-required accuracy, the model guides the crowdsourcing query engine in the design and processing of the corresponding crowdsourcing jobs. It provides an estimated accuracy for each generated result based on the human workers' historical performances. When verifying the quality of the result, the model employs an online strategy to reduce waiting time. To show the effectiveness of the model, we implement and deploy two analytics jobs on CDAS, a twitter sentiment analytics job and an image tagging job. We use real Twitter and Flickr data as our queries respectively. We compare our approaches with state-of-the-art classification and image annotation techniques. The results show that the human-assisted methods can indeed achieve a much higher accuracy.
By embedding the quality-sensitive model into the crowdsourcing query engine, we effectively reduce the processing cost while maintaining the required query answer quality.

1. INTRODUCTION

Crowdsourcing is widely adopted in Web 2.0 sites. For example, Wikipedia benefits from thousands of subscribers, who continually write and edit articles for the site. Another example is Yahoo! Answers, where users submit and answer questions. In Web 2.0 sites, most of the contents are created by individual users, not service providers. Crowdsourcing is the driving force of these web sites. To facilitate the development of crowdsourcing applications, Amazon provides the Mechanical Turk (AMT) platform. Computer programmers can exploit AMT's API to publish jobs for human workers, who are good at some complex jobs, such as image tagging and natural language processing. The collective intelligence helps solve many computationally difficult tasks, thereby improving the quality of output and users' experience. Figure 1 illustrates the idea of using crowdsourcing techniques to divide up jobs. CrowdDB [6], HumanGS [9] and CrowdSearch [3] are recent examples of applications on Amazon's AMT crowdsourcing platform.

Figure 1: Crowdsourcing Application

Crowdsourcing relies on human workers to complete a job, but humans are prone to errors, which can make the results of crowdsourcing arbitrarily bad. The reason is two-fold. First, to obtain rewards, a malicious worker can submit random answers to all questions. This can significantly degrade the quality of the results. Second, for a complex job, the worker may lack the required knowledge for handling it. As a result, an incorrect answer may be provided. To address the above problems, in AMT, a job is split into many HITs (Human Intelligence Tasks) and each HIT is assigned to multiple workers so that replicated answers are obtained. If conflicting answers are observed, the system will compare the answers of different workers and determine the correct one. For example, in CrowdDB [6], the voting strategy is adopted. The replication strategy, however, does not fully solve the answer diversity problem.
Suppose we want the precision of our image tags to be 95% and the cost of worker per HIT is $0.0. If we assign each HIT to too many workers, we will have to pay a high cost. On the other hand, if few workers provide tags, we will not have enough clues to infer the correct tags. Given an expected accuracy, we therefore need an adaptive query engine that guarantees high accuracy with high probability and incurs as little cost as possible.

In this paper, we propose a quality-sensitive answering model for crowdsourcing systems, which is designed to significantly improve the quality of query results and effectively reduce the processing cost at the same time. This model is the core of our proposed Crowdsourcing Data Analytics System (CDAS). CDAS exploits the crowd intelligence to improve the performance of different data analytics jobs, such as image tagging and sentiment analysis. CDAS transforms the analytics jobs into human jobs and computer jobs, which are then processed by different modules. The human jobs are handled by the crowdsourcing engine, which adopts a two-phase processing strategy. The quality-sensitive answering model is correspondingly split into two sub-models, a prediction model and a verification model. The sub-models are applied to different phases, respectively.

In the first phase, the engine employs the prediction model to estimate how many workers are required to achieve a specific accuracy. The model generates its estimation by collecting the distribution of all workers' historical performances. Based on the model's result, the engine creates and submits the HIT to the crowdsourcing platform. In the second phase, the engine obtains the answers from the human workers and refines them, as different workers may return different results for the same question.

To verify the answers from different human workers, the voting strategy is used in CrowdDB to select the correct one. In the simplest case, each HIT is sent to $n$ workers ($n$ is odd). A result is assumed to be correct and accepted if no less than $\lceil n/2 \rceil$ workers return it. The voting strategy is simple, but is not very effective in the crowdsourcing scenario. Suppose we have a set of product reviews and want to know the opinion of each review. We set the score to either positive, negative or neutral. If 30% of the workers vote positive, 30% of the workers vote negative and the remaining workers vote neutral, the voting strategy cannot decide which answer is more trustable.
Moreover, even if more than 50% of the workers vote negative, we cannot accept the answer directly, because some malicious workers may collude to produce a false answer.

To improve the accuracy of the crowdsourcing results, CDAS adopts a probabilistic approach. First, a verification model is employed to replace the voting strategy. It relies on workers' past performances (i.e., the workers' accuracies for historical queries) and combines vote distribution and workers' performances. Intuitively, the system is more likely to accept the answers provided by a worker with a good accuracy. A random sampling approach is designed to estimate the workers' accuracies in each job. By applying the probability-based verification model, we can significantly improve the result quality. Second, instead of waiting for all the results, the adaptive query engine provides an approximate result with confidence and refines it gradually as more answers are returned. This technique has been designed based on our observation that in AMT, workers finish their jobs asynchronously. Therefore, it is important to offer the option of an approximate answer that is gradually improved as more results are available, instead of letting the user wait for the completion of the query. This strategy is similar to traditional online query processing in philosophy and serves to improve users' experience.

To evaluate our model and the performance of CDAS, we implement two practical crowdsourcing jobs, a twitter sentiment analytics (TSA) job and an image tagging (IT) job. In the TSA job, we submit a set of movie titles as our queries and try to find the opinions of Twitter users. In the IT job, we use the images of Flickr as the queries and ask the human workers to choose the correct tags. We will show the effectiveness of our crowdsourcing engine based on the quality-sensitive answering model in the experimental section.

Figure 2: CDAS Architecture

The remainder of the paper is organized as follows. In Section 2, we present the architecture of CDAS, and introduce the applications implemented over CDAS.
In Section 3, we introduce our prediction model for estimating a proper number of workers for each job. To improve the result accuracy, a probability-based verification model is proposed in Section 4, which can be extended to support online processing. We evaluate the performance of our models in CDAS in Section 5, and discuss some related work in Section 6. We conclude the paper in Section 7.

2. OVERVIEW

In this section, we introduce the architecture of our Crowdsourcing Data Analytics System, CDAS, and discuss how to implement applications on top of CDAS.

2.1 Architecture of CDAS

CDAS is a system that exploits crowdsourcing techniques to improve the performance of data analytics jobs. The core difference between CDAS and conventional analytics systems lies in the processing mechanism. CDAS employs human workers to assist the analytics tasks, while other systems rely solely on computer systems to answer the queries.

Figure 2 shows the architecture of CDAS. CDAS consists of three major components: job manager, crowdsourcing engine and program executor. The job manager accepts the submitted analytics jobs and transforms them into a processing plan, which describes how the other two components (crowdsourcing engine and program executor) should collaborate for the job. In particular, the job manager partitions the job into two parts, one for the computers and one for the human workers. For example, in human-assisted image search, the human workers are responsible for providing the tags for each image, while the image classification and index construction are handled by the computer programs. In most cases, the two parts interact with each other during processing. The program executor summarizes the results of the crowdsourcing engine, and the engine may change its job schedule due to the requests of the program executor.

The crowdsourcing engine processes human jobs in two phases.

1. In the first phase, the engine generates a query template for the specific type of human jobs. The query template follows the format of the crowdsourcing platform, such as AMT, and should be easily understood by human workers.
The engine then translates each job from the job manager into a set of crowdsourcing tasks and publishes them into the crowdsourcing platform. To reduce the crowdsourcing cost, the engine employs a prediction model, which estimates the number of required human workers for a specific task based on the distribution of workers' performance.

Figure 3: Query Template

2. In the second phase, the human workers' answers are returned to the crowdsourcing engine, which combines the results and removes the ambiguity. A verification model is developed to select the correct answer based on the probability estimation.

Sometimes, the human tasks need to disclose some sensitive data to the public. We design a privacy manager inside the engine to address the problem. The privacy manager may adaptively change the formats of the generated questions for human workers. It may also reject some workers for a specific task.

The performance of the crowdsourcing engine is determined by the two models, the prediction model and the verification model. We shall introduce the two models in the following sections and discuss the implementation of two practical applications, a twitter sentiment analytics (TSA) job and an image tagging (IT) job, to validate the performance of our models.

2.2 Deploying Applications on CDAS

In this section, we use the TSA job as a running example to show how to deploy an application on CDAS. The TSA job is typically processed using machine learning and information retrieval techniques [][]. However, as shown in the experimental section, CDAS can achieve a much higher accuracy than some of these traditional approaches for the TSA job. In the TSA job, the query is formally defined as follows.

DEFINITION 1. Query in TSA
The query in TSA follows the format of (S, C, R, t, w), where S is a set of keywords, C denotes the required accuracy, R is the domain of answers, t is the timestamp of the query and w is the time window of the query.

For example, suppose the user wants to know the public opinions for iPhone4S from Oct-4-0 to Oct-3-0. The corresponding query can be expressed as:

Q = ({iPhone4S, iphone 4S}, 95%, {Best Ever, Good, Not Satisfied}, Oct-4-0, 0).

The answer to the query consists of two parts. The first part is the percentage of each opinion and the second part comprises the reasons.
For the above query, one possible answer is that most people perceive iPhone4S as a good product thanks to the features of Siri and iOS 5, while a smaller but significant number of people are not satisfied with its display and battery performance.

Table 1: Users' Opinion on iPhone4S
Opinions        Percentages   Reasons
Best Ever       60%           Siri, iOS 5, Performance
Good            10%           Siri, 1080P
Not Satisfied   30%           iPhone4, Display, Battery

The query definition of TSA is registered in the job manager, which then generates the corresponding processing plan. The program executor is responsible for retrieving the twitter stream and checking whether the query keyword (S = iPhone4S in the above example) exists in a tweet. The candidate tweets are fed to the crowdsourcing engine, which will generate a query template as shown in Figure 3.

When the crowdsourcing engine collects enough tweets in its buffer, it starts to generate the HIT (Human Intelligence Task). In particular, it creates an HTML section (bounded by <div> and </div>) for each tweet using the query's template. For all the tweets in the buffer, we concatenate their HTML sections to form our HIT description. Therefore, one HIT in the TSA job contains questions for multiple tweets about the same product, movie, person or event. The HIT is then published into AMT for processing.

Algorithm 1 summarizes the two-phase query processing in the crowdsourcing engine (note that Algorithm 1 describes the general query processing strategy, not just for the TSA job). In the preprocessing, the engine generates a HIT job for the tweets using the query template (line 1-6). In the first phase, it applies the prediction model to estimate the number of workers required to satisfy the predefined accuracy (line 7). In the second phase, it submits the HIT to AMT and waits for the answers (line 8-10). The verification model is used to select the correct answers. In line 7, Q.C denotes the accuracy requirement specified by query Q.
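The HIT assembly step described above (one `<div>` section per buffered tweet, concatenated into a single HIT description) can be sketched in Python as follows. This is only an illustration: the per-tweet markup in `SECTION` and the function name `build_hit_description` are hypothetical, since the actual query template is the one shown in Figure 3 and is not reproduced in the text.

```python
from html import escape
from string import Template

# Hypothetical per-tweet template; the real layout is the query template of Figure 3.
SECTION = Template(
    '<div class="question"><p>$tweet</p>'
    '<select name="q$index">$options</select></div>'
)


def build_hit_description(tweets, answer_domain):
    """Concatenate one <div> section per tweet into a single HIT description."""
    # The answer domain R of the query becomes the list of selectable options.
    options = "".join(f"<option>{escape(r)}</option>" for r in answer_domain)
    sections = [
        SECTION.substitute(tweet=escape(text), index=i, options=options)
        for i, text in enumerate(tweets)
    ]
    return "".join(sections)


if __name__ == "__main__":
    description = build_hit_description(
        ["Siri is great", "battery drains too fast"],
        ["Best Ever", "Good", "Not Satisfied"],
    )
    print(description)
```

Escaping the tweet text before substitution is a design choice of this sketch, so that markup inside a tweet cannot break the surrounding `<div>` structure.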
Algorithm 1 queryProcessing(ArrayList<Tweet> buffer, Query Q)
 1: HtmlDesc H ← new HtmlDesc()
 2: for i ← 0 to buffer.size-1 do
 3:     Tweet t ← buffer.get(i)
 4:     HtmlSection hs ← new HtmlSection(Q.template(), t)
 5:     H.concatenate(hs)
 6: HIT task ← new HIT(H)
 7: int n ← predictWorkerNumber(Q.C)
 8: submit(task, n)
 9: while not all answers received do
10:     verifyAnswer()

In Algorithm 1, the two models direct the whole procedure of query processing. They are also the focus of this paper and will be presented in the following sections.

3. PREDICTION MODEL

3.1 Economic Model in AMT

The prediction model is designed to ensure high-quality answers and to reduce cost. It is highly related to how the crowdsourcing platform charges the requesters. Therefore, we first briefly introduce the economic model of AMT.

In AMT, a HIT is published and broadcast to all candidate workers. Any candidate worker can accept the task. Thus, if $n$ answers for a HIT are required, from the point of view of CDAS, there will be $n$ random workers providing the answers. AMT charges CDAS for each HIT using the following rules:

1. Every worker is paid a fixed amount of money $m_c$.
2. CDAS pays a fixed amount of money $m_s$ per worker to the AMT system for each HIT.

Therefore, we spend $n(m_c + m_s)$ for each HIT. Take query Q = (S, C, R, t, w) in TSA as an example: if we get K available tweets

for each time unit, the cost of processing Q is $n(m_c + m_s)Kw$. In our prediction model, the number of workers $n$ is correlated to the required accuracy C. We use function $g$ to denote the relationship between C and $n$. Consequently, the query cost can be represented as $(m_c + m_s)\,wK\,g(C)$.

Before we present the technical details, we summarize the notations used in the paper in Table 2.

Table 2: Table of Notations
Symbol            Description
$U$               the set of workers
$u_i$             the i-th worker
$n$               the number of workers
$P_{n/2}$         the probability of at least $\lceil n/2 \rceil$ workers provide the correct answer
$A$               the set of accuracy of workers
$a_i$             the accuracy of worker $u_i$
$\mu$             the mean value of worker accuracy
$f(u_i)$          the answer provided by worker $u_i$
$\Omega$          the observation of distribution of answers
$P(r|\Omega)$     the probability of answer $r$ being correct under the observation $\Omega$
$m$               the number of all possible answers
$c_i$             the confidence of worker $u_i$
$\rho(r_i)$       the confidence of answer $r_i$

3.2 Voting-based Prediction

Given $n$ ($n$ is odd) answers from workers $U = \{u_1, u_2, \ldots, u_n\}$, the voting strategy accepts an answer if at least $\lceil n/2 \rceil$ workers return the same answer. While the voting strategy guarantees that no other answer has more votes of being the correct answer, it does not address the problem of how to select $n$.

To address the above problem, we propose a voting-based prediction model. Given an accuracy requirement, the prediction model estimates the number of workers required. That is, the goal of the prediction model is to derive the function $g$ for each query. We prove in Section 4 that the model can also produce a bound for our probability-based verification approach.

3.2.1 A Conservative Estimation

We compute the probability that at least $\lceil n/2 \rceil$ workers provide the correct answer. We use $P_{n/2}$ to denote the probability. Suppose the accuracies of all workers are $A = \{a_1, a_2, \ldots, a_n\}$, where the accuracy means the probability of a worker providing a correct answer. By the definition of $P_{n/2}$, we have the following equation:

$$P_{n/2} = \sum_{U' \subseteq U,\ |U'| \ge \lceil n/2 \rceil} \Big( \prod_{u_i \in U'} a_i \prod_{u_j \in U - U'} (1 - a_j) \Big)$$

$U'$ denotes a subset of the user set $U$ with size no smaller than $\lceil n/2 \rceil$. The above equation enumerates all the possible cases in which the correct answer can be obtained by voting.
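The subset enumeration that defines the probability of at least half of the workers answering correctly can be sketched directly in Python. This is a brute-force illustration, exponential in the number of workers, and the function name `majority_correct_prob` is hypothetical; it assumes workers answer independently with the given per-worker accuracies.

```python
from itertools import combinations
from math import ceil, prod


def majority_correct_prob(accuracies):
    """Probability that at least ceil(n/2) of the n workers answer correctly.

    Enumerates every subset U' of workers with |U'| >= ceil(n/2): workers in U'
    are correct (factor a_i), all others are wrong (factor 1 - a_j).
    """
    n = len(accuracies)
    need = ceil(n / 2)
    total = 0.0
    for size in range(need, n + 1):
        for correct in combinations(range(n), size):
            chosen = set(correct)
            total += prod(
                a if i in chosen else 1.0 - a
                for i, a in enumerate(accuracies)
            )
    return total


if __name__ == "__main__":
    # Three workers, each with an assumed accuracy of 0.8.
    print(majority_correct_prob([0.8, 0.8, 0.8]))
```

For three workers with accuracy 0.8 each, the sum is $3 \cdot 0.8^2 \cdot 0.2 + 0.8^3 = 0.896$.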
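The workers of a HIT can be considered as $n$ random workers from AMT. Let $\mu$ denote the mean value of the workers' accuracy. We have the following theorem to compute the expectation of the probability that at least $\lceil n/2 \rceil$ workers return the correct answer:

THEOREM 1. If workers answer the queries independently,

$$E[P_{n/2}] = \sum_{k=\lceil n/2 \rceil}^{n} \binom{n}{k} \mu^{k} (1-\mu)^{n-k}$$

PROOF. As all workers are randomly picked, $a_i$ and $a_j$ are independent for any $i \ne j$. Similarly, $a_i$ and $1 - a_j$ are also independent. Thus,

$$E[P_{n/2}] = E\Big[\sum_{U' \subseteq U,\ |U'| \ge \lceil n/2 \rceil} \Big(\prod_{u_i \in U'} a_i \prod_{u_j \in U - U'} (1 - a_j)\Big)\Big] = \sum_{U' \subseteq U,\ |U'| \ge \lceil n/2 \rceil} E\Big[\prod_{u_i \in U'} a_i \prod_{u_j \in U - U'} (1 - a_j)\Big] = \sum_{U' \subseteq U,\ |U'| \ge \lceil n/2 \rceil} \Big(\prod_{u_i \in U'} E[a_i] \prod_{u_j \in U - U'} E[1 - a_j]\Big)$$

We have $E[a_i] = \mu$ and $E[1 - a_i] = 1 - \mu$. Therefore, $E[P_{n/2}]$ can be computed as:

$$E[P_{n/2}] = \sum_{U' \subseteq U,\ |U'| \ge \lceil n/2 \rceil} \Big(\prod_{u_i \in U'} \mu \prod_{u_j \in U - U'} (1-\mu)\Big) = \sum_{U' \subseteq U,\ |U'| \ge \lceil n/2 \rceil} \mu^{|U'|} (1-\mu)^{n-|U'|} = \sum_{k=\lceil n/2 \rceil}^{n} \binom{n}{k} \mu^{k} (1-\mu)^{n-k}$$

For a given query, we require $E[P_{n/2}]$ to be no less than a given accuracy C, i.e., $E[P_{n/2}] \ge C$. Furthermore, we derive a lower bound of $E[P_{n/2}]$ that can be easily computed as follows.

THEOREM 2. $E[P_{n/2}] \ge 1 - e^{-2n(\mu - \frac{1}{2})^2}$

PROOF. By Chernoff Bound,

$$\sum_{k=\lceil n/2 \rceil}^{n} \binom{n}{k} \mu^{k} (1-\mu)^{n-k} \ge 1 - e^{-2n(\mu - \frac{1}{2})^2}$$

Moreover, for any odd $n$, we have $\lceil n/2 \rceil = \frac{n+1}{2}$. Therefore,

$$E[P_{n/2}] = \sum_{k=\lceil n/2 \rceil}^{n} \binom{n}{k} \mu^{k} (1-\mu)^{n-k} = \sum_{k=\frac{n+1}{2}}^{n} \binom{n}{k} \mu^{k} (1-\mu)^{n-k} \ge 1 - e^{-2n(\mu - \frac{1}{2})^2}$$

By requiring $1 - e^{-2n(\mu - \frac{1}{2})^2} \ge C$, we guarantee that $E[P_{n/2}] \ge C$ (i.e., the expected accuracy of the query result is no less than C). Consequently, we obtain a sufficient condition for the quality of the crowdsourcing query engine:

THEOREM 3. Given required accuracy C and the mean value of workers' accuracy $\mu$, choosing $n \ge \frac{-\ln(1-C)}{2(\mu - \frac{1}{2})^2}$ workers ensures that the expected accuracy of the crowdsourcing result is no less than C.

Note that $n$ is an odd integer, so the minimum value of $n$ is $2\left\lceil \frac{-\ln(1-C)}{4(\mu - \frac{1}{2})^2} \right\rceil + 1$.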
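As a rough numerical illustration of Theorems 1 and 3, the sketch below evaluates the binomial tail for the expected accuracy and the Chernoff-style worker count. It assumes the bound reads $E[P_{n/2}] \ge 1 - e^{-2n(\mu - 1/2)^2}$ with $\mu > 1/2$, and it rounds up to an odd integer; the function names and the example values C = 0.95 and $\mu$ = 0.75 are hypothetical.

```python
from math import ceil, comb, exp, log


def expected_majority_prob(n, mu):
    """Theorem 1: sum over k >= ceil(n/2) of C(n, k) * mu^k * (1 - mu)^(n - k)."""
    k_min = ceil(n / 2)
    return sum(
        comb(n, k) * mu**k * (1.0 - mu) ** (n - k)
        for k in range(k_min, n + 1)
    )


def conservative_workers(c, mu):
    """Odd n satisfying 1 - exp(-2 * n * (mu - 1/2)^2) >= c (assumed form of Theorem 3)."""
    if not (0.5 < mu <= 1.0) or not (0.0 < c < 1.0):
        raise ValueError("requires 0.5 < mu <= 1 and 0 < c < 1")
    half = -log(1.0 - c) / (4.0 * (mu - 0.5) ** 2)
    # Rounding up keeps n >= -ln(1 - c) / (2 * (mu - 1/2)^2), and 2 * k + 1 is odd.
    return 2 * ceil(half) + 1


if __name__ == "__main__":
    n = conservative_workers(0.95, 0.75)
    print(n, 1.0 - exp(-2 * n * 0.25**2), expected_majority_prob(n, 0.75))
```

With these assumed values the bound asks for 25 workers, whereas the exact expectation of Theorem 1 already reaches 0.95 at n = 9, which shows how conservative the bound can be for small n.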

3.2.2 Optimization with Binary Search

Setting $n$ to $2\left\lceil \frac{-\ln(1-C)}{4(\mu - \frac{1}{2})^2} \right\rceil + 1$ ensures the expected accuracy of results. However, it is well known that the Chernoff Bound provides a tight estimation only for a large enough $n$. In some HITs, only a few workers participate in processing. Therefore, Theorem 3 generates a conservative estimation that may cause too many workers to be involved. To address this problem, we use Theorem 3 as an upper bound and apply a binary search algorithm (on odd numbers) to find a tighter estimation, i.e. the minimum odd $n$ that satisfies $E[P_{n/2}] \ge C$.

Algorithm 2 binarySearch(double C)  // C is the required accuracy
 1: int s ← 1, int e ← $2\lceil \frac{-\ln(1-C)}{4(\mu - \frac{1}{2})^2} \rceil + 1$
 2: while s < e do
 3:     int n ← $2\lfloor \frac{s+e+2}{4} \rfloor - 1$
 4:     double E ← computeExpectedProb(n)
 5:     if E ≥ C then
 6:         e ← n
 7:     else
 8:         s ← n + 2
 9: return e

Algorithm 3 computeExpectedProb(int x)
 1: double E ← 0, δ ← $\mu^{x}$
 2: for int i ← x down to $\lceil x/2 \rceil$ do
 3:     E ← E + δ
 4:     δ ← δ · $\frac{(1-\mu)\, i}{\mu\, (x - i + 1)}$
 5: return E

Algorithm 2 shows the idea of binary search. We initialize the domain of $n$ to be $[1,\ 2\lceil \frac{-\ln(1-C)}{4(\mu - \frac{1}{2})^2} \rceil + 1]$ (line 1). At each step, we pick an odd midpoint $n$ of the current domain and compute the expected accuracy of using $n$ workers (line 4), until we reach the minimum $n$ that satisfies the accuracy requirement. Algorithm 3 illustrates the process of computing the expected accuracy. Its correctness is based on the fact that $\binom{n}{k-1} / \binom{n}{k} = k/(n-k+1)$. Obviously, the time complexity of Algorithm 3 is $O(n)$. Therefore, we can get a tighter bound of the number of workers required using Algorithm 2 in $O(n \log n)$ time.

3.3 Sampling-based Accuracy Estimation

In the previous two prediction models, we rely on the statistics of workers' accuracy distribution. However, not all crowdsourcing platforms provide such information due to the privacy issue. Even if some platforms provide certain statistics, they cannot be directly used as workers' accuracy. For example, the AMT system records the approval rate of each worker. The approval rate shows the percentage of answers approved by the requester. However, we have observed that the approval rate is not consistent with the accuracy of the worker in CDAS. There are two main reasons. First, the worker's accuracy may vary widely across jobs.
Second, some requesters set automatic approval for all answers without verification. The difference between approval rate and accuracy is studied through experiments.

To resolve the above problem, we design a sampling-based approach. Specifically, for a registered query, we randomly embed questions whose ground truth is known beforehand. These questions are used as our testing samples to estimate the workers' accuracy. Here we use the TSA application to illustrate the sampling method. As mentioned previously, each HIT contains the questions of B tweets. To get unbiased results, we randomly inject αB samples into a HIT. In other words, each HIT has αB testing samples and (1 − α)B new tweets. In our current implementation, α and B are set to 0. and 00, respectively. We evaluate the effect of sampling rate α in our experiments, and the results confirm that even a low sampling rate can produce an acceptable estimation.

Algorithm 4 doSampling(HIT H)
 1: WorkerSet U ← H.getWorkers()
 2: Double[] rate ← new Double[U.size]
 3: while H.nextQuestion() ≠ null do
 4:     Question q ← H.getNextQuestion()
 5:     if q is a testing sample then
 6:         for i ← 0 to U.size-1 do
 7:             Worker u ← U.get(i)
 8:             if u.getAnswer(q) = q.groundTruth then
 9:                 rate[i] ← rate[i] + 1/(αB)

In the sampling process, CDAS collects the accuracy of participating workers. Algorithm 4 shows the procedure. After the sampling, the statistics are used in both the prediction model and the verification model.

4. VERIFICATION MODEL

In the voting-based verification, if more than half of the workers return the same answer, the query engine will accept it as the correct answer. Despite the fact that our prediction model tries to guarantee that at least half of the workers submit the correct answer, the voting-based verification occasionally fails to provide an answer. For a specific question, different workers may provide different answers, and in some cases, no answer gets an agreement above 50%.
Moreover, the voting strategy assumes that all the workers provide the correct answer with the same probability, which is not true, as the accuracy of different workers varies a lot and the workers with higher accuracy are more trustable. In this section, we propose a probability-based verification method to determine the best answer.

4.1 Probability-based Verification

Probability-based verification tries to evaluate the quality of answers through workers' historical performances (i.e. accuracy). In particular, given the probability distribution of workers' performances, we apply the Bayesian theorem to estimate the accuracy of each result. We adopt and extend the approach proposed in data fusion [4] for integrating conflicting results in CDAS.

Suppose a HIT is answered by $n$ workers $\{u_1, u_2, \ldots, u_n\}$ with accuracy $\{a_1, a_2, \ldots, a_n\}$. We define function $f(u_i)$ to represent the answer provided by worker $u_i$. Based on Bayesian analysis, the probability of a specific answer $\bar{r} \in R$ being the correct answer given the observation of the answers' distribution $\Omega$ (i.e. the answers provided by workers) can be computed as:

$$P(\bar{r}|\Omega) = \frac{P(\Omega|\bar{r})\,P(\bar{r})}{P(\Omega)} = \frac{P(\Omega|\bar{r})\,P(\bar{r})}{\sum_{r_i \in R} P(\Omega|r_i)\,P(r_i)}$$

Suppose the size of the answer domain is $|R| = m$. Without a priori knowledge, each answer $r_i \in R$ appears with equal probability of $\frac{1}{m}$. Then the above equation can be transformed into:

$$P(\bar{r}|\Omega) = \frac{P(\Omega|\bar{r})}{\sum_{r_i \in R} P(\Omega|r_i)} \qquad (1)$$

Let $\bar{r}$ be the correct answer. The probability for worker $u_j$ providing the correct answer is $a_j$ (i.e. accuracy). Without any a priori knowledge, each incorrect answer provided by $u_j$ appears with equal probability $\frac{1 - a_j}{m - 1}$. Therefore, $P(\Omega|\bar{r})$ can be computed as:

$$P(\Omega|\bar{r}) = \prod_{f(u_j) = \bar{r}} a_j \prod_{f(u_j) \ne \bar{r}} \frac{1 - a_j}{m - 1} \qquad (2)$$

Combining Equation 1 and 2, we have

$$P(\bar{r}|\Omega) = \frac{\prod_{f(u_j) = \bar{r}} a_j \prod_{f(u_j) \ne \bar{r}} \frac{1 - a_j}{m - 1}}{\sum_{r_i \in R} \Big( \prod_{f(u_j) = r_i} a_j \prod_{f(u_j) \ne r_i} \frac{1 - a_j}{m - 1} \Big)} = \frac{\prod_{f(u_j) = \bar{r}} \frac{(m-1)\,a_j}{1 - a_j}}{\sum_{r_i \in R} \Big( \prod_{f(u_j) = r_i} \frac{(m-1)\,a_j}{1 - a_j} \Big)} \qquad (3)$$

For ease of illustration, we define the Worker Confidence for an answer as follows.

DEFINITION 2. Worker Confidence
Let $a_j$ be the accuracy of worker $u_j$. The confidence $c_j$ of worker $u_j$ is defined as:

$$c_j = \ln \frac{(m-1)\,a_j}{1 - a_j} = \ln(m-1) + \ln \frac{a_j}{1 - a_j}$$

From the above definition, we can see that high-accuracy workers will get large confidence values. This is consistent with the intuition that workers with higher accuracy are more trustable. Based on the definition of worker confidence and Equation 3, we define the Answer Confidence as below.

DEFINITION 3. Answer Confidence
The confidence of an answer $\bar{r}$ equals the probability of $\bar{r}$ being the correct answer:

$$\rho(\bar{r}) = P(\bar{r}|\Omega) = \frac{e^{\sum_{f(u_j) = \bar{r}} c_j}}{\sum_{r_i \in R} e^{\sum_{f(u_j) = r_i} c_j}} \qquad (4)$$

In our CDAS, the answer with the highest confidence is accepted as the final result. In fact, the confidence of an answer represents a variant of voting, where $e^{c_j}$ is used as the weight for worker $u_j$. Apparently, the worker with a higher confidence gets more weight. To speed up the computation of $P(\bar{r}|\Omega)$, we cache the value $\ln \frac{a_j}{1 - a_j}$ for each known worker.

We can prove that using Theorem 1 to estimate the number of workers required also produces a quality bound for our probability-based verification approach.

THEOREM 4. If $E[P_{n/2}] \ge C$ and $\bar{r}$ is the correct answer, our probability-based verification model returns $\bar{r}$ as the result with a probability no less than C.

PROOF. Based on Theorem 1,

$$E[P_{n/2}] = \sum_{k=\lceil n/2 \rceil}^{n} \binom{n}{k} \mu^{k} (1-\mu)^{n-k} \ge C$$

Namely, the expected number of workers who provide the correct answer is larger than $\frac{n}{2}$ with a probability larger than C.
The confidences of all workers are independent and identically distributed (i.i.d.), because the accuracies of the workers are i.i.d. Let $E_c$ denote the mean value of workers' confidences. As a result, the total number of expected votes for answer $\bar{r}$ is

$$E\Big[\sum_{f(u_j) = \bar{r}} c_j\Big] = \sum_{f(u_j) = \bar{r}} E[c_j] = E_c\,\big|\{u_j \mid f(u_j) = \bar{r}\}\big| > \frac{n}{2} E_c$$

Note that in Equation 4, all answers share the same denominator. The value of $P(\bar{r}|\Omega)$ is proportional to $e^{\sum_{f(u_j) = \bar{r}} c_j}$. Thus, $\bar{r}$ is the answer with the largest expected confidence and is returned as the result in expectation. Otherwise, suppose another answer $r'$ has a larger expected probability than $\bar{r}$, i.e.,

$$E[P(r'|\Omega)] > E[P(\bar{r}|\Omega)]$$

Therefore,

$$E\Big[\sum_{f(u_j) = r'} c_j\Big] > E\Big[\sum_{f(u_j) = \bar{r}} c_j\Big] > \frac{n}{2} E_c$$

We will have

$$E\Big[\sum_{f(u_j) = r'} c_j\Big] + E\Big[\sum_{f(u_j) = \bar{r}} c_j\Big] > n E_c$$

In fact, the sum of workers' confidences is equal to the sum of confidences for every answer:

$$n E_c = E\Big[\sum_{r_i \in R} \Big(\sum_{f(u_j) = r_i} c_j\Big)\Big]$$

This results in a contradiction: the sum of confidences of $r'$ and $\bar{r}$ exceeds the sum of all confidences. Therefore, our probability-based verification model returns $\bar{r}$ as the result with a probability no less than C.

The only unknown parameter in Equation 4 is $m$, the size of R. We can simply set $m = |R|$. However, in our experimental study, we have found that not all answers in R are picked by the workers. For example, if a question asks a worker to rank a product based on some tweets and the score ranges from 0 to 00, the scores will follow a very skewed distribution. Some low-probability answers are never selected, but they do reduce the weight of a correct answer. Thus, we need to select a good $m$ to prune the noise.

After a HIT completes, the crowdsourcing engine gets $m'$ distinct answers for a specific question from $n$ workers ($m' \le n$). In this observation, we select $m'$ distinct answers among $m$ possible ones. The probability of this selection can be computed as ( ). Suppose this is not a very rare observation and the probability of this observation is larger than ɛ (e.g., we prune the low-probability noise). The following lemma provides a lower bound for $m$.

LEMMA 1. > H ( )(ɛ), where $H_i$ is the i-th Harmonic number.

PROOF. ( ) ɛ < ( ) ( +) ( ) ( )( ) ( ) ( ) ( )( ) ( ) ( ( ( i ))) i ( H )
Derived from the above equation, we have (ɛ) < H. Therefore > H (ɛ) H ( )(ɛ)

For large values, the above lower bound is too loose. Instead, we propose a tighter lower bound for $m$:

LEMMA 2. > ɛ e

PROOF. From Lemma 1, we have l ɛ< i ( i ) i i i. Therefore, l i i l i i >l. Obviously, i ɛ <. By setting l > l ɛ, we get a tighter bound: > ɛ e

THEOREM 5. > max{ H ( )(ɛ), ɛ e }

PROOF. Directly from Lemma 1 and 2.

In our verification, we set ɛ to 0.05 based on Fisher's exact test [5], which is widely adopted in practice. We then use Theorem 5 to estimate the value of $m$.

We now give an example in TSA to show the benefit of applying our probability-based verification model. Table 3 shows the example. Five workers with different accuracies provide three different answers, namely Positive, Neutral and Negative. The results of the three verification models are shown in Table 4. Both the Half-Voting model and the Majority-Voting model choose Positive as the result, since three workers out of five provide the answer Positive. However, our verification model can correctly choose Negative as the result, because the worker answering Negative has a much higher accuracy. As a result, our verification model gets more accurate answers than the other two voting-based models.

Table 3: An Example of Workers' Answers
Movie Title   Green Lantern
Tweet         Oh. My. GOD. Green Lantern movie is terrible. Like, Lost In Space movie terrible.
Worker ID     w1    w2    w3    w4    w5
Accuracy
Answer        pos   pos   neu   neg   pos

Table 4: Results of Verification Models
                  pos   neu   neg   Answer
Half-Voting       3     1     1     pos
Majority-Voting   3     1     1     pos
Verification                        neg

4.2 Online Processing

The workers submit their answers asynchronously in AMT, and CDAS has to wait for a sufficient number of answers to be submitted. As a consequence, query response time in CDAS (and other crowdsourcing systems for that matter) is expected to be longer than that of non-crowdsourcing systems. To alleviate such a problem and also to improve users' experience, we adopt online processing techniques in CDAS.
Instead of waiting for all workers to complete their tasks, CDAS provides an approximate result based on the answers received so far. As we have previously discussed, uncertainty and approximation cannot be avoided in crowdsourcing systems, which makes online processing a perfect fit for the query processing in CDAS. To resolve the uncertainty, we extend the techniques of data fusion [4][4] to estimate the answer's confidence. However, the same approach cannot be directly applied to the online processing in CDAS, as in crowdsourcing systems the human workers compete for the tasks, and CDAS does not have the profile (i.e. accuracy) for a specific user until he/she returns the answer. In our case, the accuracy of the answer provided by an unseen worker can only be estimated by the distribution of all workers' accuracies.

4.2.1 Finding the Correct Answer Online

We apply Equation 4 to continuously update the probability of each received answer. Suppose a HIT is assigned to $n$ workers and the query engine receives answers from $n'$ ($n' < n$) workers. Unlike Equation 4, in this case, we only receive a partial observation $\Omega'$ for the answer distribution. For the remaining $n - n'$ workers, we have no idea about what answers they may provide. Let $s$ denote a possible answer set by the remaining workers, and we use $S$ to represent all the possible $s$. Let $A' = \{a_{n'+1}, a_{n'+2}, \ldots, a_n\}$ be the accuracies of the remaining workers. As we do not know the identities of the remaining workers, we consider all the possibilities. We use $\mathcal{A}$ to represent all the possible permutations of $A'$. The confidence of an answer $r$ being the correct one can be estimated as the expected probability $P(r|\Omega', s)$ over $S$ and $\mathcal{A}$, i.e.,

$$\rho(r) = E_{s \in S,\ A' \in \mathcal{A}}\big[P(r|\Omega', s)\big]$$

The following theorem shows that Equation 4 can be applied to compute $\rho(r)$.

THEOREM 6. Assume that workers process the query independently and the answers are submitted in a random order. Then $\rho(r) = P(r|\Omega')$.

PROOF. Based on the assumption, we have:

$$\rho(r) = E_{s \in S,\ A' \in \mathcal{A}}\big[P(r|\Omega', s)\big] = E_{A' \in \mathcal{A}}\big[E_{s \in S}[P(r|\Omega', s)]\big]$$

In fact, the answer set $s$ of the remaining workers does not affect the computation of the above equation. As shown in [4],

$$E_{s \in S}\big[P(r|\Omega', s)\big] = P(r|\Omega')$$

The computation of $\rho(r)$ can be further simplified as:

$$\rho(r) = E_{A' \in \mathcal{A}}\big[E_{s \in S}[P(r|\Omega', s)]\big] = E_{A' \in \mathcal{A}}\big[P(r|\Omega')\big] = P(r|\Omega')$$

Figure 4: Reviews for Kung Fu Panda

Theorem 6 shows that the confidence of a partial result can also be computed by Equation 4. Therefore, we select the answer with maximal confidence as our correct answer.

4.2.2 Early Termination

When the current answers are good enough, we can terminate the HIT to reduce cost. The major challenge of early termination is how to measure the quality of the current results. Intuitively, we can stop accepting answers from new workers as soon as we are sure that the current result will not be changed by the answers we choose to forgo.

In particular, let $r_1$ and $r_2$ be the best and second best answers based on their confidence, respectively. We have $P(r_1|\Omega') > P(r_2|\Omega')$. Let $(u_1, u_2, \ldots, u_n)$ be the set of workers. Suppose $n'$ workers have submitted answers and $n - n'$ answers remain unfilled. Assume an answer set $s = \{f(u_i) = r_2 \mid n'+1 \le i \le n\}$. Using similar techniques to those in paper [4], we can prove the following for the minimal possible value of $P(r_1|\Omega)$ and the maximal possible value of $P(r_2|\Omega)$:

$$\min P(r_1|\Omega) = P(r_1|\Omega', s) \qquad (5)$$

$$\max P(r_2|\Omega) = P(r_2|\Omega', s) \qquad (6)$$

Note that $\min P(r_1|\Omega)$ and $\max P(r_2|\Omega)$ are related to the random variables $a_{n'+1}, a_{n'+2}, \ldots, a_n$. In our algorithm, we use the expected values of $\min P(r_1|\Omega)$ and $\max P(r_2|\Omega)$, namely, $E_{A' \in \mathcal{A}}[\min P(r_1|\Omega)]$ and $E_{A' \in \mathcal{A}}[\max P(r_2|\Omega)]$. However, it is difficult to compute the expected values directly. Therefore, in practice, we use approximate values of $E_{A' \in \mathcal{A}}[\min P(r_1|\Omega)]$ and $E_{A' \in \mathcal{A}}[\max P(r_2|\Omega)]$. We assume every remaining worker has the same accuracy $E[a_i]$ and use it in Equation 5 and 6. Empirical results show that the approximations work well in practice.
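The approximation described above can be sketched in Python under explicit assumptions: the confidence of an answer follows Equation 4 with worker confidence $c_j = \ln\frac{(m-1)a_j}{1-a_j}$, every remaining worker is given the same mean accuracy, and the worst case for the current best answer is that all remaining workers vote for the runner-up. The function names and the example votes are hypothetical.

```python
from math import exp, log


def worker_confidence(accuracy, m):
    """c_j = ln((m - 1) * a_j / (1 - a_j)); requires 0 < accuracy < 1 and m >= 2."""
    return log((m - 1) * accuracy / (1.0 - accuracy))


def answer_confidences(votes, m):
    """Equation 4 for a list of (answer, worker_accuracy) pairs and domain size m."""
    scores = {}
    for answer, accuracy in votes:
        scores[answer] = scores.get(answer, 0.0) + worker_confidence(accuracy, m)
    # Answers nobody chose contribute e^0 = 1 each to the shared denominator.
    unseen = m - len(scores)
    denom = sum(exp(s) for s in scores.values()) + unseen
    return {answer: exp(s) / denom for answer, s in scores.items()}


def worst_case_bounds(votes, remaining, mean_accuracy, m):
    """Approximate min P(r1) and max P(r2) if all remaining workers answer r2."""
    current = answer_confidences(votes, m)
    if len(current) < 2:
        raise ValueError("need at least two distinct answers to compare")
    r1, r2 = sorted(current, key=current.get, reverse=True)[:2]
    # Assumption: each unseen worker has the mean accuracy and votes for r2.
    worst = list(votes) + [(r2, mean_accuracy)] * remaining
    after = answer_confidences(worst, m)
    return r1, r2, after[r1], after[r2]


if __name__ == "__main__":
    votes = [("neg", 0.9), ("neg", 0.8), ("pos", 0.6)]
    print(worst_case_bounds(votes, remaining=1, mean_accuracy=0.7, m=3))
    print(worst_case_bounds(votes, remaining=3, mean_accuracy=0.7, m=3))
```

In this made-up example the best answer stays ahead when one worker is outstanding, but not when three are, so comparing the two bounds gives one possible stopping test.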
We propose three different strategies as the termination condition:

MinMax: $E_{A \in \mathcal{A}}[\min P(r_1 \mid \Omega)] > E_{A \in \mathcal{A}}[\max P(r_2 \mid \Omega)]$
MinExp: $E_{A \in \mathcal{A}}[\min P(r_1 \mid \Omega)] > P(r_2 \mid \Omega')$
ExpMax: $P(r_1 \mid \Omega') > E_{A \in \mathcal{A}}[\max P(r_2 \mid \Omega)]$

MinMax guarantees that the answer output by our system is stable when the termination condition is achieved. However, it is too conservative. MinExp and ExpMax can terminate the processing much earlier, but may lead to low-quality results. We study the effect of the three strategies in our experiments.

Algorithm 5 onlineProcessing(Question q)
 1: Set answer = new Set()
 2: Map<Answer, float> result = new Map()
 3: while not all answers are returned do
 4:     Answer A = getNextAnswer(q)
 5:     answer.add(A)
 6:     Set distinctAnswer = getDistinctAnswer(answer)
 7:     for i = 0 to distinctAnswer.size-1 do
 8:         Answer A = distinctAnswer.get(i)
 9:         float confidence = computeConfidence(A)
10:         result.put(A, confidence)
11:     if canTerminate(result) then
12:         break
13: return result

Algorithm 5 outlines the online processing strategy adopted in CDAS. The query engine continuously updates the confidence of each answer (lines 3-13) until the termination condition is satisfied. We apply Equation 4 to estimate the confidence of each answer (line 9) and apply one of the three termination strategies to decide whether to stop the processing (line 11).

4.3 Result Presentation

In the onlineProcessing algorithm (Algorithm 5), if there is an answer that meets the termination condition, online processing stops and CDAS accepts the answer. Otherwise, if none of the answers is good enough, CDAS updates the confidence of each answer according to Equation 4. We take queries in TSA as an example to illustrate the result presentation. Given a list of tweets $t_1, t_2, \ldots, t_N$, let the function $h_{t_i}(r)$ return the score of answer $r$ for tweet $t_i$. $h_{t_i}(r)$ is defined as follows:

$$h_{t_i}(r) = \begin{cases} 1 & \text{if } r \text{ is accepted for } t_i \\ 0 & \text{if another answer is accepted} \\ \rho_{t_i}(r) & \text{if none of the answers is accepted} \end{cases}$$
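Returning to Algorithm 5 and the three termination conditions, the loop can be sketched in Python as follows. This is a minimal illustration, not the CDAS implementation: `confidence` is an assumed stand-in for Equation 4 (an accuracy-weighted vote with a uniform prior), the expectations are approximated by giving every remaining worker the mean accuracy as the text describes, and all names and numbers are hypothetical.

```python
import math

def confidence(votes, choices):
    """Assumed stand-in for Equation 4: accuracy-weighted posterior per answer."""
    m = len(choices)
    score = {c: 0.0 for c in choices}
    for answer, acc in votes:
        score[answer] += math.log((m - 1) * acc / (1 - acc))
    z = sum(math.exp(s) for s in score.values())
    return {c: math.exp(s) / z for c, s in score.items()}

def can_terminate(votes, choices, remaining, mean_acc, strategy):
    """Evaluate the MinMax, MinExp, or ExpMax condition on the current votes."""
    cur = confidence(votes, choices)
    r1, r2 = sorted(cur, key=cur.get, reverse=True)[:2]
    # All remaining workers are assumed to vote for r2 with accuracy mean_acc.
    worst = confidence(votes + [(r2, mean_acc)] * remaining, choices)
    if strategy == "MinMax":
        return worst[r1] > worst[r2]
    if strategy == "MinExp":
        return worst[r1] > cur[r2]
    if strategy == "ExpMax":
        return cur[r1] > worst[r2]
    raise ValueError(f"unknown strategy: {strategy}")

def online_processing(answer_stream, total_workers, choices, mean_acc, strategy="ExpMax"):
    """Consume answers one by one; return (confidences, number of answers used)."""
    votes, result = [], {}
    for answer, acc in answer_stream:
        votes.append((answer, acc))
        result = confidence(votes, choices)
        remaining = total_workers - len(votes)
        if can_terminate(votes, choices, remaining, mean_acc, strategy):
            break
    return result, len(votes)

choices = ["Positive", "Neutral", "Negative"]
stream = [("Positive", 0.8)] * 7  # seven agreeing workers, accuracy 0.8 each
for name in ("MinMax", "MinExp", "ExpMax"):
    _, used = online_processing(stream, 7, choices, 0.7, name)
    print(name, "stops after", used, "answers")
```

Because adding votes for $r_2$ can only lower the probability of $r_1$ and raise that of $r_2$, the MinMax condition implies the other two, so in this sketch MinMax never stops earlier than MinExp or ExpMax.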

The percentage of answer $r$ is then computed as $\frac{1}{N}\sum_{i=1}^{N} h_{t_i}(r)$. Moreover, we generate a set of keywords as reasons for each answer $r$. These keywords are the most frequent keywords submitted by the workers who provided the answer $r$. The results are updated as new tweets are streamed into TSA. Figure 4 shows the online processing interface of TSA for the review results of Kung Fu Panda. It summarizes Twitter users' opinions into three categories. The query is restricted to a time window; in the elapsed time shown (4 minutes), 70% of the tweets fed to TSA say that Kung Fu Panda is a good movie. TSA updates the result as new tweets arrive. Users can click an answer to expand the view, and TSA will list the corresponding tweets for that answer, sorted by timestamp from newest to oldest. The user can also check the progress of the currently running HIT.

5. PERFORMANCE EVALUATION

To evaluate the effectiveness of the quality-sensitive answering model in CDAS, we developed two crowdsourcing applications, a Twitter sentiment analytics (TSA) job and an image tagging (IT) job. We present comprehensive experimental results for TSA and, due to space constraints, provide only the comparison with an online image tagging toolkit for the IT application. The results of the other experiments on IT exhibit trends similar to those of TSA. By default, our approach applies the probability-based verification model (denoted as Verification) to select the best answer. For comparison, the Half-Voting and Majority-Voting models are used as two alternative verification approaches. Suppose $n$ ($n$ is odd) workers are employed for a particular task. In the Half-Voting model, the answer $r_i$ is accepted only if no fewer than $\lceil n/2 \rceil$ workers (that is, more than half of them) return it as their answer. In the Majority-Voting model, let $v(r_i)$ denote the votes for answer $r_i$. The answer $r_i$ is accepted if, for any other answer $r_j$, $v(r_i) > v(r_j)$.

5.1 Application 1: TSA

We deploy TSA on AMT and use movie titles as our queries. The selected titles are the most recent movies listed in IMDB (Internet Movie Database).
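The Half-Voting and Majority-Voting baselines defined at the start of this section can be written down in a few lines. The sketch below is illustrative only; the function names are hypothetical, and returning `None` models the "no answer" outcome that these two models can produce.

```python
from collections import Counter

def half_voting(answers):
    """Accept an answer only if more than half of the workers returned it."""
    if not answers:
        return None
    answer, votes = Counter(answers).most_common(1)[0]
    return answer if 2 * votes > len(answers) else None

def majority_voting(answers):
    """Accept the answer with strictly more votes than every other answer."""
    top = Counter(answers).most_common(2)
    if not top:
        return None
    if len(top) == 2 and top[0][1] == top[1][1]:
        return None  # tie between the two most frequent answers
    return top[0][0]

# Seven workers split 3/2/2: a strict plurality exists, but no answer
# has more than half of the votes.
answers = ["Positive"] * 3 + ["Negative"] * 2 + ["Neutral"] * 2
print(half_voting(answers), majority_voting(answers))
```

The 3/2/2 split shows why Half-Voting leaves more questions unanswered than Majority-Voting: a strict plurality is easier to reach than an absolute majority.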
The query follows the format Q({movie name}, accuracy requirement, {Positive, Neutral, Negative}, start date, 1 day), where the start date used in our experiments falls in October. Namely, the queries are processed against one day of tweets. For each HIT, 30 workers are employed to perform the review categorization task. We manually check each of the reviews to generate our ground truth.

5.1.1 Crowdsourcing vs. SVM Algorithm

We first show the advantages of crowdsourcing techniques over computer programs. We compare the results of TSA with LIBSVM (cjlin/libsvm/). To build an automatic classification model using LIBSVM, tweet reviews about five movies are selected as the test data, and tweets about the remaining movies are used as training data. After a stream of tweets passes the filters of TSA, we also send it to LIBSVM and collect the corresponding results. We then compare the results against our ground truth. In TSA, we vary the number of workers from 1 to 5. Figure 5 shows the accuracies of both systems for the five movies. In most cases, TSA achieves a higher accuracy than LIBSVM, even if only one worker is employed. This indicates that humans are much better at natural language understanding than machines. For such tasks, if highly accurate results are required, crowdsourcing is a promising approach.

5.1.2 Accuracy Analysis

In TSA, we first apply the theorem of the prediction model to estimate the number of workers required. This is a conservative estimation. To reduce cost, binary search is used to refine the estimation. Figure 6 compares the conservative estimation with the refined estimation generated by the binary search. We change the user-required accuracy from 0.65 to 0.99 and find that the refined estimation is less than half of the conservative estimation. In the remaining experiments, we use the refined estimation to determine the number of workers required for each HIT.

We next present the accuracy for the three verification models, namely Half-Voting, Majority-Voting and our proposed probability-based Verification model. Figure 7 shows that accuracy rises as the number of workers increases.
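The binary-search refinement mentioned above can be sketched as follows. The prediction model itself is not given in this section, so the sketch substitutes a simple assumption: workers answer a two-way question independently with the same mean accuracy `mu`, and the expected accuracy of voting is the binomial probability that more than half of them are correct. The conservative estimate serves as the upper end of the search range. All function names and numbers are hypothetical.

```python
from math import comb

def vote_accuracy(n, mu):
    """P(more than half of n independent workers, each correct w.p. mu, are correct)."""
    need = n // 2 + 1
    return sum(comb(n, k) * mu**k * (1 - mu) ** (n - k) for k in range(need, n + 1))

def refine_worker_count(upper, mu, required):
    """Smallest odd n <= upper whose vote_accuracy meets the requirement.

    Binary search over odd n = 2*i + 1; this relies on vote_accuracy being
    non-decreasing in odd n, which holds when mu > 0.5.
    Returns None if even `upper` workers are not enough.
    """
    lo, hi = 0, (upper - 1) // 2
    if vote_accuracy(2 * hi + 1, mu) < required:
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if vote_accuracy(2 * mid + 1, mu) >= required:
            hi = mid  # mid workers suffice; try fewer
        else:
            lo = mid + 1  # not enough; need more
    return 2 * lo + 1

# Hypothetical numbers: conservative bound of 31 workers, mean accuracy 0.7,
# required accuracy 0.9.
print(refine_worker_count(31, 0.7, 0.9))
```

Searching over odd counts only mirrors the odd-$n$ convention used for the voting models and keeps the accuracy function monotone.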
Among the three verification models, our probability-based approach achieves a much higher accuracy than the other two; Figure 7 reports the accuracy it reaches when 9 workers are employed. This verifies the benefit of considering workers' historical performance.

We proceed to investigate the effectiveness of the three verification models with respect to a user-required accuracy. Figure 8 shows the result. When the requester specifies a required accuracy, TSA estimates the number of workers needed to achieve that accuracy. The real accuracy is computed by comparing the workers' answers with the ground truth. The red line in the figure denotes the user-required accuracy. We observe that the probability-based verification model always provides a satisfactory result, while the results of the other two models are below the required accuracy in most cases.

The accuracy of the Half-Voting model is also worse than our estimation, for two reasons. First, the estimated number of workers is tied to the workers' mean accuracy. The mean accuracy used in the prediction model is an overall accuracy, collected across various questions. For some difficult questions, however, workers' accuracies can be much lower. As a result, the voting models need more workers than the estimated number. For example, the following tweet about the movie The Last Airbender expresses a positive opinion, whereas most workers classify it into the negative category because of the word "sucks":

"My nephew just said that Avatar: The Last Airbender sucks... I'm disowning him."

The second reason can be explained by the results in Figure 9 and Figure 10. Figure 9 shows the percentage of tweets with no answers in the two voting-based models. In some cases, the Half-Voting and Majority-Voting models fail to provide a result because none of the answers is discriminative (all answers get no more than half of the votes, or more than one answer gets the same number of votes). When the number of workers increases, Majority-Voting can resolve ties more easily.
However, for the Half-Voting strategy, there are still about 5% of the tweets that cannot obtain an answer with more than half of the votes. In Figure 10, when we vary the number of tweet reviews, we observe that the percentage of no-answer reviews is fairly stable. This phenomenon indicates that the reviews with non-discriminative answers are almost uniformly distributed among all reviews.

5.1.3 Online Processing

One advantage of our crowdsourcing engine is its ability to support online processing. It can provide an approximate result without waiting for all the workers to finish their jobs. Specifically,

TSA will generate an initial result as soon as the first answer is returned. Then it will gradually refine the results as more answers arrive, until the termination condition is satisfied. This allows us to terminate a HIT and cap the processing cost (see footnote 3). One interesting observation in our experiments is that the accuracy of the approximate result varies significantly for different answer arrival sequences. Figure 11 shows the accuracy of the same HIT under four different answer sequences. The red line is the user-required accuracy. Sequence 4 results in a low starting accuracy because the first two workers of sequence 4 provide incorrect answers. Therefore, in online processing, we must update the confidence of the current result dynamically based on the answers received, as early termination may potentially degrade the accuracy.

Figure 5: Crowdsourcing vs. SVM Algorithm (accuracy per movie title: District9, SocialNetwork, Thor, GreenLantern, Roommate; series: LIBSVM, TSA 1 worker, TSA 3 workers, TSA 5 workers)
Figure 6: Number of Workers Required (number of workers needed vs. user-required accuracy; series: Conservative, Binary)
Figure 7: Accuracy Comparison wrt. Number of Workers (real accuracy vs. number of workers; series: Majority-Voting, Half-Voting, Verification)
Figure 8: Accuracy Comparison wrt. User Required Accuracy (real accuracy vs. user-required accuracy; series: Majority-Voting, Half-Voting, Verification)
Figure 9: Percentage of No-Answer Reviews wrt. Number of Workers (no-answer ratio vs. number of workers; series: Majority-Voting, Half-Voting)
Figure 10: Percentage of No-Answer Reviews wrt. Number of Reviews (no-answer ratio vs. number of reviews; series: Majority-Voting, Half-Voting)

We evaluate the three termination strategies discussed in Section 4.2.2. Figure 12 shows the effect of early termination on the number of workers. The red line denotes the estimated number of workers via our refined prediction model. The MinMax strategy generates the most conservative estimation, but it still reduces the number of workers. The ExpMax strategy is the most aggressive one, saving more than 50% of the workers.
In Figure 13, we show the accuracies of the different termination strategies. The x-axis is the accuracy requirement specified by the user and the y-axis is the real accuracy measured against the ground truth. We can see that the MinMax and ExpMax strategies satisfy the user-required accuracy (denoted by the red line) in all cases, while MinExp fails to meet the requirement at a few points. In view of the need to reduce the number of workers while maintaining good accuracy, we propose to adopt the ExpMax termination strategy.

Footnote 3: In AMT, we can cancel a HIT when we detect that the answers are good enough. By doing so, we do not need to pay workers who have yet to submit their answers.

5.1.4 Effect of Sampling

TSA verifies the answers using the probability-based verification model, which relies on workers' historical performance. The AMT system records an approval rate for each worker, which reflects his/her accuracy in general. However, the workers' approval rates are not public due to privacy concerns. To collect the statistics, we publish 500 HITs requiring workers to fill in their approval rate. We also compute the workers' accuracies in answering TSA queries. We observe that the distribution of their approval rates in AMT is very different from that of their real accuracy in TSA, as shown in Figure 14. The reasons are two-fold. On one hand, there are various types of tasks in AMT, and it is natural that people cannot be experts in all domains. On the other hand, some requesters set automatic approval for all workers without checking the answers. This results in a high average approval rate in AMT. Therefore, we adopt a sampling approach to estimate workers' accuracy. Given $n$ workers, we compute their accuracies $A^j = \{a^j_1, a^j_2, \ldots, a^j_n\}$ under a sampling rate of $j\%$. We vary the sampling rate and plot the mean accuracy $\mu_j$ and the average absolute error $err_j$ in Figure 15, where $\mu_j$ and $err_j$ are defined as follows:

$$\mu_j = \frac{1}{n}\sum_{i=1}^{n} a^j_i, \qquad err_j = \frac{1}{n}\sum_{i=1}^{n} \left| a^j_i - a^{100}_i \right|$$

As shown, both mean accuracy and average error become stable once the sampling rate is high enough (Figure 15). More precisely, mean accuracy remains nearly constant and average error approaches 0.
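The two statistics $\mu_j$ and $err_j$ can be computed as in the following sketch. It assumes, purely for illustration, that every worker has answered every question with known ground truth and that a sampling rate of $j\%$ means scoring each worker on a random $j\%$ subset of those questions; the function names, the seed, and the toy data are hypothetical.

```python
import random

def accuracy(answers, truth, question_ids):
    """Fraction of the given questions that a worker answered correctly."""
    correct = sum(answers[q] == truth[q] for q in question_ids)
    return correct / len(question_ids)

def sampling_stats(worker_answers, truth, rate_percent, seed=0):
    """Return (mu_j, err_j) for sampling rate j = rate_percent."""
    rng = random.Random(seed)
    questions = sorted(truth)
    k = max(1, round(len(questions) * rate_percent / 100))
    sample = rng.sample(questions, k)
    sampled = [accuracy(a, truth, sample) for a in worker_answers]   # a_i^j
    full = [accuracy(a, truth, questions) for a in worker_answers]   # a_i^100
    n = len(worker_answers)
    mu = sum(sampled) / n
    err = sum(abs(s - f) for s, f in zip(sampled, full)) / n
    return mu, err

# Toy data: ten questions, one perfect worker and one who always answers 0.
truth = {q: q % 2 for q in range(10)}
workers = [dict(truth), {q: 0 for q in truth}]
for rate in (20, 50, 100):
    print(rate, sampling_stats(workers, truth, rate))
```

At a rate of 100% the sample is the full question set, so the error term is exactly zero by construction.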
We also study the effect of the sampling rate on accuracy in our verification model. Figure 16 plots the result. We vary the sampling rate from 5% and 10% up to 20% and compare the results to the accuracy under 100% sampling. The red line represents the user-required accuracy. We can see that the verification has better accuracy with a higher sampling rate. When the user-required accuracy is lower

than 0.75, all sampling rates are satisfactory. The result meets every user-required accuracy only when the sampling rate is sufficiently high (Figure 16). Moreover, at the sampling rate we adopt, the accuracy shows only a small gap compared to that under 100% sampling, and we use this rate in all of our verification experiments.

Figure 11: Effect of Answer Arriving Sequence (real accuracy vs. number of answers arrived; series: Sequence 1 to Sequence 4)
Figure 12: Effect of Early Termination on Worker Number (number of workers needed vs. user-required accuracy; series: MinExp, MinMax, ExpMax)
Figure 13: Effect of Early Termination on Accuracy (real accuracy vs. user-required accuracy; series: MinExp, MinMax, ExpMax)
Figure 14: Worker Accuracy vs. Approval Rate (percentage of workers; series: Real accuracy, Approval rate)
Figure 15: Effect of Sampling Rate on Worker Accuracy (mean accuracy and average error vs. sampling rate in %)
Figure 16: Effect of Sampling Rate on Verification Accuracy (real accuracy vs. user-required accuracy; series: Rate5%, Rate10%, Rate15%, Rate20%, Rate100%)

5.2 Application 2: IT

In this experiment, we evaluate our model in the context of an image tagging application. We use Flickr images as our queries. For each image, we give a set of candidate tags and let 30 workers choose the related ones. The candidate tags include Flickr tags and some embedded noise tags. Again, we first show the advantages of crowdsourcing over software applications in dealing with the image tagging task. We compare our result with ALIPR. ALIPR [13] is an automatic image annotation system that applies a 2-D hidden Markov model and clustering techniques. The accuracy comparison result is shown in Figure 17. We use 5 groups of images. Each group contains the top Flickr images returned by a tag. The figure clearly shows the accuracy gap between ALIPR and the crowdsourcing approach. ALIPR achieves its best accuracy, 30%, on the tag "sun" and a much lower accuracy on the tag "apple", whereas our crowdsourcing system reaches more than 80% even with only one worker employed. We next study the effectiveness of our model.
Recall that our model first estimates the number of workers for a specified accuracy requirement and then applies a probability-based model to verify the result. Figure 18 shows the accuracy achieved with respect to the user-required accuracy. As before, the red line denotes the user-required accuracy. It can be seen from the figure that our model always satisfies the user's requirement.

6. RELATED WORK

The emergence of Web 2.0 systems has significantly increased the applicability and usefulness of crowdsourcing techniques. A complex job can be split into many small tasks and assigned to different online workers. Amazon's AMT and CrowdFlower are popular crowdsourcing platforms. Studies show that users exhibit different behaviors in such micro-task markets [11]. A good incentive model is required in task design [10]. Recently, crowdsourcing has been adopted in software development. Instead of answering all requests with computer algorithms, some human-expert tasks are published on crowdsourcing platforms for human workers to process. Typical tasks include image annotation [21][18], information retrieval [1][8] and natural language processing [3][12][17]. These are tasks that even state-of-the-art technologies cannot accomplish with satisfactory accuracy, but that can be done easily and correctly by humans.

Crowdsourcing techniques have also been introduced into database design. Qurk [16][15] and CrowdDB [6] are two examples of databases with crowdsourcing support. In these database systems, queries are partially answered by the AMT platform. Our system, CDAS, adopts a similar design. On top of the crowdsourcing database, new query languages, such as hQuery [20], have been proposed, which allow users to exploit the power of crowdsourcing. Other database applications, such as graph search [19], can be enhanced with crowdsourcing techniques as well.

One main obstacle that prevents enterprise-wide deployment of crowdsourcing-based applications is quality control. Human workers' behaviors are unpredictable, and hence their answers may be arbitrarily bad. To encourage them to provide high-quality answers, monetary rewards are required. Munro et al.
[17] showed how to design a good incentive model to optimize workers' participation

and contributions. Ipeirotis et al. [9] presented a scheme to rank the qualities of workers, while Ghosh et al. [7] tried to accurately identify abusive content. Unlike previous efforts, in this paper we have designed a feasible model that balances monetary cost and accuracy, and proposed a crowdsourcing query engine with quality control. One of the main challenges of our query engine is how to integrate the conflicting results of human workers. A similar problem has been well studied in data fusion systems, for example [4][14]. We extended the models proposed in [4][14] to select and verify the crowdsourcing results in CDAS.

Figure 17: Crowdsourcing vs. ALIPR (accuracy per subject: apple, bride, flying, sun, twilight; series: ALIPR, 1 worker, 3 workers, 5 workers)
Figure 18: Accuracy Obtained wrt. User Required Accuracy (real accuracy vs. user-required accuracy)

7. CONCLUSION

Crowdsourcing techniques allow application developers to harness the natural expertise of human workers to perform complex tasks that are very challenging for computers. However, as humans are prone to errors, there is no guarantee for the results of crowdsourcing. In this paper, we introduced the quality-sensitive answering model in our Crowdsourcing Data Analytics System, CDAS. The model guides the query engine to generate proper query plans based on the accuracy requirement. It consists of two sub-models, the prediction model and the verification model. The prediction model estimates the number of workers required for a specific task, while the verification model selects the best answer from all returned ones. To improve users' experience, when verifying the results, our model embraces online processing techniques to update answers gradually. By adopting the models, CDAS can provide high-quality results for different crowdsourcing jobs. In this paper, we have implemented a Twitter sentiment analytics job and an image tagging job on CDAS. We used real Twitter data and Flickr data as our queries. Amazon Mechanical Turk was employed as our crowdsourcing platform.
The results show that our proposed model can provide high-quality answers while keeping the total cost low.

8. ACKNOWLEDGEMENT

The work of this paper was in part supported by Singapore MDA grant R

9. REFERENCES

[1] O. Alonso, D. E. Rose, and B. Stewart. Crowdsourcing for relevance evaluation. In SIGIR Forum, 2008.
[2] J. Bollen, A. Pepe, and H. Mao. Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena. In CoRR, 2009.
[3] C. Callison-Burch and M. Dredze. Creating speech and language data with Amazon's Mechanical Turk. In NAACL HLT Workshop, 2010.
[4] X. L. Dong, L. Berti-Equille, and D. Srivastava. Integrating conflicting data: The role of source dependence. In PVLDB, 2009.
[5] R. Fisher. Statistical methods for research workers. Oliver and Boyd, 1954.
[6] M. J. Franklin, D. Kossmann, T. Kraska, S. Ramesh, and R. Xin. CrowdDB: answering queries with crowdsourcing. In SIGMOD, 2011.
[7] A. Ghosh, S. Kale, and P. McAfee. Who moderates the moderators?: crowdsourcing abuse detection in user-generated content. In EC, 2011.
[8] C. Grady and M. Lease. Crowdsourcing document relevance assessment with Mechanical Turk. In NAACL HLT Workshop, 2010.
[9] P. G. Ipeirotis, F. Provost, and J. Wang. Quality management on Amazon Mechanical Turk. In SIGKDD Workshop, 2010.
[10] G. Kazai, J. Kamps, M. Koolen, and N. Milic-Frayling. Crowdsourcing for book search evaluation: impact of HIT design on comparative system ranking. In SIGIR, 2011.
[11] A. Kittur, E. H. Chi, and B. Suh. Crowdsourcing user studies with Mechanical Turk. In SIGCHI, 2008.
[12] J. Ledlie, B. Odero, E. Minkov, I. Kiss, and J. Polifroni. Crowd translator: on building localized speech recognizers through micropayments. In SIGOPS Oper. Syst. Rev., 2010.
[13] J. Li and J. Z. Wang. Real-time computerized annotation of pictures. In IEEE Trans. Pattern Anal. Mach. Intell., June 2008.
[14] X. Liu, X. L. Dong, B. C. Ooi, and D. Srivastava. Online data fusion. In PVLDB, 2011.
[15] A. Marcus, E. Wu, D. R. Karger, S. Madden, and R. C. Miller.
Demonstration of Qurk: a query processor for human operators. In SIGMOD, 2011.
[16] A. Marcus, E. Wu, S. Madden, and R. C. Miller. Crowdsourced databases: Query processing with people. In CIDR, 2011.
[17] R. Munro, S. Bethard, V. Kuperman, V. T. Lai, R. Melnick, C. Potts, T. Schnoebelen, and H. Tily. Crowdsourcing and language studies: the new generation of linguistic data. In NAACL HLT Workshop, 2010.
[18] S. Nowak and S. Rüger. How reliable are annotations via crowdsourcing: a study about inter-annotator agreement for multi-label image annotation. In MIR, 2010.
[19] A. Parameswaran, A. D. Sarma, H. Garcia-Molina, N. Polyzotis, and J. Widom. Human-assisted graph search: it's okay to ask questions. In PVLDB, 2011.
[20] A. G. Parameswaran and N. Polyzotis. Answering queries using humans, algorithms and databases. In CIDR, 2011.
[21] C. Rashtchian, P. Young, M. Hodosh, and J. Hockenmaier. Collecting image annotations using Amazon's Mechanical Turk. In NAACL HLT Workshop, 2010.
[22] R. V. Wanzeele, K. Verbeeck, A. Vorstermans, T. Tourwe, and E. Tsiporkova. Extracting emotions out of Twitter's microblogs. In BNAIC, 2011.
[23] T. Yan, V. Kumar, and D. Ganesan. CrowdSearch: exploiting crowds for accurate real-time image search on mobile phones. In MobiSys, 2010.

Designing Incentives for Online Question and Answer Forums Desigig Icetives for Olie Questio ad Aswer Forums Shaili Jai School of Egieerig ad Applied Scieces Harvard Uiversity Cambridge, MA 0238 USA shailij@eecs.harvard.edu Yilig Che School of Egieerig ad Applied

More information

The Computational Rise and Fall of Fairness

The Computational Rise and Fall of Fairness Proceedigs of the Twety-Eighth AAAI Coferece o Artificial Itelligece The Coputatioal Rise ad Fall of Fairess Joh P Dickerso Caregie Mello Uiversity dickerso@cscuedu Joatha Golda Caregie Mello Uiversity

More information

Confidence Intervals for One Mean

Confidence Intervals for One Mean Chapter 420 Cofidece Itervals for Oe Mea Itroductio This routie calculates the sample size ecessary to achieve a specified distace from the mea to the cofidece limit(s) at a stated cofidece level for a

More information

Article Writing & Marketing: The Best of Both Worlds!

Article Writing & Marketing: The Best of Both Worlds! 2612 JOURNAL OF SOFTWARE, VOL 8, NO 1, OCTOBER 213 C-Cell: A Efficiet ad Scalable Network Structure for Data Ceters Hui Cai Logistical Egieerig Uiversity of PLA, Chogqig, Chia Eail: caihui_cool@126co ShegLi

More information

Impacts of the Collocation Window on the Accuracy of Altimeter/Buoy Wind Speed Comparison A Simulation Study. Ge Chen 1,2

Impacts of the Collocation Window on the Accuracy of Altimeter/Buoy Wind Speed Comparison A Simulation Study. Ge Chen 1,2 Ge Che Ipacts of the Collocatio Widow o the ccuracy of ltieter/uoy Wid Speed Copariso Siulatio Study Ge Che, Ocea Reote Sesig Istitute, Ocea Uiversity of Qigdao 5 Yusha Road, Qigdao 66003, Chia E-ail:

More information
