A Conditional Model of Deduplication for Multi-Type Relational Data

Size: px
Start display at page:

Download "A Conditional Model of Deduplication for Multi-Type Relational Data"

Transcription

1 A Conditionl Model of Dedupliction for Multi-Type Reltionl Dt Aron Culott, Andrew McCllum Deprtment of Computer Science University of Msschusetts Amherst, MA {culott, Astrct Record dedupliction is the tsk of merging dtse records tht refer to the sme underlying entity. In reltionl dtses, ccurte dedupliction for records of one type is often dependent on the merge decisions mde for records of other types. Wheres nerly ll previous pproches hve merged records of different types independently, this work models these inter-dependencies explicitly to collectively deduplicte records of multiple types. We construct conditionl rndom field model of dedupliction tht cptures these reltionl dependencies, nd then employ novel reltionl prtitioning lgorithm to jointly deduplicte records. We evlute the system on two cittion mtching dtsets, for which we deduplicte oth ppers nd venues. We show tht y collectively deduplicting pper nd venue records, we otin up to 30% error reduction in venue dedupliction, nd up to 20% error reduction in pper dedupliction over competing methods. 1 Introduction A common prerequisite for knowledge discovery is ccurtely comining dt from multiple, heterogeneous sources into unified, minele dtse. An importnt step in creting such dtse is record dedupliction: consolidting multiple records tht refer to the sme strct entity. The difficulty in this tsk rises oth from dt errors (e.g. misspellings nd missing fields) nd from vrints in field vlues (e.g. revitions). Most historicl pproches hve frmed the dedupliction prolem s set of independent decisions. For ech pir of records, similrity score is clculted, nd the records re merged if the similrity is ove some threshold [8]. The decisions re comined y tking the trnsitive closure of the resulting djcency mtrix. More recently, McCllum nd Wellner [13] nd Prg nd Domingos [20] hve demonstrted tht mking multiple dedupliction decisions collectively cn provide etter results thn historicl pproches. These models re types of conditionl rndom fields (CRFs) [9], where the oserved nodes re mentions, nd the predicted nodes re the dedupliction decisions for ech pir of nodes. By frming inference s n instnce of grph prtitioning, the models re collective in the sense tht mentions re clustered sed not only on their distnce to ech other, ut lso on their distnce from ll other prtitions. By treting 1

2 dedupliction decisions in dependent reltion to ech other, inconsistencies nd noise in the similrity metric my e overcome. This pper presents model for collective dedupliction, extended to the importnt nd uiquitous cse of reltionl dtses, where records hve types, nd where there exist reltions etween records of different types. These reltions provide useful evidence for dedupliction decisions ecuse the identity of record often depends on the identities of relted records. For exmple, consider dtse of reserch ppers, where records cn e of type pper, venue, or uthor. If two pper records re leled s duplictes, then it follows tht the venue records corresponding to those ppers should lso e leled s duplictes. The reverse is more sutly true: if two venues re duplictes, then this my slightly increse the proility tht their corresponding ppers re duplictes. We propose model tht leverges these sutle interdependencies to mke dedupliction decisions collectively cross multiple record types. In prticulr, we present CRF for the cittion domin tht provides conditionl proilistic model of dedupliction decisions over records of multiple types given oserved record mentions nd the reltions mong them. We propose novel, reltionl grph prtitioning lgorithm for inference tht not only ensures tht dedupliction decisions mde for different record types re consistent, ut lso llows the decisions from one record type to inform the decisions for nother record type. Prmeter estimtion consists of mximizing the product of locl mrginls for pirs of records of different types. Tht is, we prmeterize the CRF to lern weights over 4-tuples consisting of record pir nd relted record pir of different type. In the cittion domin, these 4-tuples consist of pir of pper records, nd pir of relted venue records. In this wy, the model lerns prmeters to trde-off pper nd venue dedupliction decisions. We provide results on dtse of reserch ppers, where we show tht modeling dedupliction of pper nd venue records collectively improves dedupliction performnce for ech type, providing up to 30% error reduction in venue dedupliction, nd up to 20% error reduction in pper dedupliction over previously proposed collective model [13] tht does not model the dependencies etween record types. 2 Relted Work To the est of our knowledge, this is the first pper to present discrimintive, collective model of dedupliction for multiple, relted record types nd demonstrte empiriclly the performnce gins ttinle over independent models. We riefly review clssicl work in dedupliction, then discuss recent efforts in collective dedupliction. Record dedupliction, known vriously s record linkge, coreference resolution, dedupliction, nd identity uncertinty, is prevlent in mny fields, including computer vision, dtses, nd nturl lnguge processing. 2

3 Originlly introduced in the dtse community s record linkge [17], record dedupliction ws lter formlized y Fellegi nd Sunter [8] s the computtion over fetures etween pirs of records, nd further extended y Winkler [25, 26]. This previous work clcultes similrity score for record pirs, collpses those ove similrity threshold, then performs trnsitive closure. It is not reltionl in the sense tht one dedupliction decision does not directly ffect nother. More recent record linkge work hs considered the dedupliction of ctegoricl dt, llowing ttriutes to e deduplicted long with records [1]; however, tht work does not utilize mchine lerning nd requires thresholds to e set mnully. Methods of lerning etter similrity score hve een investigted recently in the dtse community [5, 7]. Similr trends exist in nturl lnguge processing for the tsk of coreference resolution, where reserch hs focused on lerning more useful similrity metrics nd pplying them to thresholding techniques nlogous to those found in the dtse community [18, 16]. Only recently hve collective dedupliction models een investigted. Milch et. l hve introduced genertive models to reson in worlds with n unknown numer of ojects, enling proility distriutions to e defined over reltionl dt with mny oject types [11, 15]. While these models hve ppeling forml semntics, their genertive nture forces the model to mke conditionl independence ssumptions mong fetures. Our work cn e viewed s extensions to recent models pplying conditionl rndom fields to the dedupliction tsk [13, 24, 20]. McCllum nd Wellner [13] hve presented CRFs which perform collective coreference y equting inference in the CRF s grph prtitioning prolem, resulting in collective coreference of records of one type. This previous work demonstrted the dvntges collective coreference hs over clssicl pproches; however, it does not model multiple types of coreferent ojects. Prg nd Domingos [20] present CRF model similr to tht in McCllum nd Wellner [13] in tht it collectively deduplictes records of the sme type using grph prtitioning. Additionlly, this method llows informtion to propgte etween records y wy of their shred ttriutes. However, the Prg nd Domingos model does not tret ttriutes s first-clss ojects. In prticulr, their model collpses string identicl ttriute nodes nd cretes informtion nodes to model whether or not ttriutes mtch. The model does not explicitly optimize dedupliction decisions for ttriutes; rther, the informtion nodes cn e viewed s n input vriles to record dedupliction. An importnt distinction with our work is tht in the Prg nd Domingos model, joint dedupliction only occurs mong records shring n identicl ttriute. This is often not the cse in rel dt. In sense, the Prg nd Domingos model cn e viewed s discrimintive version of recently proposed hierrchicl model for dedupliction y Rvikumr nd Cohen [22], which introduces ltent mtch nodes for ttriutes. Here gin, determining whether ttriute vlues re coreferent is viewed s locl decision used s input to the record dedupliction decision. Our model insted 3

4 PID Author Title Venue VID 0 X. Li Predicting the stock mrket CIKM 10 1 X. Li Predicting the stock mrket Conf on Informtion Mngement 20 2 J. Smith Semi-Definite Progrmming CIKM 30 3 Smith, J. Semi-Definte Progrming Conference on Info Mngement 40 Tle 1: An exmple of four ppers with the sme venues in pulictions dtse. PID is the pper id, nd VID is the venue id. trets ttriutes s records themselves, performing full dedupliction on them s well s their relted records. Other recent work in knowledge discovery hs leverged reltionl informtion to perform coreference [3, 4]. These models define similrity metric etween records tht considers the identity of relted records. This is similr in spirit to our model, since it uses dedupliction decisions of relted records to clculte the similrity etween records. However, their model is minly concerned with deduplicting uthors lone, nd does not explicitly model dedupliction of multiple record types. Also, the trining methods descried in those models do not cpture the rich set of fetures ville to the model presented in this pper. Our model cn lso e descried s type of reltionl Mrkov network (RMN) [23], which hve een employed successfully in reltionl domins, lthough not multi-type dedupliction. Another importnt distinction is tht our trining nd inference methods differ sustntilly from the loopy elief propgtion lgorithm commonly used in RMNs. 3 Motivting Exmple We first provide n exmple to motivte the potentil enefits of collective dedupliction for dtses with multiple record types. Consider gin dtse of reserch ppers, with uthor, pper, nd venue records. Our tsk is to deduplicte the vrious mentions of these records into unique entities. Tle 1 shows dtse of four ppers nd four venues, ech with unique ids. Ppers 0 nd 1 nd ppers 2 nd 3 should e merged; ll the venues should e merged. Imgine n gglomertive dedupliction system which egins y ssuming ech record is unique. Suppose the system first considers merging ppers 0 nd 1. Although the venues do not mtch, ll the other fields re exct mtches, so it is fesile tht the system my overcome this discrepncy. After merging ppers 0 nd 1, the system lso merges the corresponding venues 10 nd 20 into the sme cluster, since the venues of duplicte ppers must themselves e duplictes. Imgine the system next merges venues 10 nd 30 ecuse they re string identicl. The system must now decide if ppers 2 nd 3 re duplictes. Treted in isoltion, system my hve hrd time correctly detecting tht 2 nd 3 re duplictes: the uthors re highly similr, ut the title contins two 4

5 r ii x i y ij y jk x k x i y ij y jk x k x i y ij y jk x k x i y ij y jk x k x i r jj y ij y jk x k r kk x i y ij y jk x k () () (c) r jj Figure 1: Three incresingly complex models of pper nd venue dedupliction. X nd X re pper nd venue records, respectively. Y is inry rndom vrile indicting duplicte records, nd R ij denotes whether pper X i ws pulished in venue Xj. misspellings, nd the venues re extremely dissimilr. However, the system we hve descried so fr is fortunte to hve more informtion t its disposl. It hs lredy merged venues 10 nd 20, which re highly similr to venues 30 nd 40. By consulting its dtse of deduplicted venues, it could determine tht 30 nd 40 re in fct the sme venue. With this informtion in hnd, it my e more forgiving of the spelling mistkes in the title, finlly merging ppers 2 nd 3 correctly. This exmple illustrtes the notion tht the identity of n oject is dependent on the identity of relted ojects. Notice tht y using reltionl informtion, the system not only merged ppers it might not hve otherwise (2 nd 3), ut lso merged venues it might not hve (10, 20 nd 30, 40). Indeed, the chin of dedupliction decisions which led to optiml performnce interleved pper nd venue decisions: (0,1), (10,20), (30,40), (2,3). The work presented here descries system tht models the dedupliction decisions of relted records collectively, enling the sort of proilistic trdeoffs instrumentl to the success of the system in this exmple. 4 Model The model is n instnce of conditionl rndom field tht jointly models the conditionl proility of multiple dedupliction decisions given n oserved reltionl dtse. We egin with rief review of conditionl rndom fields, followed y forml description of the model. We then descrie the pproximtions used to mke inference nd prmeter estimtion trctle for this model. 5

6 4.1 Conditionl Rndom Fields Conditionl rndom fields (CRFs) [9] re undirected grphicl models encoding the conditionl proility of set of output vriles Y given set of evidence vriles X. The set of distriutions expressile y CRF is specified y n undirected grph G, where ech vertex corresponds to rndom vrile. If C = {{y c, x c }} is the set of cliques in G, then the conditionl proility of y given x is p Λ (y x) = 1 φ c (y c, x c ; Λ) Z x c C where φ is potentil function prmeterized y Λ nd Z x = y c C φ(y c, x c ) is normliztion fctor. We ssume φ c fctorizes s log-liner comintion of ritrry fetures computed over clique c, therefore ( ) φ c (y c, x c ; Λ) = exp λ k f k (y c, x c ) The model prmeters Λ = {λ k } re set of rel-vlued weights typiclly lerned from leled trining dt y mximum likelihood estimtion. 4.2 CRFs for multi-type dedupliction Let X e collection of rndom vriles representing oserved record mentions in dtse tht requires dedupliction. For clrity, ssume there re only two types of records, X = (X, X ), where X = (X1,..., Xn), X = (X1,..., Xm). The gol of dedupliction is to prtition X into clusters of records tht ll refer to the sme strct entity. To this end, we define collection of inry rndom vriles Y = (Y, Y ) tht indicte whether or not two records re duplictes. For exmple, Yij indictes whether or not records Xi nd Xj re coreferent. We lso define the inry rndom vriles R, where Rij indictes whether some ritrry reltion R holds etween record mentions Xi nd Xj. For exmple, in reserch pper dtse, X represents the set of pper records, X represents the set venue records, Yij indictes whether X i nd Xj re duplictes, nd Rij indictes whether pper X i ws pulished t venue X j. In the generl cse where R is unoserved, one could construct the conditionl distriution P (Y, Y, R X). With this model, one cn infer from n oserved set of records the most prole set of duplicte records nd the most prole set of reltions etween records. For exmple, in the pulictions domin, one my wnt to model the dvisor of reltion etween uthors, while lso modeling uthor dedupliction. We postpone this investigtion for future work, nd insted focus on the cse where R is oserved. For instnce, in cittion dt, we know which venues records re relted to which pper records. Thus, we desire to model the conditionl distriution P (Y, Y X, R). k 6

7 Figure 1 displys three incresingly complex grphicl models of this conditionl distriution. Vertices re rndom vriles, edges indicte possile proilistic dependence etween vriles, nd shded vertices indicte oserved vriles. Model () corresponds to the clssicl pproch, which trets ech dupliction decision independently. Model () is the pproch evluted in McCllum nd Wellner [13], extended here to the cse of multiple record types. Note tht in this model dedupliction decisions for records of the sme type re mde collectively. Model (c) is the one this pper dvoctes. Not only re dedupliction decisions for records of the sme type mde collectively, ut lso the decisions for one type of record re dependent on decisions mde for relted records. For ese of presenttion, we hve only included the oserved reltion vriles R which re true. We now provide more precise description of model (c). Let x ij = x i,, x i, x j e pir of oserved pper record mentions nd their corresponding venue records. To cpture the dependence etween yij nd yij, we fctorize the potentil functions to consider them jointly, resulting in the model: p(y, y x, r) = 1 Z x exp ( i,j,l λ l f l (x ij, y ij, y ij, r ij ) + ) λ f (yij, yjk, yik, yij, yjk, yik) i,j,k where the fetures f re consistency checking functions used to enforce trnsitivity mong dedupliction decisions. For exmple, if ppers x i nd x j re coreferent, nd x j nd x k re coreferent, then not only must ppers x i nd x k e coreferent, ut venues x i, x j, x k must lso e coreferent. (Note f is of nottionl use only in prctice, the inference lgorithm simply voids these impossile configurtions.) Becuse oth yij nd y ij re rguments to the feture functions f l, these potentils cpture the cross-product of pper nd venue dedupliction decisions. This llows the lerned weights to encourge merging pper records which hve equivlent venue records, nd to discourge merging ppers with different venues. The cost of modeling these interdependencies is highly connected grphicl model, which necessittes pproximtions in oth inference nd prmeter estimtion. We descrie these pproximtions elow. 4.3 Inference Inference in this model corresponds to finding the solution to y = (y, y ) = rgmx p Λ (y, y x, x, r) y 7

8 tht is, finding the most prole dedupliction decisions y given x, x, r nd the lerned prmeters Λ. Exct inference in this model is intrctle ecuse the spce of possile y is exponentil in the numer of records x, nd the the high connectivity of the grph precludes fesile dynmic progrm to mke this serch trctle. One common pproximte inference technique for such predicment is to perform loopy elief propgtion; tht is, perform stndrd elief propgtion [21], ignoring the messge doule-counting cused y the cycles in the grph. However, the severe cyclicity of this model my require prohiitive mount of time for elief propgtion to converge, if it converges t ll. Insted, we follow recent work which finds n equivlence etween grph prtitioning lgorithms nd inference in certin undirected grphicl models [6]. We first trnsform our grph to weighted, undirected grph tht only contins vertices for vriles x nd hs edges weighted y the (log) clique potentil for ech pir of vertices. The vlue on these edges depends on which type of records they join. For pper edges, we define the weight w ij = y ij {0,1} ( l l λ l f l (x ij, y ij = 1, y ij, r ij ) ) λ l f l (x ij, yij = 0, yij, rij ) nd similrly for venue edges: w ij = y ij {0,1} ( l l λ l f l (x ij, y ij, y ij = 1, r ij ) ) λ l f l (x ij, yij, yij = 0, rij ) Intuitively, the pper weights wij cn e thought of s the comptiility of ppers x i,, summed over possile dedupliction decisions for venues x i, x j. Similrly, the venue weights wij cn e thought of s the comptiility of venues x i, x j, summed over the possile dedupliction decisions for ppers x i,. Interpreting the weights s the similrity etween two records, we cn see tht the similrity of pper records considers the similrity of their venue records, nd vice vers. This results in weighted, undirected grph with edge weights rnging from to +. It cn e shown tht finding n optiml prtitioning of this grph corresponds to finding the optiml configurtion y in the originl undirected grphicl model. Here, the numer of prtitions is unknown, s it corresponds to the numer of unique records. 8

9 Although grph prtitioning with positive nd negtive edge weights is NPhrd, there exist severl good pproximtions, including recent work in correltion clustering [2]. Additionlly, McCllum nd Wellner [13] hve found tht greedy gglomertive clustering with n verge link criterion works well in prctice. However, trditionl prtitioning lgorithms would not ccount for the known dependencies etween clusters tht exist in our dt. Therefore, we develop novel, reltionl gglomertive clustering lgorithm tht exploits these dependencies. Trditionl greedy gglomertive clustering first initilizes ech vertex to its own cluster, then itertively merges the clusters tht re closest, where the distnce etween clusters is often defined s the verge of the edge weights connecting the two clusters. We ugment this lgorithm with two enhncements. First, we must enforce the constrint tht duplicte ppers hve duplicte venues. This is stright-forwrdly enforced y the following rule: Whenever pir of pper clusters re merged, their corresponding venue clusters must lso e merged. The second enhncement redefines the distnce etween clusters to more ccurtely reflect the impct of the first enhncement. Let Ci, C j e two pper clusters tht re cndidtes to e merged, nd let Ci, C j e the venue clusters corresponding to these ppers. The first enhncement requires tht if we merge Ci, C j, we must lso merge C i, C j. However, the current distnce metric etween Ci, C j does not reflect this fct. To remedy this, we redefine the distnce etween two pper clusters (Ci, C j ) to e the verge of (1) the trditionl distnce etween the pper clusters (Ci, C j ) nd (2) the trditionl distnce etween their corresponding venue clusters (Ci, C j ). (Note tht we choose the verge rther thn the sum to del with ppers tht hve no venue informtion.) This metric is likely to etter pproximte the effect merging Ci, C j will hve on the ojective function p Λ (y x, r), since it ccounts for the merger of the corresponding venue clusters. This new clustering lgorithm provides enefits to oth pper nd venue dedupliction tht would e unville in n independent clustering lgorithm. As illustrted in our motivting exmple in Section 3, it is often the cse tht pper duplictes re not detected ecuse they hve venues with decidedly different surfce forms (e.g. CIKM nd Conference on Knowledge nd Informtion Mngement ). The second enhncement ddresses this prolem y using the evidence from previous venue clusterings to inform pper dedupliction. Specificlly, if CIKM hs lredy een resolved with Conference on Knowledge nd Informtion Mngement, then merging ppers with venues CIKM nd Conference on Knowledge nd Informtion Mngement will e encourged, since there will e high similrity etween their ssocited venue clusters. Conversely, y the hrd constrint introduced in the first enhncement, difficult venue dedupliction decisions re informed y confident pper dedupliction decisions, s ws lso illustrted in Section 3. In this wy, dedupliction decisions for oth record types simultneously grow more ccurte. 9

10 4.4 Prmeter Estimtion Given leled corpus of fully clustered dt, mximum likelihood prmeter estimtion corresponds to finding the prmeters Λ which mximize the loglikelihood of the leled trining dt. Exct estimtion is intrctle here ecuse it requires clculting the normliztion term Z x, sum over ll possile vlues of y, which is sum over ll possile prtitionings of the dt. Due to the nture of the dt nd the high connectivity of the grph, this cnnot e efficiently computed with dynmic progrm. One could perform stochstic grdient scent on n pproximtion of the likelihood. However, it hs een noted tht mximizing product of locl mrginls performs t lest s well s this pproximtion on similr coreference tsk, if not etter [24, 12]. Wheres in [24] the locl mrginls re over single coreference decisions, here we mximize product of joint conditionl proilities for decisions yij nd y ij, which we define s P Λ (yij, yij x, x, r) = 1 Z x exp λ l f l (x ij, yij, yij, r ij ) i,j,l where Z x is normliztion constnt summing over possile vlues for the pir yij, y ij. For leled dtset D, the log-likelihood is defined s L Λ (D) = log y ij,y ij D P Λ (y ij, y ij x, x, r) We perform grdient scent on L y mximizing its derivtive: L λ l = x,y,r D ( i,j,l y ij,y i,j,l λ l f l (x ij, y ij, y ij, r ij ) ij P Λ (y ij, y ij x, x, r) λ l f l (x ij, y ij, y ij, r ij ) Becuse the defined likelihood is convex function, we cn perform grdient scent using ny suitle optimiztion lgorithm. In prticulr, we use limited-memory BFGS, which itertively pproximtes second-order curvture informtion to speed up convergence [19]. The estimtion method cn lso e viewed s lerning distnce metric etween pper-venue pirs. As explined in Section 4.3, this metric is used to weight the edges in the dedupliction grph, which is then prtitioned t inference time. ) 10

11 5 Experiments We evlute our model on two dtsets of reserch pper cittions. The first is from Citeseer [10], contining pproximtely 1500 cittions, with 900 unique ppers nd 350 unique venues. The second is from the Cor Computer Science Reserch Pper Engine 1, contining out 1800 cittions, with 600 unique ppers nd 200 unique venues. Both dtsets re mnully leled for oth pper coreference nd venue coreference, s well s mnully segmented into fields, such s uthor, title, etc. The dt were collected y serching for certin uthors nd topic, nd they re split into susets with non-overlpping ppers for the ske of cross-vlidtion experiments. We used numer of feture functions, including exct nd pproximte string mtch 2 on normlized nd unnormlized vlues for the following cittion fields: title, ooktitle, journl, uthors, venue, dte, editors, institution, nd the entire unsegmented cittion string. We lso clculted n unweighted cosine similrity etween tokens in the title nd uthor fields. Additionl fetures include whether or not the ppers hve the sme puliction type (e.g. journl or conference), s well s the numericl distnce etween fields such s yer nd volume. All rel vlues were inned nd converted into inry-vlued fetures. To evlute performnce, we compre the clusters output y our system with the true clustering using pirwise metrics. Pirwise precision is the frction of pirs in the sme cluster tht re coreferent; pirwise recll is the frction of coreferent ppers tht were plced in the sme cluster. Pirwise F1 is the hrmonic men of pirwise precision nd pirwise recll. Tles 2 nd 3 show the F1 performnce of two systems: joint is the system we hve dvocted in this pper, nd indep is the system which deduplictes records of different types independently. Note tht this system corresponds to model () in Figure 1, so dedupliction decisions re mde collectively for records of the sme type, s in McCllum nd Wellner [13]. Since the McCllum nd Wellner model hs een shown to consistently outperform the clssicl trnsitive closure model, we do not compre with the clssicl model here. Results re listed y the nme of ech test set; the remining sections re used for trining. Venue performnce improves considerly in the joint model, which is plusile considering the strong influence pper dedupliction hs on venue dedupliction. Becuse pper dedupliction often hs more evidence t its disposl thn does venue dedupliction, the joint model drmticlly enhnces venue recll, otining 5% solute recll oost in Citeseer, nd 9% oost in Cor dt. This is especilly noticele when pper dedupliction performnce is high: The hrd constrint requiring the venues of duplicte ppers to e merged often merges venues tht otherwise would hve seemed too dissimilr to merge on their own. Indeed, error nlysis confirms tht mny of the venue 1 mccllum/dt/cor-refs.tr.gz 2 We used the Secondstring pckge, found t 11

12 Pper Venue indep joint indep joint constrint reinforce fce reson Micro Avg Tle 2: Pirwise F1 dedupliction performnce on Citeseer dt. Pper Venue indep joint indep joint kil fhl utgo Micro Avg Tle 3: Pirwise F1 dedupliction performnce on Cor dt. dedupliction errors our model voids re those where venues re dissimilr in form, ut re relted to ppers tht re similr in form. More interestingly, noticele improvement in pper dedupliction is ttined y the collective model. Prt of this is due to the precision enhncement provided y the constrined clustering lgorithm. Workshop nd technicl report versions of journl or conference ppers with the sme title re correctly not merged when the venues re ccurtely identified. Also, error nlysis suggests tht ppers tht would not hve een otherwise merged were merged ecuse their venues were determined to e coreferent. It is worth noting tht mny of the errors mde y the joint model hve cuses similr to those tht on verge hve improved performnce. For exmple, if pper dedupliction ccurcy is poor, the reltionl clustering lgorithm cn result in mny venues eing merged tht would not hve een otherwise. Future work should investigte how to detect poor pper ccurcy nd djust ccordingly. 5.1 Sclility While the dtsets used in our experiments re of resonle size, we would ultimtely like to pply this model to lrge dtses. Here we riefly discuss performnce issues nd descrie methods to scle our model to rel-world dt. Becuse prmeter estimtion mximizes the product of locl node potentils, it is likely to e much fster thn glol pproximte trining method such s loopy elief propgtion. For the dt used in these experiments, trining time verged out 15 minutes on dul-processor, 3.06 GHz Xeon mchines 12

13 with 4 GB of RAM. Inference time rnged from 20 minutes to out n hour, depending on the size of the testing dt. To mke inference sclle, prcticl implementtion would mke use of cnopies [14]. This technique reduces the connectivity of the grph of records y defining chep similrity metric etween records (often using n inverted index or tf-idf). When constructing the record grph, edges re only dded etween those records tht hve similrity ove some threshold, where similrity is the output of the chep metric. In this wy, records tht re very unlikely to e duplictes re not considered y the model. This provides n efficient, ccurte wy of pruning the serch spce. Besides performnce issues, there is reson to elieve tht the dvntges of the model presented in this pper will e even more noticele in lrger dt sets, where there is more heterogeneity in field vlues, more interesting reltionl ptterns, nd lrger record clusters. In fct, the dt used in experiments presented here contin mny singleton clusters, which is not truly reflective of the lrge clusters of records found in rel-world dt. 6 Conclusions We hve introduced collective model for dedupliction of relted records of multiple types nd demonstrted empiriclly the dvntges it hs over methods tht do not ddress the interdependencies inherent in reltionl dt. Bsed on these results, two promising res of future reserch re (1) extending the model to dtses with more thn two types of records, nd (2) modeling the reltion vriles R. In ddition to pper nd venue dedupliction, uthor dedupliction is lso difficult prolem tht would likely enefit from this pproch, nd we re in the process of hrvesting dt to llow us to model uthor, venue, nd pper dedupliction jointly. In the pulictions domin, the connections etween uthor, venue, nd pper dedupliction ecome more interesting with lrger dtses, where communities nd reltions ecome more visile. In prticulr, exciting chllenges include uilding model to predict dvisor of reltions etween uthors, suggest possile venues for pper, identify fruitful uthor collortions, mtch recent grdutes with potentil reserch ls, nd discover the dynmics of reserch communities. The chllenge s usul will lie in developing model tht is complex enough to model these long-distnce reltions, ut is still trctle enough to perform on rel dt. We feel tht the model proposed here is productive step in tht long-term direction. 7 Acknowledgments This work ws supported in prt y the Center for Intelligent Informtion Retrievl, in prt y U.S. Government contrct #NBCH through sucon- 13

14 trct with BBNT Solutions LLC, in prt y The Centrl Intelligence Agency, the Ntionl Security Agency nd Ntionl Science Foundtion under NSF grnt #IIS , nd in prt y the Defense Advnced Reserch Projects Agency (DARPA), through the Deprtment of the Interior, NBC, Acquisition Services Division, under contrct numer NBCHD Any opinions, findings nd conclusions or recommendtions expressed in this mteril re the uthor(s) nd do not necessrily reflect those of the sponsor. References [1] R. Annthkrishn, S. Chudhuri, nd V. Gnti. Eliminting fuzzy duplictes in dt wrehouses. In In Proceedings of the 28th Interntionl Conference on Very Lrge Dtses (VLDB 2002), [2] Nikhil Bnsl, Avrim Blum, nd Shuchi Chwl. Correltion clustering. Mchine Lerining, 56:89 113, [3] Indrjit Bhttchry nd Lise Getoor. Dedupliction nd groupd detection using links. In 10th ACM SIGKDD Workshop on Link Anlysis nd Group Detection (LinkKDD-04), [4] Indrjit Bhttchry nd Lise Getoor. Itertive record linkge for clening nd integrtion. In Proceedings of the SIGMOD 2004 Workshop on Reserch Issues on Dt Mining nd Knowledge Discovery, June [5] Mikhil Bilenko nd Rymond J. Mooney. Adptive duplicte detection using lernle string similrity mesures. In 9th ACM SIGKDD Interntionl Conference on Knowledge Discovery nd Dt Mining (KDD-2003), pges 39 48, [6] Yuri Boykov, Olg Veksler, nd Rmin Zih. Fst pproximte energy minimiztion vi grph cuts. In IEEE trnsctions on Pttern Anlysis nd Mchine Intelligence (PAMI), 23(11): , [7] W. Cohen nd J. Richmn. Lerning to mtch nd cluster entity nmes. In ACM SIGIR 01 workshop on Mthemticl / Forml Methods in IR, [8] I. P. Fellegi nd A. B. Sunter. A theory for record linkge. Journl of the Americn Sttisticl Assocition, 64: , [9] John Lfferty, Andrew McCllum, nd Fernndo Pereir. Conditionl rndom fields: Proilistic models for segmenting nd leling sequence dt. In Proc. 18th Interntionl Conf. on Mchine Lerning, pges Morgn Kufmnn, Sn Frncisco, CA, [10] S. Lwrence, C. L. Giles, nd K. Bollker. Digitl lirries nd utonomous cittion indexing. IEEE Computer, 32:67 71,

15 [11] Bhskr Mrthi, Brin Milch, nd Sturt Russell. First-order proilistic models for informtion extrction. In Proceedings of the IJCAI-2003 Workshop on Lerning Sttisticl Models from Reltionl Dt, pges 71 78, Acpulco, Mexico, August [12] Andrew McCllum nd Chrles Sutton. Piecewise trining with prmeter independence digrms: Compring glolly- nd loclly-trined linerchin crfs. In NIPS 2004 Workshop on Lerning with Structured Outputs, [13] Andrew McCllum nd Ben Wellner. Conditionl models of identity uncertinty with ppliction to noun coreference. In Lwrence K. Sul, Yir Weiss, nd Léon Bottou, editors, Advnces in Neurl Informtion Processing Systems 17. MIT Press, Cmridge, MA, [14] Andrew K. McCllum, Kml Nigm, nd Lyle Ungr. Efficient clustering of high-dimensionl dt sets with ppliction to reference mtching. In Proceedings of the Sixth Interntionl Conference On Knowledge Discovery nd Dt Mining (KDD-2000), Boston, MA, [15] Brin Milch, Bhskr Mrthi, nd Sturt Russell. Blog: Reltionl modeling with unknown ojects. In ICML 2004 Workshop on Sttisticl Reltionl Lerning nd Its Connections to Other Fields, [16] Thoms Morton. Coreference for NLP pplictions. In ACL, [17] H. B. Newcome, J. M. Kennedy, S. J. Axford, nd A.P. Jmes. Automtic linkge of vitl records. Science, 130:954 9, [18] Vincent Ng nd Clire Crdie. Improving mchine lerning pproches to coreference resolution. In Proceedings of the 40th Annul Meeting of the Assocition for Computtionl Linguistics, [19] J. Nocedl. Updting qusi-newton mtrices with limited storge. Mthemtics of Computtion, 95: , [20] Prg nd Pedro Domingos. Multi-reltionl record linkge. In Proceedings of the KDD-2004 Workshop on Multi-Reltionl Dt Mining, pges 31 48, August [21] Jude Perl. Proilistic Resoning in Intelligent Systems: Networks of Plusile Inference. Morgn Kufmn, [22] Prdeep Rvikumr nd Willim W. Cohen. A hierrchicl grphicl model for record linkge. In UAI 2004, [23] Ben Tskr, Aeel Pieter, nd Dphne Koller. Discrimintive proilistic models for reltionl dt. In Uncertinty in Artificil Intelligence: Proceedings of the Eighteenth Conference (UAI-2002), pges , Sn Frncisco, CA, Morgn Kufmnn Pulishers. 15

16 [24] Ben Wellner, Andrew McCllum, Fuchun Peng, nd Michel Hy. An integrted, conditionl model of informtion extrction nd coreference with ppliction to cittion mtching. In Conference on Uncertinty in Artificil Intelligence (UAI), [25] Willim E. Winkler. Improved decision rules in the fellegi-sunter model of record linkge. Technicl report, Sttisticl Reserch Division, U.S. Census Bureu, Wshington, DC, [26] Willim E. Winkler. Methods for record linkge nd Byesin networks. Technicl report, Sttisticl Reserch Division, U.S. Census Bureu, Wshington, DC,

EQUATIONS OF LINES AND PLANES

EQUATIONS OF LINES AND PLANES EQUATIONS OF LINES AND PLANES MATH 195, SECTION 59 (VIPUL NAIK) Corresponding mteril in the ook: Section 12.5. Wht students should definitely get: Prmetric eqution of line given in point-direction nd twopoint

More information

Reasoning to Solve Equations and Inequalities

Reasoning to Solve Equations and Inequalities Lesson4 Resoning to Solve Equtions nd Inequlities In erlier work in this unit, you modeled situtions with severl vriles nd equtions. For exmple, suppose you were given usiness plns for concert showing

More information

Graphs on Logarithmic and Semilogarithmic Paper

Graphs on Logarithmic and Semilogarithmic Paper 0CH_PHClter_TMSETE_ 3//00 :3 PM Pge Grphs on Logrithmic nd Semilogrithmic Pper OBJECTIVES When ou hve completed this chpter, ou should be ble to: Mke grphs on logrithmic nd semilogrithmic pper. Grph empiricl

More information

Or more simply put, when adding or subtracting quantities, their uncertainties add.

Or more simply put, when adding or subtracting quantities, their uncertainties add. Propgtion of Uncertint through Mthemticl Opertions Since the untit of interest in n eperiment is rrel otined mesuring tht untit directl, we must understnd how error propgtes when mthemticl opertions re

More information

Small Businesses Decisions to Offer Health Insurance to Employees

Small Businesses Decisions to Offer Health Insurance to Employees Smll Businesses Decisions to Offer Helth Insurnce to Employees Ctherine McLughlin nd Adm Swinurn, June 2014 Employer-sponsored helth insurnce (ESI) is the dominnt source of coverge for nonelderly dults

More information

Concept Formation Using Graph Grammars

Concept Formation Using Graph Grammars Concept Formtion Using Grph Grmmrs Istvn Jonyer, Lwrence B. Holder nd Dine J. Cook Deprtment of Computer Science nd Engineering University of Texs t Arlington Box 19015 (416 Ytes St.), Arlington, TX 76019-0015

More information

Bayesian Updating with Continuous Priors Class 13, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

Bayesian Updating with Continuous Priors Class 13, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom Byesin Updting with Continuous Priors Clss 3, 8.05, Spring 04 Jeremy Orloff nd Jonthn Bloom Lerning Gols. Understnd prmeterized fmily of distriutions s representing continuous rnge of hypotheses for the

More information

Regular Sets and Expressions

Regular Sets and Expressions Regulr Sets nd Expressions Finite utomt re importnt in science, mthemtics, nd engineering. Engineers like them ecuse they re super models for circuits (And, since the dvent of VLSI systems sometimes finite

More information

P.3 Polynomials and Factoring. P.3 an 1. Polynomial STUDY TIP. Example 1 Writing Polynomials in Standard Form. What you should learn

P.3 Polynomials and Factoring. P.3 an 1. Polynomial STUDY TIP. Example 1 Writing Polynomials in Standard Form. What you should learn 33337_0P03.qp 2/27/06 24 9:3 AM Chpter P Pge 24 Prerequisites P.3 Polynomils nd Fctoring Wht you should lern Polynomils An lgeric epression is collection of vriles nd rel numers. The most common type of

More information

Scalable Mining of Large Disk-based Graph Databases

Scalable Mining of Large Disk-based Graph Databases Sclle Mining of Lrge Disk-sed Grph Dtses Chen Wng Wei Wng Jin Pei Yongti Zhu Bile Shi Fudn University, Chin, {chenwng, weiwng1, 2465, shi}@fudn.edu.cn Stte University of New York t Bufflo, USA & Simon

More information

CS99S Laboratory 2 Preparation Copyright W. J. Dally 2001 October 1, 2001

CS99S Laboratory 2 Preparation Copyright W. J. Dally 2001 October 1, 2001 CS99S Lortory 2 Preprtion Copyright W. J. Dlly 2 Octoer, 2 Ojectives:. Understnd the principle of sttic CMOS gte circuits 2. Build simple logic gtes from MOS trnsistors 3. Evlute these gtes to oserve logic

More information

Polynomial Functions. Polynomial functions in one variable can be written in expanded form as ( )

Polynomial Functions. Polynomial functions in one variable can be written in expanded form as ( ) Polynomil Functions Polynomil functions in one vrible cn be written in expnded form s n n 1 n 2 2 f x = x + x + x + + x + x+ n n 1 n 2 2 1 0 Exmples of polynomils in expnded form re nd 3 8 7 4 = 5 4 +

More information

Use Geometry Expressions to create a more complex locus of points. Find evidence for equivalence using Geometry Expressions.

Use Geometry Expressions to create a more complex locus of points. Find evidence for equivalence using Geometry Expressions. Lerning Objectives Loci nd Conics Lesson 3: The Ellipse Level: Preclculus Time required: 120 minutes In this lesson, students will generlize their knowledge of the circle to the ellipse. The prmetric nd

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd business. Introducing technology

More information

** Dpt. Chemical Engineering, Kasetsart University, Bangkok 10900, Thailand

** Dpt. Chemical Engineering, Kasetsart University, Bangkok 10900, Thailand Modelling nd Simultion of hemicl Processes in Multi Pulse TP Experiment P. Phnwdee* S.O. Shekhtmn +. Jrungmnorom** J.T. Gleves ++ * Dpt. hemicl Engineering, Ksetsrt University, Bngkok 10900, Thilnd + Dpt.hemicl

More information

Linear Programming in Database

Linear Programming in Database 9 Liner Progrmming in Dtse Akir Kwguchi nd Andrew Ngel Deprtment of Computer Science, The City College of New York. New York, New York United Sttes of Americ Keywords: liner progrmming, simple method,

More information

All pay auctions with certain and uncertain prizes a comment

All pay auctions with certain and uncertain prizes a comment CENTER FOR RESEARC IN ECONOMICS AND MANAGEMENT CREAM Publiction No. 1-2015 All py uctions with certin nd uncertin prizes comment Christin Riis All py uctions with certin nd uncertin prizes comment Christin

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd business. Introducing technology

More information

Hillsborough Township Public Schools Mathematics Department Computer Programming 1

Hillsborough Township Public Schools Mathematics Department Computer Programming 1 Essentil Unit 1 Introduction to Progrmming Pcing: 15 dys Common Unit Test Wht re the ethicl implictions for ming in tody s world? There re ethicl responsibilities to consider when writing computer s. Citizenship,

More information

How To Network A Smll Business

How To Network A Smll Business Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology

More information

Modular Generic Verification of LTL Properties for Aspects

Modular Generic Verification of LTL Properties for Aspects Modulr Generic Verifiction of LTL Properties for Aspects Mx Goldmn Shmuel Ktz Computer Science Deprtment Technion Isrel Institute of Technology {mgoldmn, ktz}@cs.technion.c.il ABSTRACT Aspects re seprte

More information

Factoring Polynomials

Factoring Polynomials Fctoring Polynomils Some definitions (not necessrily ll for secondry school mthemtics): A polynomil is the sum of one or more terms, in which ech term consists of product of constnt nd one or more vribles

More information

Section 5-4 Trigonometric Functions

Section 5-4 Trigonometric Functions 5- Trigonometric Functions Section 5- Trigonometric Functions Definition of the Trigonometric Functions Clcultor Evlution of Trigonometric Functions Definition of the Trigonometric Functions Alternte Form

More information

5 a LAN 6 a gateway 7 a modem

5 a LAN 6 a gateway 7 a modem STARTER With the help of this digrm, try to descrie the function of these components of typicl network system: 1 file server 2 ridge 3 router 4 ckone 5 LAN 6 gtewy 7 modem Another Novell LAN Router Internet

More information

The Velocity Factor of an Insulated Two-Wire Transmission Line

The Velocity Factor of an Insulated Two-Wire Transmission Line The Velocity Fctor of n Insulted Two-Wire Trnsmission Line Problem Kirk T. McDonld Joseph Henry Lbortories, Princeton University, Princeton, NJ 08544 Mrch 7, 008 Estimte the velocity fctor F = v/c nd the

More information

Automated Grading of DFA Constructions

Automated Grading of DFA Constructions Automted Grding of DFA Constructions Rjeev Alur nd Loris D Antoni Sumit Gulwni Dileep Kini nd Mhesh Viswnthn Deprtment of Computer Science Microsoft Reserch Deprtment of Computer Science University of

More information

RTL Power Optimization with Gate-level Accuracy

RTL Power Optimization with Gate-level Accuracy RTL Power Optimiztion with Gte-level Accurcy Qi Wng Cdence Design Systems, Inc Sumit Roy Clypto Design Systems, Inc 555 River Oks Prkwy, Sn Jose 95125 2903 Bunker Hill Lne, Suite 208, SntClr 95054 qwng@cdence.com

More information

How To Make A Network More Efficient

How To Make A Network More Efficient Rethinking Virtul Network Emedding: Sustrte Support for Pth Splitting nd Migrtion Minln Yu, Yung Yi, Jennifer Rexford, Mung Ching Princeton University Princeton, NJ {minlnyu,yyi,jrex,chingm}@princeton.edu

More information

Value Function Approximation using Multiple Aggregation for Multiattribute Resource Management

Value Function Approximation using Multiple Aggregation for Multiattribute Resource Management Journl of Mchine Lerning Reserch 9 (2008) 2079-2 Submitted 8/08; Published 0/08 Vlue Function Approximtion using Multiple Aggregtion for Multittribute Resource Mngement Abrhm George Wrren B. Powell Deprtment

More information

Homework 3 Solutions

Homework 3 Solutions CS 341: Foundtions of Computer Science II Prof. Mrvin Nkym Homework 3 Solutions 1. Give NFAs with the specified numer of sttes recognizing ech of the following lnguges. In ll cses, the lphet is Σ = {,1}.

More information

Appendix D: Completing the Square and the Quadratic Formula. In Appendix A, two special cases of expanding brackets were considered:

Appendix D: Completing the Square and the Quadratic Formula. In Appendix A, two special cases of expanding brackets were considered: Appendi D: Completing the Squre nd the Qudrtic Formul Fctoring qudrtic epressions such s: + 6 + 8 ws one of the topics introduced in Appendi C. Fctoring qudrtic epressions is useful skill tht cn help you

More information

Enterprise Risk Management Software Buyer s Guide

Enterprise Risk Management Software Buyer s Guide Enterprise Risk Mngement Softwre Buyer s Guide 1. Wht is Enterprise Risk Mngement? 2. Gols of n ERM Progrm 3. Why Implement ERM 4. Steps to Implementing Successful ERM Progrm 5. Key Performnce Indictors

More information

Econ 4721 Money and Banking Problem Set 2 Answer Key

Econ 4721 Money and Banking Problem Set 2 Answer Key Econ 472 Money nd Bnking Problem Set 2 Answer Key Problem (35 points) Consider n overlpping genertions model in which consumers live for two periods. The number of people born in ech genertion grows in

More information

An Undergraduate Curriculum Evaluation with the Analytic Hierarchy Process

An Undergraduate Curriculum Evaluation with the Analytic Hierarchy Process An Undergrdute Curriculum Evlution with the Anlytic Hierrchy Process Les Frir Jessic O. Mtson Jck E. Mtson Deprtment of Industril Engineering P.O. Box 870288 University of Albm Tuscloos, AL. 35487 Abstrct

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology

More information

Software Cost Estimation Model Based on Integration of Multi-agent and Case-Based Reasoning

Software Cost Estimation Model Based on Integration of Multi-agent and Case-Based Reasoning Journl of Computer Science 2 (3): 276-282, 2006 ISSN 1549-3636 2006 Science Publictions Softwre Cost Estimtion Model Bsed on Integrtion of Multi-gent nd Cse-Bsed Resoning Hsn Al-Skrn Informtion Technology

More information

AntiSpyware Enterprise Module 8.5

AntiSpyware Enterprise Module 8.5 AntiSpywre Enterprise Module 8.5 Product Guide Aout the AntiSpywre Enterprise Module The McAfee AntiSpywre Enterprise Module 8.5 is n dd-on to the VirusScn Enterprise 8.5i product tht extends its ility

More information

g(y(a), y(b)) = o, B a y(a)+b b y(b)=c, Boundary Value Problems Lecture Notes to Accompany

g(y(a), y(b)) = o, B a y(a)+b b y(b)=c, Boundary Value Problems Lecture Notes to Accompany Lecture Notes to Accompny Scientific Computing An Introductory Survey Second Edition by Michel T Heth Boundry Vlue Problems Side conditions prescribing solution or derivtive vlues t specified points required

More information

Experiment 6: Friction

Experiment 6: Friction Experiment 6: Friction In previous lbs we studied Newton s lws in n idel setting, tht is, one where friction nd ir resistnce were ignored. However, from our everydy experience with motion, we know tht

More information

Introducing Kashef for Application Monitoring

Introducing Kashef for Application Monitoring WextWise 2010 Introducing Kshef for Appliction The Cse for Rel-time monitoring of dtcenter helth is criticl IT process serving vriety of needs. Avilbility requirements of 6 nd 7 nines of tody SOA oriented

More information

A.7.1 Trigonometric interpretation of dot product... 324. A.7.2 Geometric interpretation of dot product... 324

A.7.1 Trigonometric interpretation of dot product... 324. A.7.2 Geometric interpretation of dot product... 324 A P P E N D I X A Vectors CONTENTS A.1 Scling vector................................................ 321 A.2 Unit or Direction vectors...................................... 321 A.3 Vector ddition.................................................

More information

COMPARISON OF SOME METHODS TO FIT A MULTIPLICATIVE TARIFF STRUCTURE TO OBSERVED RISK DATA BY B. AJNE. Skandza, Stockholm ABSTRACT

COMPARISON OF SOME METHODS TO FIT A MULTIPLICATIVE TARIFF STRUCTURE TO OBSERVED RISK DATA BY B. AJNE. Skandza, Stockholm ABSTRACT COMPARISON OF SOME METHODS TO FIT A MULTIPLICATIVE TARIFF STRUCTURE TO OBSERVED RISK DATA BY B. AJNE Skndz, Stockholm ABSTRACT Three methods for fitting multiplictive models to observed, cross-clssified

More information

How To Set Up A Network For Your Business

How To Set Up A Network For Your Business Why Network is n Essentil Productivity Tool for Any Smll Business TechAdvisory.org SME Reports sponsored by Effective technology is essentil for smll businesses looking to increse their productivity. Computer

More information

Revisions published in the University of Innsbruck Bulletin of 18 June 2014, Issue 31, No. 509

Revisions published in the University of Innsbruck Bulletin of 18 June 2014, Issue 31, No. 509 Plese note: The following curriculum is for informtion purposes only nd not leglly inding. The leglly inding version is pulished in the pertinent University of Innsruck Bulletins. Originl version pulished

More information

Economics Letters 65 (1999) 9 15. macroeconomists. a b, Ruth A. Judson, Ann L. Owen. Received 11 December 1998; accepted 12 May 1999

Economics Letters 65 (1999) 9 15. macroeconomists. a b, Ruth A. Judson, Ann L. Owen. Received 11 December 1998; accepted 12 May 1999 Economics Letters 65 (1999) 9 15 Estimting dynmic pnel dt models: guide for q mcroeconomists b, * Ruth A. Judson, Ann L. Owen Federl Reserve Bord of Governors, 0th & C Sts., N.W. Wshington, D.C. 0551,

More information

ClearPeaks Customer Care Guide. Business as Usual (BaU) Services Peace of mind for your BI Investment

ClearPeaks Customer Care Guide. Business as Usual (BaU) Services Peace of mind for your BI Investment ClerPeks Customer Cre Guide Business s Usul (BU) Services Pece of mind for your BI Investment ClerPeks Customer Cre Business s Usul Services Tble of Contents 1. Overview...3 Benefits of Choosing ClerPeks

More information

Babylonian Method of Computing the Square Root: Justifications Based on Fuzzy Techniques and on Computational Complexity

Babylonian Method of Computing the Square Root: Justifications Based on Fuzzy Techniques and on Computational Complexity Bbylonin Method of Computing the Squre Root: Justifictions Bsed on Fuzzy Techniques nd on Computtionl Complexity Olg Koshelev Deprtment of Mthemtics Eduction University of Texs t El Pso 500 W. University

More information

2. Transaction Cost Economics

2. Transaction Cost Economics 3 2. Trnsction Cost Economics Trnsctions Trnsctions Cn Cn Be Be Internl Internl or or Externl Externl n n Orgniztion Orgniztion Trnsctions Trnsctions occur occur whenever whenever good good or or service

More information

Small Business Cloud Services

Small Business Cloud Services Smll Business Cloud Services Summry. We re thick in the midst of historic se-chnge in computing. Like the emergence of personl computers, grphicl user interfces, nd mobile devices, the cloud is lredy profoundly

More information

LINEAR TRANSFORMATIONS AND THEIR REPRESENTING MATRICES

LINEAR TRANSFORMATIONS AND THEIR REPRESENTING MATRICES LINEAR TRANSFORMATIONS AND THEIR REPRESENTING MATRICES DAVID WEBB CONTENTS Liner trnsformtions 2 The representing mtrix of liner trnsformtion 3 3 An ppliction: reflections in the plne 6 4 The lgebr of

More information

Integration by Substitution

Integration by Substitution Integrtion by Substitution Dr. Philippe B. Lvl Kennesw Stte University August, 8 Abstrct This hndout contins mteril on very importnt integrtion method clled integrtion by substitution. Substitution is

More information

Gaze Manipulation for One-to-one Teleconferencing

Gaze Manipulation for One-to-one Teleconferencing Gze Mnipultion for One-to-one Teleconferencing A. Criminisi, J. Shotton, A. Blke, P.H.S. Torr Microsoft Reserch Ltd, Cmridge, UK Astrct A new lgorithm is proposed for novel view genertion in one-toone

More information

Helicopter Theme and Variations

Helicopter Theme and Variations Helicopter Theme nd Vritions Or, Some Experimentl Designs Employing Pper Helicopters Some possible explntory vribles re: Who drops the helicopter The length of the rotor bldes The height from which the

More information

CHAPTER 11 Numerical Differentiation and Integration

CHAPTER 11 Numerical Differentiation and Integration CHAPTER 11 Numericl Differentition nd Integrtion Differentition nd integrtion re bsic mthemticl opertions with wide rnge of pplictions in mny res of science. It is therefore importnt to hve good methods

More information

FAULT TREES AND RELIABILITY BLOCK DIAGRAMS. Harry G. Kwatny. Department of Mechanical Engineering & Mechanics Drexel University

FAULT TREES AND RELIABILITY BLOCK DIAGRAMS. Harry G. Kwatny. Department of Mechanical Engineering & Mechanics Drexel University SYSTEM FAULT AND Hrry G. Kwtny Deprtment of Mechnicl Engineering & Mechnics Drexel University OUTLINE SYSTEM RBD Definition RBDs nd Fult Trees System Structure Structure Functions Pths nd Cutsets Reliility

More information

Example 27.1 Draw a Venn diagram to show the relationship between counting numbers, whole numbers, integers, and rational numbers.

Example 27.1 Draw a Venn diagram to show the relationship between counting numbers, whole numbers, integers, and rational numbers. 2 Rtionl Numbers Integers such s 5 were importnt when solving the eqution x+5 = 0. In similr wy, frctions re importnt for solving equtions like 2x = 1. Wht bout equtions like 2x + 1 = 0? Equtions of this

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology

More information

Operations with Polynomials

Operations with Polynomials 38 Chpter P Prerequisites P.4 Opertions with Polynomils Wht you should lern: Write polynomils in stndrd form nd identify the leding coefficients nd degrees of polynomils Add nd subtrct polynomils Multiply

More information

Gaze Manipulation for One-to-one Teleconferencing

Gaze Manipulation for One-to-one Teleconferencing Gze Mnipultion for One-to-one Teleconferencing A. Criminisi, J. Shotton, A. Blke, P.H.S. Torr Microsoft Reserch Ltd, Cmridge, UK Astrct A new lgorithm is proposed for novel view genertion in one-toone

More information

JaERM Software-as-a-Solution Package

JaERM Software-as-a-Solution Package JERM Softwre-s--Solution Pckge Enterprise Risk Mngement ( ERM ) Public listed compnies nd orgnistions providing finncil services re required by Monetry Authority of Singpore ( MAS ) nd/or Singpore Stock

More information

Protocol Analysis. 17-654/17-764 Analysis of Software Artifacts Kevin Bierhoff

Protocol Analysis. 17-654/17-764 Analysis of Software Artifacts Kevin Bierhoff Protocol Anlysis 17-654/17-764 Anlysis of Softwre Artifcts Kevin Bierhoff Tke-Awys Protocols define temporl ordering of events Cn often be cptured with stte mchines Protocol nlysis needs to py ttention

More information

Pentominoes. Pentominoes. Bruce Baguley Cascade Math Systems, LLC. The pentominoes are a simple-looking set of objects through which some powerful

Pentominoes. Pentominoes. Bruce Baguley Cascade Math Systems, LLC. The pentominoes are a simple-looking set of objects through which some powerful Pentominoes Bruce Bguley Cscde Mth Systems, LLC Astrct. Pentominoes nd their reltives the polyominoes, polycues, nd polyhypercues will e used to explore nd pply vrious importnt mthemticl concepts. In this

More information

Abstract. This paper introduces new algorithms and data structures for quick counting for machine

Abstract. This paper introduces new algorithms and data structures for quick counting for machine Journl of Artiæcil Intelligence Reserch 8 è998è 67-9 Submitted 7è97; published è98 Cched Suæcient Sttistics for Eæcient Mchine Lerning with Lrge Dtsets Andrew Moore Mry Soon Lee School of Computer Science

More information

Utilization of Smoking Cessation Benefits in Medicaid Managed Care, 2009-2013

Utilization of Smoking Cessation Benefits in Medicaid Managed Care, 2009-2013 Utiliztion of Smoking Cesstion Benefits in Medicid Mnged Cre, 2009-2013 Office of Qulity nd Ptient Sfety New York Stte Deprtment of Helth Jnury 2015 Introduction According to the New York Stte Tocco Control

More information

Math 135 Circles and Completing the Square Examples

Math 135 Circles and Completing the Square Examples Mth 135 Circles nd Completing the Squre Exmples A perfect squre is number such tht = b 2 for some rel number b. Some exmples of perfect squres re 4 = 2 2, 16 = 4 2, 169 = 13 2. We wish to hve method for

More information

Integration. 148 Chapter 7 Integration

Integration. 148 Chapter 7 Integration 48 Chpter 7 Integrtion 7 Integrtion t ech, by supposing tht during ech tenth of second the object is going t constnt speed Since the object initilly hs speed, we gin suppose it mintins this speed, but

More information

Decision Rule Extraction from Trained Neural Networks Using Rough Sets

Decision Rule Extraction from Trained Neural Networks Using Rough Sets Decision Rule Extrction from Trined Neurl Networks Using Rough Sets Alin Lzr nd Ishwr K. Sethi Vision nd Neurl Networks Lbortory Deprtment of Computer Science Wyne Stte University Detroit, MI 48 ABSTRACT

More information

5.2. LINE INTEGRALS 265. Let us quickly review the kind of integrals we have studied so far before we introduce a new one.

5.2. LINE INTEGRALS 265. Let us quickly review the kind of integrals we have studied so far before we introduce a new one. 5.2. LINE INTEGRALS 265 5.2 Line Integrls 5.2.1 Introduction Let us quickly review the kind of integrls we hve studied so fr before we introduce new one. 1. Definite integrl. Given continuous rel-vlued

More information

9.3. The Scalar Product. Introduction. Prerequisites. Learning Outcomes

9.3. The Scalar Product. Introduction. Prerequisites. Learning Outcomes The Sclr Product 9.3 Introduction There re two kinds of multipliction involving vectors. The first is known s the sclr product or dot product. This is so-clled becuse when the sclr product of two vectors

More information

Lecture 3 Gaussian Probability Distribution

Lecture 3 Gaussian Probability Distribution Lecture 3 Gussin Probbility Distribution Introduction l Gussin probbility distribution is perhps the most used distribution in ll of science. u lso clled bell shped curve or norml distribution l Unlike

More information

Vectors 2. 1. Recap of vectors

Vectors 2. 1. Recap of vectors Vectors 2. Recp of vectors Vectors re directed line segments - they cn be represented in component form or by direction nd mgnitude. We cn use trigonometry nd Pythgors theorem to switch between the forms

More information

Rotational Equilibrium: A Question of Balance

Rotational Equilibrium: A Question of Balance Prt of the IEEE Techer In-Service Progrm - Lesson Focus Demonstrte the concept of rottionl equilirium. Lesson Synopsis The Rottionl Equilirium ctivity encourges students to explore the sic concepts of

More information

belief Propgtion Lgorithm in Nd Pent Penta

belief Propgtion Lgorithm in Nd Pent Penta IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 9, NO. 3, MAY/JUNE 2012 375 Itertive Trust nd Reputtion Mngement Using Belief Propgtion Ermn Aydy, Student Member, IEEE, nd Frmrz Feri, Senior

More information

ORBITAL MANEUVERS USING LOW-THRUST

ORBITAL MANEUVERS USING LOW-THRUST Proceedings of the 8th WSEAS Interntionl Conference on SIGNAL PROCESSING, ROBOICS nd AUOMAION ORBIAL MANEUVERS USING LOW-HRUS VIVIAN MARINS GOMES, ANONIO F. B. A. PRADO, HÉLIO KOII KUGA Ntionl Institute

More information

Basic Research in Computer Science BRICS RS-02-13 Brodal et al.: Solving the String Statistics Problem in Time O(n log n)

Basic Research in Computer Science BRICS RS-02-13 Brodal et al.: Solving the String Statistics Problem in Time O(n log n) BRICS Bsic Reserch in Computer Science BRICS RS-02-13 Brodl et l.: Solving the String Sttistics Prolem in Time O(n log n) Solving the String Sttistics Prolem in Time O(n log n) Gerth Stølting Brodl Rune

More information

2 DIODE CLIPPING and CLAMPING CIRCUITS

2 DIODE CLIPPING and CLAMPING CIRCUITS 2 DIODE CLIPPING nd CLAMPING CIRCUITS 2.1 Ojectives Understnding the operting principle of diode clipping circuit Understnding the operting principle of clmping circuit Understnding the wveform chnge of

More information

Morgan Stanley Ad Hoc Reporting Guide

Morgan Stanley Ad Hoc Reporting Guide spphire user guide Ferury 2015 Morgn Stnley Ad Hoc Reporting Guide An Overview For Spphire Users 1 Introduction The Ad Hoc Reporting tool is ville for your reporting needs outside of the Spphire stndrd

More information

A Fuzzy Clustering Approach to Filter Spam E-Mail

A Fuzzy Clustering Approach to Filter Spam E-Mail , July 6-8, 2011, London, U.K. A Fuzzy Clustering Approch to Filter Spm E-Mil N.T.Mohmmd Abstrct Spm emil, is the prctice of frequently sending unwnted emil messges, usully with commercil content, in lrge

More information

A Study on Autonomous Cooperation between Things in Web of Things

A Study on Autonomous Cooperation between Things in Web of Things A Study on Autonomous Coopertion etween Things in We of Things Jehk Yu, Hyunjoong Kng, Hyo-Chn Bng, MyungNm Be 2 Electronics nd Telecommunictions Reserch Institute, 38 Gjeongno, Yuseong-gu, Dejeon, 305-700,

More information

Example- Based Learning in Computer- Aided STEM Education

Example- Based Learning in Computer- Aided STEM Education contriuted rticles Exmple- Bsed Lerning in Computer- Aided STEM Eduction DOI:10.1145/2634273 Exmple-sed resoning techniques developed for progrmming lnguges lso help utomte repetitive tsks in eduction.

More information

GFI MilArchiver 6 vs C2C Archive One Policy Mnger GFI Softwre www.gfi.com GFI MilArchiver 6 vs C2C Archive One Policy Mnger GFI MilArchiver 6 C2C Archive One Policy Mnger Who we re Generl fetures Supports

More information

The Acoustic Design of Soundproofing Doors and Windows

The Acoustic Design of Soundproofing Doors and Windows 3 The Open Acoustics Journl, 1, 3, 3-37 The Acoustic Design of Soundproofing Doors nd Windows Open Access Nishimur Yuy,1, Nguyen Huy Qung, Nishimur Sohei 1, Nishimur Tsuyoshi 3 nd Yno Tkshi 1 Kummoto Ntionl

More information

PROF. BOYAN KOSTADINOV NEW YORK CITY COLLEGE OF TECHNOLOGY, CUNY

PROF. BOYAN KOSTADINOV NEW YORK CITY COLLEGE OF TECHNOLOGY, CUNY MAT 0630 INTERNET RESOURCES, REVIEW OF CONCEPTS AND COMMON MISTAKES PROF. BOYAN KOSTADINOV NEW YORK CITY COLLEGE OF TECHNOLOGY, CUNY Contents 1. ACT Compss Prctice Tests 1 2. Common Mistkes 2 3. Distributive

More information

Portfolio approach to information technology security resource allocation decisions

Portfolio approach to information technology security resource allocation decisions Portfolio pproch to informtion technology security resource lloction decisions Shivrj Knungo Deprtment of Decision Sciences The George Wshington University Wshington DC 20052 knungo@gwu.edu Abstrct This

More information

Solving the String Statistics Problem in Time O(n log n)

Solving the String Statistics Problem in Time O(n log n) Solving the String Sttistics Prolem in Time O(n log n) Gerth Stølting Brodl 1,,, Rune B. Lyngsø 3, Ann Östlin1,, nd Christin N. S. Pedersen 1,2, 1 BRICS, Deprtment of Computer Science, University of Arhus,

More information

Simulation of operation modes of isochronous cyclotron by a new interative method

Simulation of operation modes of isochronous cyclotron by a new interative method NUKLEONIKA 27;52(1):29 34 ORIGINAL PAPER Simultion of opertion modes of isochronous cyclotron y new intertive method Ryszrd Trszkiewicz, Mrek Tlch, Jcek Sulikowski, Henryk Doruch, Tdeusz Norys, Artur Srok,

More information

A Network Management System for Power-Line Communications and its Verification by Simulation

A Network Management System for Power-Line Communications and its Verification by Simulation A Network Mngement System for Power-Line Communictions nd its Verifiction y Simultion Mrkus Seeck, Gerd Bumiller GmH Unterschluerscher-Huptstr. 10, D-90613 Großhersdorf, Germny Phone: +49 9105 9960-51,

More information

Gene Expression Programming: A New Adaptive Algorithm for Solving Problems

Gene Expression Programming: A New Adaptive Algorithm for Solving Problems Gene Expression Progrmming: A New Adptive Algorithm for Solving Prolems Cândid Ferreir Deprtmento de Ciêncis Agráris Universidde dos Açores 9701-851 Terr-Chã Angr do Heroísmo, Portugl Complex Systems,

More information

9 CONTINUOUS DISTRIBUTIONS

9 CONTINUOUS DISTRIBUTIONS 9 CONTINUOUS DISTIBUTIONS A rndom vrible whose vlue my fll nywhere in rnge of vlues is continuous rndom vrible nd will be ssocited with some continuous distribution. Continuous distributions re to discrete

More information

The Journal of Systems and Software

The Journal of Systems and Software The Journl of Systems nd Softwre xxx (2008) xxx xxx Contents lists ville t ScienceDirect The Journl of Systems nd Softwre journl homepge: www.elsevier.com/locte/jss Energy-efficient rel-time oject trcking

More information

Mathematics. Vectors. hsn.uk.net. Higher. Contents. Vectors 128 HSN23100

Mathematics. Vectors. hsn.uk.net. Higher. Contents. Vectors 128 HSN23100 hsn.uk.net Higher Mthemtics UNIT 3 OUTCOME 1 Vectors Contents Vectors 18 1 Vectors nd Sclrs 18 Components 18 3 Mgnitude 130 4 Equl Vectors 131 5 Addition nd Subtrction of Vectors 13 6 Multipliction by

More information

Basic Analysis of Autarky and Free Trade Models

Basic Analysis of Autarky and Free Trade Models Bsic Anlysis of Autrky nd Free Trde Models AUTARKY Autrky condition in prticulr commodity mrket refers to sitution in which country does not engge in ny trde in tht commodity with other countries. Consequently

More information

Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE. Date: Friday 16 th May 2008. Time: 14:00 16:00

Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE. Date: Friday 16 th May 2008. Time: 14:00 16:00 COMP20212 Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE Digitl Design Techniques Dte: Fridy 16 th My 2008 Time: 14:00 16:00 Plese nswer ny THREE Questions from the FOUR questions provided

More information

DlNBVRGH + Sickness Absence Monitoring Report. Executive of the Council. Purpose of report

DlNBVRGH + Sickness Absence Monitoring Report. Executive of the Council. Purpose of report DlNBVRGH + + THE CITY OF EDINBURGH COUNCIL Sickness Absence Monitoring Report Executive of the Council 8fh My 4 I.I...3 Purpose of report This report quntifies the mount of working time lost s result of

More information

A Decision Theoretic Framework for Ranking using Implicit Feedback

A Decision Theoretic Framework for Ranking using Implicit Feedback A Decision Theoretic Frmework for Rnking using Implicit Feedbck Onno Zoeter Michel Tylor Ed Snelson John Guiver Nick Crswell Mrtin Szummer Microsoft Reserch Cmbridge 7 J J Thomson Avenue Cmbridge, United

More information

Encoded Bitmp Indexing for Dt Wrehouses Ming-Chun Wu DVS, Computer Science Deprtment Technische Universitt Drmstdt, GERMANY wu@dvsinformtiktu-drmstdtde Alejndro P Buchmnn DVS, Computer Science Deprtment

More information

On the Robustness of Most Probable Explanations

On the Robustness of Most Probable Explanations On the Robustness of Most Probble Explntions Hei Chn School of Electricl Engineering nd Computer Science Oregon Stte University Corvllis, OR 97330 chnhe@eecs.oregonstte.edu Adnn Drwiche Computer Science

More information

Learning to Search Better than Your Teacher

Learning to Search Better than Your Teacher Ki-Wei Chng University of Illinois t Urbn Chmpign, IL Akshy Krishnmurthy Crnegie Mellon University, Pittsburgh, PA Alekh Agrwl Microsoft Reserch, New York, NY Hl Dumé III University of Mrylnd, College

More information

Vector differentiation. Chapters 6, 7

Vector differentiation. Chapters 6, 7 Chpter 2 Vectors Courtesy NASA/JPL-Cltech Summry (see exmples in Hw 1, 2, 3) Circ 1900 A.D., J. Willird Gis invented useful comintion of mgnitude nd direction clled vectors nd their higher-dimensionl counterprts

More information