Title Language Model for Information Retrieval


Rong Jin
Language Technologies Institute, School of Computer Science, Carnegie Mellon University

Alex G. Hauptmann
Computer Science Department, School of Computer Science, Carnegie Mellon University

ChengXiang Zhai
Language Technologies Institute, School of Computer Science, Carnegie Mellon University

ABSTRACT
In this paper, we propose a new language model, namely, a title language model, for information retrieval. Different from the traditional language model used for retrieval, we define the conditional probability $P(Q \mid D)$ as the probability of using query Q as the title for document D. We adopted the statistical translation model learned from the title and document pairs in the collection to compute the probability $P(Q \mid D)$. To avoid the sparse data problem, we propose two new smoothing methods. In the experiments with four different TREC document collections, the title language model for information retrieval with the new smoothing method outperforms both the traditional language model and the vector space model for IR significantly.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Retrieval Models - language model; machine learning for IR

General Terms
Algorithms

Keywords
title language model, statistical translation model, smoothing, machine learning

1. INTRODUCTION
Using language models for information retrieval has been studied extensively recently [1,3,7,8,10]. The basic idea is to compute the conditional probability $P(Q \mid D)$, i.e. the probability of generating a query Q given the observation of a document D. Several different methods have been applied to compute this conditional probability. In most approaches, the computation is conceptually decomposed into two distinct steps: (1) Estimating a document language model; (2) Computing the query likelihood using the estimated document model based on some query model.
[Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGIR'02, August 11-15, 2002, Tampere, Finland. Copyright 2002 ACM [...]/02/0008 $5.00.]

For example, Ponte and Croft [8] emphasized the first step, and used several heuristics to smooth the Maximum Likelihood Estimate (MLE) of the document language model, and assumed that the query is generated under a multivariate Bernoulli model. The BBN method [7] emphasized the second step and used a two-state hidden Markov model as the basis for generating queries, which, in effect, smooths the MLE with linear interpolation, a strategy also adopted in Hiemstra and Kraaij [3]. In Zhai and Lafferty [11], it has been found that the retrieval performance is affected by both the estimation accuracy of document language models and the appropriate modeling of the query, and a two-stage smoothing method was suggested to explicitly address these two distinct steps.

A common deficiency in these approaches is that they all apply an estimated document language model directly to generating queries, but presumably queries and documents should be generated through different stochastic processes, since they have quite different characteristics. Therefore, there exists a gap between a document language model and a query language model. Indeed, such a gap has been well-recognized in [4], where separate models are proposed to model queries and documents respectively. The gap has also been recognized in [6], where a document model is estimated based on a query through averaging over document models based on how well they explain the query. In most existing approaches using query likelihood for scoring, this gap has been implicitly addressed through smoothing.
Indeed, in [11] it has been found that the optimal setting of smoothing parameters is actually query-dependent, which suggests that smoothing may have helped bridge this gap.

Although filling the gap by simple smoothing has been shown to be empirically effective, ideally we should estimate a query language model directly based on the observation of a document, and apply the estimated query language model, instead of the document language model, to generate queries. The question then is: what evidence do we have for estimating a query language model given a document? This is a very challenging question, since the information available to us in a typical ad hoc retrieval setting includes no more than a database of documents and queries.

In this paper, we propose to use the titles of documents as the evidence for estimating a query language model for a given document, essentially approximating the query language model given a document by the title language model for that document, which is easier to estimate. The motivation of this work is based on the observation that queries are more like titles than documents in many aspects. For example, both titles and queries tend to be very short and concise descriptions of information. The reasoning process in an author's mind when making up the title for a document is similar to what is in a user's mind when formulating a query based on some ideal document: both would be trying to capture what the document is about. Therefore, it is reasonable to assume that the titles and queries are created through a similar generation process.

The title information has been exploited previously for improving information retrieval, but, so far, only heuristic methods, such as increasing the weight of title words, have been tried (e.g., [5,10]). Here we use the title information in a more principled way by treating a title as an observation from a document-title statistical translation model.

Technically, the title language model approach falls into the general source-channel framework proposed in Berger and Lafferty [1], where the difference between a query and a document is explicitly addressed by treating query formulation as a corruption of the ideal document in the information theoretic sense. Conceptually, however, the title language model is different from the synthetic query translation model explored in [1]. The use of synthesized queries provides an interesting way to train a statistical translation model that can address important issues such as synonymy and polysemy, whereas the title language model is meant to directly approximate queries with titles. Moreover, training with the titles poses special difficulties due to data sparseness, which we discuss below.

A document can potentially have many different titles, but the author only provides one title for each document. Thus, if we estimate title language models only based on the observation of the author-given titles, the estimate will suffer severely from the problem of sparse data. The use of a statistical translation model can alleviate this problem. The basic idea is to treat the document-title pairs as translation pairs observed from some translation model that captures the intrinsic document-to-query translation patterns. This means that we would train the statistical translation model based on the document-title pairs in the whole collection.
Once we have this general translation model in hand, we can estimate the title language model for a particular document by applying the learned translation model to the document.

Even if we pool all the document-title pairs together, the training data is still quite sparse given the large number of parameters involved. Since titles are typically much shorter than documents, we would expect that most words in a document would never occur in any of the titles in the collection. To address this problem, we extend the standard learning algorithms of the translation models by adding special parameters to model the self-translation probabilities of words. We propose two such techniques: one assumes that all words have the same self-translation probability, and the other assumes that each title has an extra unobserved null word slot that can only be filled by a word generated through self-translation.

The proposed title language model and the two self-translation smoothing methods are evaluated with four different TREC databases. The results show that the title language model approach consistently performs better than both the simple language modeling approach and the Okapi retrieval function. We also observe that the smoothing of self-translation probabilities has a significant impact on the retrieval performance. Both smoothing methods improve the performance significantly over the non-smoothed version of the title language model. The null word based smoothing method consistently performs better than the method of tying self-translation probabilities.

The rest of the paper is organized as follows: We first present the title language model approach in Section 2, describing the two self-translation smoothing methods. We then present the experiments and results in Section 3. Section 4 gives the conclusions and future work.

2. A TITLE LANGUAGE MODEL FOR IR
The basic idea of the title language model approach is to estimate the title language model for a document and then to compute the likelihood that the query would have been generated from the estimated model.
Therefore, the key issue is how to estimate the title language model for a document based on the observation of a collection of documents.

A simple approach would be to estimate the title language model for a document using only the title of that document. However, because of the flexibility in choosing different titles and the fact that each document has only one title given by the author(s), it would be almost impossible to obtain a good estimation of the title language model directly from the titles.

Our approach is to exploit statistical translation models to find the title language model based on the observation of a document. More specifically, we use a statistical translation model to convert the language model of a document to the title language model for that document. To accomplish this conversion process, we need to answer two questions:

1. How to estimate such a statistical translation model?
2. How to apply the estimated statistical translation model to convert a document language model to a title language model and use the estimated title language model to score documents with respect to a query?

Sections 2.1 and 2.2 address these two questions respectively.

2.1 Learning a Statistical Title Translation Model
The key component in a statistical title translation model is the word translation probability $P(tw \mid dw)$, i.e. the probability of using word tw in the title, given that word dw appears in the document. Once we have the set of word translation probabilities $P(tw \mid dw)$, we can easily calculate the title language model for a document based on the observation of that document.

To learn the set of word translation probabilities, we can take advantage of the document-title pairs in the collection. By viewing documents as samples of a verbose language and titles as samples of a concise language, we can treat each document-title pair as a translation pair, i.e. a pair of texts written in the verbose language and the concise language respectively. Formally, let $\{\langle t_i, d_i \rangle,\ i = 1, 2, \ldots, N\}$ be the title-document pairs in the collection.
According to the standard statistical translation model [2], we can find the optimal model M* by maximizing the probability of generating titles from documents, or

$$
M^* = \arg\max_M \prod_{i=1}^{N} P(t_i \mid d_i, M) \qquad (1)
$$

Based on the model 1 for the statistical translation model [2], Equation (1) can be expanded as

$$
\begin{aligned}
M^* &= \arg\max_M \prod_{i=1}^{N} \frac{\varepsilon}{(|d_i|+1)^{|t_i|}} \prod_{tw \in t_i} \Big[ P(tw \mid \phi, M) + \sum_{dw \in d_i} c(dw, d_i)\, P(tw \mid dw, M) \Big] \\
&= \arg\max_M \prod_{i=1}^{N} \varepsilon \prod_{tw \in t_i} \Big[ \frac{P(tw \mid \phi, M)}{|d_i|+1} + \sum_{dw \in d_i} \frac{c(dw, d_i)}{|d_i|+1}\, P(tw \mid dw, M) \Big] \\
&\approx \arg\max_M \prod_{i=1}^{N} \prod_{tw \in t_i} \Big[ \frac{P(tw \mid \phi, M)}{|d_i|+1} + \sum_{dw \in d_i} P(tw \mid dw, M)\, P(dw \mid d_i) \Big] \qquad (2)
\end{aligned}
$$

where $\varepsilon$ is a constant, $\phi$ stands for the null word, $|d_i|$ is the length of document $d_i$, and $c(dw, d_i)$ is the number of times that word dw appears in document $d_i$. In the last step of Equation (2), we throw out the constant $\varepsilon$ and use the approximation $P(dw \mid d_i) \approx c(dw, d_i)/(|d_i|+1)$. To find the optimal word translation probabilities $P(tw \mid dw, M^*)$, we can use the EM algorithm. The details of the algorithm can be found in the literature for statistical translation models, such as [2]. We call this model Model 1 for easy reference.

2.1.1 The problem of under-estimating self-translation probabilities
There is a serious problem with using Model 1, as described above, directly to learn the correlation between the words in documents and titles. In particular, the self-translation probability of a word (i.e., $P(w' = w \mid w)$) will be under-estimated significantly. A document can potentially have many different titles, but authors generally only give one title for every document. Because titles are usually much shorter than documents, only an extremely small portion of the words in a document can be expected to actually appear in the title. We measured the vocabulary overlap between titles and documents on three different TREC collections, AP (1988), WSJ (1990-1992) and SJM (1991), and found that, on average, only 5% of the words in a document also appear in its title. This means that most of the document words would never appear in any title, which will result in a zero self-translation probability for most of the words. Therefore, if we follow the learning algorithm for the statistical translation model directly, the following scenario may occur: for some documents, even though they contain every single query word, the probability $P(Q \mid D)$ can still be very low due to the zero self-translation probability. In the following subsections, we propose two different learning algorithms that can address this problem.
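The EM training of Model 1 and the zero self-translation problem can be illustrated with a small sketch. The Python below is a toy under stated assumptions, not the authors' implementation: the function name `train_model1`, the `<null>` token standing in for the null word, the uniform initialisation, and the fixed iteration count are all illustrative choices. Posteriors are weighted by word counts, following the count-based form of Equation (2).

```python
from collections import Counter, defaultdict

NULL = "<null>"  # hypothetical token standing in for the null word


def train_model1(pairs, iterations=10):
    """EM estimate of word translation probabilities P(tw | dw).

    pairs: list of (title_tokens, document_tokens).
    Returns a dict mapping (tw, dw) -> probability.
    """
    # Document side: word counts plus one null word per document.
    doc_counts = []
    for _, doc in pairs:
        counts = Counter(doc)
        counts[NULL] = 1
        doc_counts.append(counts)

    # Uniform start over the title words that co-occur with each dw.
    cooc = defaultdict(set)
    for (title, _), counts in zip(pairs, doc_counts):
        for dw in counts:
            cooc[dw].update(title)
    prob = {(tw, dw): 1.0 / len(tws) for dw, tws in cooc.items() for tw in tws}

    for _ in range(iterations):
        expected = defaultdict(float)  # expected count of dw producing tw
        total = defaultdict(float)     # expected count of dw producing anything
        for (title, _), counts in zip(pairs, doc_counts):
            for tw in title:
                # E-step: posterior that tw was produced by each document word.
                norm = sum(prob.get((tw, dw), 0.0) * c for dw, c in counts.items())
                if norm == 0.0:
                    continue
                for dw, c in counts.items():
                    share = prob.get((tw, dw), 0.0) * c / norm
                    if share > 0.0:
                        expected[(tw, dw)] += share
                        total[dw] += share
        # M-step: renormalise so that the probabilities for each dw sum to 1.
        prob = {pair: e / total[pair[1]] for pair, e in expected.items()}
    return prob


# Toy document-title pairs (invented for illustration only).
pairs = [
    ("stock market".split(), "stock prices fell as the market closed".split()),
    ("rain forecast".split(), "heavy rain is expected over the weekend".split()),
]
model = train_model1(pairs)
print(model.get(("stock", "stock"), 0.0))    # positive: "stock" is in its own title
print(model.get(("prices", "prices"), 0.0))  # 0.0: "prices" never occurs in a title
```

In this toy run, a query containing "prices" receives no translation mass from a document that contains "prices", which is the under-estimation scenario described in Section 2.1.1.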
As will be shown later, both algorithms improve the retrieval performance significantly over Model 1, indicating that the proposed methods for modeling the self-translation probabilities are effective.

2.1.2 Tying self-translation probabilities (Model 2)
One way to avoid the problem of zero self-translation probability is to tie all the self-translation probabilities $P(w' = w \mid w)$ to a single parameter $P_{self}$. Essentially, we assume that all the self-translation probabilities have approximately the same value, and so can be replaced with a single parameter. Since there are always some title words that actually come from the body of documents, the unified self-translation probability $P_{self}$ will not be zero. We call the corresponding model Model 2.

We can also apply the EM algorithm to estimate all the word translation probabilities, including the smoothing parameter $P_{self}$. The updating equations are as follows. Let $P(w' \mid w)$ and $P_{self}$ stand for the parameters obtained from the previous iteration, and let $P'(w' \mid w)$ and $P'_{self}$ stand for the updated values of the parameters in the current iteration. According to the EM algorithm, the updating equation for the self-translation probability $P'_{self}$ is

[Equation (3): EM update for $P'_{self}$, scaled by $1/Z_{self}$; not recoverable from the source.]

where the variable $Z_{self}$ is the normalization constant and is defined as

[Equation (4): definition of $Z_{self}$; not recoverable from the source.]

For the non-self-translation probabilities, i.e. $P(w' \neq w \mid w)$, the EM updating equations are identical to the ones used for the standard learning algorithm of a statistical translation model, except that in the normalization equations the self-translation probability should be replaced with $P'_{self}$, or

$$
\sum_{w' \neq w} P'(w' \mid w) = 1 - P'_{self} \qquad (5)
$$

2.1.3 Adding a Null Title Word Slot (Model 3)
One problem with tying all the self-translation probabilities for different words to a single unified self-translation probability is that we lose some information about the relative importance of words.
Specifically, those words with a higher probability in the titles should have a higher self-translation probability than those with a lower probability in the titles. Tying them would cause under-estimation of the former and over-estimation of the latter. As a result, the self-translation probability may be less than the translation probability for other words, which is not desirable. In this subsection, we propose a better smoothing model that is able to discriminate the self-translation probabilities for different document words. It is based on the idea of introducing an extra NULL word slot in the title. An interesting property of this model is that the self-translation probability is guaranteed to be no less than the translation probability for any other word, i.e. $P(w \mid w) \geq P(w' \neq w \mid w)$. We call this model Model 3.

Titles are typically very short and therefore only provide us with very limited data. Now suppose we had sampled more title words from the title language model of a given document; what kinds of words would we expect to have seen? Given no other information, it would be reasonable to assume that we will more likely observe a word that occurs in the document. To capture this intuition, we assume that there is an extra NULL, unobserved, word slot in each title, that can only be filled in by self-translating any word in the body of the document. Use $e_t$ to stand for the extra word slot in the title t. With the count of this extra word slot, the standard statistical translation model between the document d and title t will be modified as

$$
\begin{aligned}
P(t \mid d, M) &= P(e_t \mid d, M) \prod_{tw \in t} P(tw \mid d, M) \\
&\approx \Big[ \sum_{dw \in d} P(dw \mid dw, M)\, P(dw \mid d) \Big] \prod_{tw \in t} \Big[ \frac{P(tw \mid \phi, M)}{|d|+1} + \sum_{dw \in d} P(tw \mid dw, M)\, P(dw \mid d) \Big] \qquad (6)
\end{aligned}
$$

To find the optimal statistical translation model, we will still maximize the translation probability from documents to titles. Substituting the document-title translation probability $P(t_i \mid d_i, M)$ with Equation (6), the optimization goal (Equation (1)) can be written as

$$
M^* = \arg\max_M \prod_{i=1}^{N} \Big[ \sum_{dw \in d_i} P(dw \mid dw, M)\, P(dw \mid d_i) \Big] \prod_{tw \in t_i} \Big[ \frac{P(tw \mid \phi, M)}{|d_i|+1} + \sum_{dw \in d_i} P(tw \mid dw, M)\, P(dw \mid d_i) \Big] \qquad (7)
$$

Because the extra word slot in every title provides a chance for any word in the document to appear in the title through the self-translation process, it is not difficult to prove that this model will ensure that the self-translation probability $P(w \mid w)$ will be no less than $P(w' \neq w \mid w)$ for any word w. The EM algorithm can again be applied to maximize Equation (7) and learn the word translation probabilities. The updating equations for the word translation probabilities are essentially the same as those used for the standard learning algorithm for statistical translation models, except for the inclusion of the extra counts due to the null word slot.

2.2 Computing Document Query Similarity
In this section, we discuss how to apply the learned statistical translation model to find the title language model for a document and use the estimated title language model to compute the relevance value of a document with respect to a query. To accomplish this, we define the conditional probability $P(Q \mid D)$ as the probability of using query Q as the title for document D, or, equivalently, the probability of translating document D into query Q using the statistical title translation model, which is given below.

$$
\begin{aligned}
P(Q \mid D, M) &= \varepsilon \prod_{qw \in Q} \Big[ \frac{P(qw \mid \phi, M)}{|D|+1} + \sum_{dw \in D} \frac{c(dw, D)}{|D|+1}\, P(qw \mid dw, M) \Big] \\
&\approx \varepsilon \prod_{qw \in Q} \Big[ \frac{P(qw \mid \phi, M)}{|D|+1} + \sum_{dw \in D} P(qw \mid dw, M)\, P(dw \mid D) \Big] \qquad (8)
\end{aligned}
$$

As can be seen from Equation (8), the document language model $P(dw \mid D)$ is not directly used to compute the probability of a query term.
Instead, it is converted into a title language model through the word translation probabilities $P(qw \mid dw)$. Such conversion also happens in the model proposed in [1], but there the translation model is meant to capture synonymy and polysemy relations, and is trained with synthetic queries.

Similar to the traditional language modeling approach, to deal with the query words that cannot be generated from the title language model, we need to do further smoothing, i.e.

$$
\begin{aligned}
P(Q \mid D, M) &= \varepsilon \prod_{qw \in Q} \Big\{ \lambda \Big[ \frac{P(qw \mid \phi, M)}{|D|+1} + \sum_{dw \in D} \frac{c(dw, D)}{|D|+1}\, P(qw \mid dw, M) \Big] + (1-\lambda)\, P(qw \mid GE) \Big\} \\
&\approx \varepsilon \prod_{qw \in Q} \Big\{ \lambda \Big[ \frac{P(qw \mid \phi, M)}{|D|+1} + \sum_{dw \in D} P(qw \mid dw, M)\, P(dw \mid D) \Big] + (1-\lambda)\, P(qw \mid GE) \Big\} \qquad (8')
\end{aligned}
$$

where $\lambda$ is the smoothing constant and $P(qw \mid GE)$ is the general English language model, which can be easily estimated from the collection []. In our experiment, we set the smoothing constant $\lambda$ to 0.5 for all models and all collections.

Equation (8') is the general formula that can be used to score a document with respect to a query with any specific translation model. A different translation model would thus result in a different retrieval formula. In the next section, we will compare the retrieval performance using different statistical title translation models, including Model 1, Model 2 and Model 3.

3. EXPERIMENT
3.1 Experiment Design
The goal of our experiments is to answer the following three questions:

1. Will the title language model be effective for information retrieval? To answer this question, we will compare the performance of the title language model with that of state-of-the-art information retrieval methods, including the Okapi method and the traditional language model for information retrieval.
2. How general is the trained statistical title translation model? Can a model estimated on one collection be applied to another? To answer this question, we conduct an experiment that applies the statistical title translation model learned from one collection to other collections. We then compare the performance of using a foreign translation model with that of using no translation model.
3. How important is the smoothing of self-translation in the title language model approach for information retrieval? To answer this question, we can compare the results for title language Model 1 with those for Model 2 and Model 3.

We used three different TREC testing collections for evaluation: AP88 (Associated Press, 1988), WSJ90-92 (Wall Street Journal from 1990 to 1992) and SJM (San Jose Mercury News, 1991). We used TREC4 queries (201-250) and their relevance judgments for evaluation. The average length of the titles in these collections is four to five words. The different characteristics of the three databases allow us to check the robustness of our models.

3.1.1 Baseline Methods

The two baseline methods are the Okapi method [9] and the traditional language modeling approach. The exact formula for the Okapi method is shown in Equation (9).

[Equation (9): Okapi similarity $Sim(Q, D)$, computed over the query words qw from $tf(qw, D)$, a logarithmic function of $df(qw)$, and a document-length normalization by $avg\_dl$; the exact form is not recoverable from the source.]

where $tf(qw, D)$ is the term frequency of word qw in document D, $df(qw)$ is the document frequency for the word qw, and $avg\_dl$ is the average document length for all the documents in the collection. The exact equation used for the traditional language modeling approach is shown in Equation (10).

$$
P(Q \mid D) = \prod_{qw \in Q} \big( (1-\lambda)\, P(qw \mid GE) + \lambda\, P(qw \mid D) \big) \qquad (10)
$$

The constant $\lambda$ is the smoothing constant (similar to the $\lambda$ in Equation (8')), and $P(qw \mid GE)$ is the general English language model estimated from the collection. To make the comparison fair, the smoothing constant for the traditional language model is set to 0.5, which is the same as for the title language model.

3.2 Experiment Results
The results on AP88, WSJ and SJM are shown in Table 1, Table 2, and Table 3, respectively. In each table, we include the precisions at different recall points and the average precision. Several interesting observations can be made on these results.

Table 1: Results for AP88 collection. LM stands for the traditional language model, Okapi stands for the Okapi formula, and Model 1, Model 2 and Model 3 stand for title language Model 1, Model 2 and Model 3.
[Table body: columns LM, Okapi, Model 1, Model 2, Model 3; rows give precision at ten recall points and the average precision. The numeric values are missing from the source.]

First, let us compare the results between the different title language models, namely Model 1, Model 2 and Model 3. As seen from Tables 1, 2 and 3, for all three collections, Model 1 is inferior to Model 2, which is inferior to Model 3, in terms of both average precision and precisions at different recall points. In particular, on the WSJ collection, title language Model 1 performs extremely poorly compared with the other two methods. This result indicates that title language Model 1 may fail to find relevant documents in some cases due to the problem of zero self-translation probability, as we discussed in Section 2.
Indeed, we computed the percentage of title words that cannot be found in their documents. This number is 25% for the AP88 collection, 34% for the SJM collection and 45% for the WSJ collection. This high percentage of missing title words strongly suggests that the smoothing of self-translation probability will be critical. Indeed, for the WSJ collection, which has the highest percentage of missing title words, title language Model 1, without any smoothing of self-translation probability, degrades the performance more dramatically than for collections AP88 and SJM, where more title words can be found in the documents and the smoothing of self-translation probability is not as critical.

Table 2: Results for WSJ collection. LM stands for the traditional language model, Okapi stands for the Okapi formula, and Model 1, Model 2 and Model 3 stand for title language Model 1, Model 2 and Model 3.
[Table body: columns LM, Okapi, Model 1, Model 2, Model 3; rows give precision at ten recall points and the average precision. The numeric values are missing from the source.]

Table 3: Results for SJM collection. LM stands for the traditional language model, Okapi stands for the Okapi formula, and Model 1, Model 2 and Model 3 stand for title language Model 1, Model 2 and Model 3.
[Table body: columns LM, Okapi, Model 1, Model 2, Model 3; rows give precision at ten recall points and the average precision. The numeric values are missing from the source.]

The second dimension of comparison is to compare the title language models with the traditional language model. As already pointed out by Berger and Lafferty [1], the traditional language model can be viewed as a special case of the translation language model, i.e. all the translation probabilities $P(w' \mid w)$ become delta functions $\delta(w, w')$. Therefore, the comparison along this dimension can indicate whether the translation probabilities learned from the correlation between titles and documents are effective in improving retrieval accuracy. As seen from Table 1, Table 2, and Table 3, title language Model 3 performs significantly better than the traditional language model over all three collections in terms of all the performance measures. Thus, we can conclude that the translation probabilities learned from title-document pairs appear to be helpful for finding relevant documents.

Lastly, we can also compare the performance of the title language model approach with the Okapi method [9]. For all three collections, title language Model 3 outperforms Okapi significantly in terms of all the performance measures, except in one case: the precision at 0.1 recall on the WSJ collection is slightly worse than both the traditional language model approach and Okapi.

To test the generality of the estimated translation model, we applied the statistical title translation model learned from the AP88 collection to the AP90 collection. We hypothesize that, if two collections are similar, the statistical title translation model learned from one collection should be able to give a good approximation of the correlation between documents and titles of the other collection. Therefore, it would make sense to apply the translation model learned from one collection to another similar collection.

Table 4: Results for AP90. LM stands for the traditional language model, Okapi stands for the Okapi formula, and Model 3 stands for title language Model 3. Different from the previous experiments, in which the translation model is learned from the retrieved collection itself, this experiment applies the translation model learned from AP88 to retrieve relevant documents in the AP90 collection.
[Table body: columns LM, Okapi, Model 3; rows give precision at ten recall points and the average precision. The numeric values are missing from the source.]

Table 4 gives the results of applying the translation model learned from AP88 to AP90. Since title language Model 3 already demonstrated its superiority to Model 1 and Model 2, we only considered Model 3 in this experiment. From Table 4, we see that title language Model 3 outperforms the traditional language model and the Okapi method significantly in terms of all measures.

We also applied the statistical title translation model learned from AP88 to WSJ to further examine the generality of the model and our learning method. This time, the performance of title language Model 3 with the statistical title translation model learned from AP88 is only about the same as the traditional language model and the Okapi method for the WSJ collection. Since the statistical title translation model learned from AP88 can be expected to be a much better approximation of the correlation between documents and titles for AP90 than for WSJ, these results suggest that applying the translation model learned from a foreign database is helpful only when the foreign database is similar to the native one. It is interesting to note, however, that it has never resulted in any degradation of performance.

4. CONCLUSIONS
Bridging the gap between a query language model and a document language model is an important issue when applying language models to information retrieval. In this paper, we propose bridging this gap by exploiting document titles to estimate a title language model, which can be regarded as an approximate query language model. The essence of our work is to approximate the query language model for a document with the title language model for the document. Operationally, we first estimate a translation model by using all the document-title pairs in a collection. The translation model can then be used to convert a regular document language model to a title language model. Finally, the title language model estimated for each document is used to compute the query likelihood. Intuitively, the scoring is based on the likelihood that the query could have been a title for a document.
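The last two operational steps (converting the document language model into a title language model, then computing the smoothed query likelihood) can be sketched as follows. This is a minimal illustration of the count-based form of Equation (8'), dropping the constant factor and working in log space; the function name `title_lm_score`, the `<null>` token, the dictionary layout of the translation table, and all toy probabilities are assumptions made for the example, not values from the paper.

```python
import math
from collections import Counter

NULL = "<null>"  # hypothetical token standing in for the null word


def title_lm_score(query, doc, trans, background, lam=0.5):
    """Log query likelihood under Equation (8'), up to the constant factor.

    query, doc: lists of tokens.
    trans: dict mapping (title_word, doc_word) -> P(title_word | doc_word).
    background: dict mapping word -> P(word | GE).
    """
    counts = Counter(doc)
    denom = len(doc) + 1  # |D| + 1
    log_score = 0.0
    for qw in query:
        # Title language model probability of qw for this document:
        # null-word term plus count-weighted translations from document words.
        p_title = trans.get((qw, NULL), 0.0) / denom
        for dw, c in counts.items():
            p_title += trans.get((qw, dw), 0.0) * c / denom
        # Linear interpolation with the general English model.
        p = lam * p_title + (1.0 - lam) * background.get(qw, 0.0)
        if p <= 0.0:
            return float("-inf")
        log_score += math.log(p)
    return log_score


# Toy translation table and background model (invented for illustration only).
trans = {("car", "automobile"): 0.4, ("car", "car"): 0.6, ("car", NULL): 0.01}
background = {"car": 0.001}
doc_a = "the automobile dealer sold an automobile".split()
doc_b = "the weather was sunny all week".split()
print(title_lm_score(["car"], doc_a, trans, background))
print(title_lm_score(["car"], doc_b, trans, background))
```

In this toy setup the first document scores higher for the query "car" even though it never contains that word, because "automobile" translates to "car". If every translation probability is a delta function (each word translates only to itself), the score reduces to the interpolated traditional language model, with the document model estimated as c(dw, D)/(|D|+1).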
Based on the experiment results, we can draw the following conclusions:

Based on the comparison between the title language models and the traditional language model and the Okapi method, we can conclude that the title language model for information retrieval is an effective retrieval method. In all our experiments, the title language model gives a better performance than both the traditional language model and the Okapi method.

Based on the comparison between the three different title language models for information retrieval, we can conclude that title language Models 2 and 3 are superior to Model 1, and Model 3 is superior to Model 2. Since the difference between the three title language models lies in how they handle the self-translation probability, we can conclude that, first, it is crucial to smooth the self-translation probability to avoid the zero self-translation probability. Second, a better smoothing method for self-translation probability can improve the performance. Results show that adding an extra null word slot to the title is a reasonable smoothing method for the self-translation probabilities.

The success of applying the title language model learned from AP88 to AP90 appears to indicate that, when two collections are similar, the correlation between documents and titles in one collection also tends to be similar to that in the other. Therefore, it would seem to be appropriate to apply the statistical title translation model learned from one collection to the retrieval task of another similar collection. Even if the collections are not similar, applying a learned statistical title translation model from a foreign database does not seem to degrade the performance either. Thus, the statistical title translation model learned from title-document pairs may be used as a general resource that can be applied to retrieval tasks for different collections.

There are several directions for future work. First, it would be interesting to see how the style or quality of titles would affect the effectiveness of our model. One possibility is to use collections where the quality of titles has high variance (e.g., Web data). Second, we have assumed that queries and titles are similar, but there may be queries (e.g., long and verbose queries) that are quite different from titles. So, it would be interesting to further evaluate the robustness of our model by using many different types of queries. Finally, using title information is only one way to bridge the query-document gap; it would be very interesting to further explore other effective methods that can generate an appropriate query language model for a document.

5. ACKNOWLEDGEMENTS
We thank Jamie Callan, Yiming Yang, Luo Si, and the anonymous reviewers for their helpful comments on this work. This material is based in part on work supported by the National Science Foundation under Cooperative Agreement No. IRI-[...]. Partial support for this work was provided by the National Science Foundation's National Science, Mathematics, Engineering, and Technology Education Digital Library Program under grant DUE-[...]. This work was also supported in part by the Advanced Research and Development Activity (ARDA) under contract number MDA[...]-C-[...]. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or ARDA.

6. REFERENCES
[1] A. Berger and J. Lafferty (1999). Information retrieval as statistical translation. In Proceedings of SIGIR '99.
[2] P. Brown, S. Della Pietra, V. Della Pietra, and R. Mercer (1993). The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2).
[3] D. Hiemstra and W. Kraaij (1999). Twenty-One at TREC-7: ad-hoc and cross-language track. In Proceedings of the Seventh Text REtrieval Conference (TREC-7), NIST Special Publication.
[4] J. Lafferty and C. Zhai (2001). Document language models, query models, and risk minimization for information retrieval. In Proceedings of SIGIR 2001.
[5] A. M. Lam-Adesina and G. J. F. Jones. Applying summarization techniques for term selection in relevance feedback. In Proceedings of SIGIR 2001.
[6] V. Lavrenko and W. B. Croft (2001). Relevance-based language models. In Proceedings of SIGIR 2001.
[7] D. Miller, T. Leek and R. M. Schwartz (1999). A hidden Markov model information retrieval system. In Proceedings of SIGIR 1999.
[8] J. Ponte and W. B. Croft (1998). A language modeling approach to information retrieval. In Proceedings of SIGIR 1998.
[9] S. E. Robertson et al. (1993). Okapi at TREC-4. In The Fourth Text REtrieval Conference (TREC-4).
[10] E. Voorhees and D. Harman (eds.) (1996). The Fifth Text REtrieval Conference (TREC-5), NIST Special Publication.
[11] C. Zhai and J. Lafferty (2001). A study of smoothing methods for language models applied to ad hoc information retrieval. In Proceedings of SIGIR '01.

Forecasting the Direction and Strength of Stock Market Movement

Forecasting the Direction and Strength of Stock Market Movement Forecastng the Drecton and Strength of Stock Market Movement Jngwe Chen Mng Chen Nan Ye cjngwe@stanford.edu mchen5@stanford.edu nanye@stanford.edu Abstract - Stock market s one of the most complcated systems

More information

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ). REVIEW OF RISK MANAGEMENT CONCEPTS LOSS DISTRIBUTIONS AND INSURANCE Loss and nsurance: When someone s subject to the rsk of ncurrng a fnancal loss, the loss s generally modeled usng a random varable or

More information

Can Auto Liability Insurance Purchases Signal Risk Attitude?

Can Auto Liability Insurance Purchases Signal Risk Attitude? Internatonal Journal of Busness and Economcs, 2011, Vol. 10, No. 2, 159-164 Can Auto Lablty Insurance Purchases Sgnal Rsk Atttude? Chu-Shu L Department of Internatonal Busness, Asa Unversty, Tawan Sheng-Chang

More information

What is Candidate Sampling

What is Candidate Sampling What s Canddate Samplng Say we have a multclass or mult label problem where each tranng example ( x, T ) conssts of a context x a small (mult)set of target classes T out of a large unverse L of possble

More information

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements Lecture 3 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Next lecture: Matlab tutoral Announcements Rules for attendng the class: Regstered for credt Regstered for audt (only f there

More information

Luby s Alg. for Maximal Independent Sets using Pairwise Independence

Luby s Alg. for Maximal Independent Sets using Pairwise Independence Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent

More information

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis The Development of Web Log Mnng Based on Improve-K-Means Clusterng Analyss TngZhong Wang * College of Informaton Technology, Luoyang Normal Unversty, Luoyang, 471022, Chna wangtngzhong2@sna.cn Abstract.

More information

Calculation of Sampling Weights

Calculation of Sampling Weights Perre Foy Statstcs Canada 4 Calculaton of Samplng Weghts 4.1 OVERVIEW The basc sample desgn used n TIMSS Populatons 1 and 2 was a two-stage stratfed cluster desgn. 1 The frst stage conssted of a sample

More information

An Interest-Oriented Network Evolution Mechanism for Online Communities

An Interest-Oriented Network Evolution Mechanism for Online Communities An Interest-Orented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne

More information

On the Optimal Control of a Cascade of Hydro-Electric Power Stations

On the Optimal Control of a Cascade of Hydro-Electric Power Stations On the Optmal Control of a Cascade of Hydro-Electrc Power Statons M.C.M. Guedes a, A.F. Rbero a, G.V. Smrnov b and S. Vlela c a Department of Mathematcs, School of Scences, Unversty of Porto, Portugal;

More information

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(7):1884-1889 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A hybrd global optmzaton algorthm based on parallel

More information

The OC Curve of Attribute Acceptance Plans

The OC Curve of Attribute Acceptance Plans The OC Curve of Attrbute Acceptance Plans The Operatng Characterstc (OC) curve descrbes the probablty of acceptng a lot as a functon of the lot s qualty. Fgure 1 shows a typcal OC Curve. 10 8 6 4 1 3 4

More information

An Alternative Way to Measure Private Equity Performance

An Alternative Way to Measure Private Equity Performance An Alternatve Way to Measure Prvate Equty Performance Peter Todd Parlux Investment Technology LLC Summary Internal Rate of Return (IRR) s probably the most common way to measure the performance of prvate

More information

Study on Model of Risks Assessment of Standard Operation in Rural Power Network

Study on Model of Risks Assessment of Standard Operation in Rural Power Network Study on Model of Rsks Assessment of Standard Operaton n Rural Power Network Qngj L 1, Tao Yang 2 1 Qngj L, College of Informaton and Electrcal Engneerng, Shenyang Agrculture Unversty, Shenyang 110866,

More information

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network 700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School

More information

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..

More information

DEFINING %COMPLETE IN MICROSOFT PROJECT

DEFINING %COMPLETE IN MICROSOFT PROJECT CelersSystems DEFINING %COMPLETE IN MICROSOFT PROJECT PREPARED BY James E Aksel, PMP, PMI-SP, MVP For Addtonal Informaton about Earned Value Management Systems and reportng, please contact: CelersSystems,

More information

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College Feature selecton for ntruson detecton Slobodan Petrovć NISlab, Gjøvk Unversty College Contents The feature selecton problem Intruson detecton Traffc features relevant for IDS The CFS measure The mrmr measure

More information

1 Example 1: Axis-aligned rectangles

1 Example 1: Axis-aligned rectangles COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture # 6 Scrbe: Aaron Schld February 21, 2013 Last class, we dscussed an analogue for Occam s Razor for nfnte hypothess spaces that, n conjuncton

More information

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK Sample Stablty Protocol Background The Cholesterol Reference Method Laboratory Network (CRMLN) developed certfcaton protocols for total cholesterol, HDL

More information

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic Lagrange Multplers as Quanttatve Indcators n Economcs Ivan Mezník Insttute of Informatcs, Faculty of Busness and Management, Brno Unversty of TechnologCzech Republc Abstract The quanttatve role of Lagrange

More information

Recurrence. 1 Definitions and main statements

Recurrence. 1 Definitions and main statements Recurrence 1 Defntons and man statements Let X n, n = 0, 1, 2,... be a MC wth the state space S = (1, 2,...), transton probabltes p j = P {X n+1 = j X n = }, and the transton matrx P = (p j ),j S def.

More information

L10: Linear discriminants analysis

L10: Linear discriminants analysis L0: Lnear dscrmnants analyss Lnear dscrmnant analyss, two classes Lnear dscrmnant analyss, C classes LDA vs. PCA Lmtatons of LDA Varants of LDA Other dmensonalty reducton methods CSCE 666 Pattern Analyss

More information

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12 14 The Ch-squared dstrbuton PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 1 If a normal varable X, havng mean µ and varance σ, s standardsed, the new varable Z has a mean 0 and varance 1. When ths standardsed

More information

Sketching Sampled Data Streams

Sketching Sampled Data Streams Sketchng Sampled Data Streams Florn Rusu, Aln Dobra CISE Department Unversty of Florda Ganesvlle, FL, USA frusu@cse.ufl.edu adobra@cse.ufl.edu Abstract Samplng s used as a unversal method to reduce the

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Max Wellng Department of Computer Scence Unversty of Toronto 10 Kng s College Road Toronto, M5S 3G5 Canada wellng@cs.toronto.edu Abstract Ths s a note to explan support vector machnes.

More information

Semantic Link Analysis for Finding Answer Experts *

Semantic Link Analysis for Finding Answer Experts * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 28, 51-65 (2012) Semantc Lnk Analyss for Fndng Answer Experts * YAO LU 1,2,3, XIAOJUN QUAN 2, JINGSHENG LEI 4, XINGLIANG NI 1,2,3, WENYIN LIU 2,3 AND YINLONG

More information

An Empirical Study of Search Engine Advertising Effectiveness

An Empirical Study of Search Engine Advertising Effectiveness An Emprcal Study of Search Engne Advertsng Effectveness Sanjog Msra, Smon School of Busness Unversty of Rochester Edeal Pnker, Smon School of Busness Unversty of Rochester Alan Rmm-Kaufman, Rmm-Kaufman

More information

NPAR TESTS. One-Sample Chi-Square Test. Cell Specification. Observed Frequencies 1O i 6. Expected Frequencies 1EXP i 6

NPAR TESTS. One-Sample Chi-Square Test. Cell Specification. Observed Frequencies 1O i 6. Expected Frequencies 1EXP i 6 PAR TESTS If a WEIGHT varable s specfed, t s used to replcate a case as many tmes as ndcated by the weght value rounded to the nearest nteger. If the workspace requrements are exceeded and samplng has

More information

Mining Multiple Large Data Sources

Mining Multiple Large Data Sources The Internatonal Arab Journal of Informaton Technology, Vol. 7, No. 3, July 2 24 Mnng Multple Large Data Sources Anmesh Adhkar, Pralhad Ramachandrarao 2, Bhanu Prasad 3, and Jhml Adhkar 4 Department of

More information

How To Calculate The Accountng Perod Of Nequalty

How To Calculate The Accountng Perod Of Nequalty Inequalty and The Accountng Perod Quentn Wodon and Shlomo Ytzha World Ban and Hebrew Unversty September Abstract Income nequalty typcally declnes wth the length of tme taen nto account for measurement.

More information

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by 6 CHAPTER 8 COMPLEX VECTOR SPACES 5. Fnd the kernel of the lnear transformaton gven n Exercse 5. In Exercses 55 and 56, fnd the mage of v, for the ndcated composton, where and are gven by the followng

More information

SIMPLE LINEAR CORRELATION

SIMPLE LINEAR CORRELATION SIMPLE LINEAR CORRELATION Smple lnear correlaton s a measure of the degree to whch two varables vary together, or a measure of the ntensty of the assocaton between two varables. Correlaton often s abused.

More information

NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION

NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION NEURO-FUZZY INFERENE SYSTEM FOR E-OMMERE WEBSITE EVALUATION Huan Lu, School of Software, Harbn Unversty of Scence and Technology, Harbn, hna Faculty of Appled Mathematcs and omputer Scence, Belarusan State

More information

Number of Levels Cumulative Annual operating Income per year construction costs costs ($) ($) ($) 1 600,000 35,000 100,000 2 2,200,000 60,000 350,000

Number of Levels Cumulative Annual operating Income per year construction costs costs ($) ($) ($) 1 600,000 35,000 100,000 2 2,200,000 60,000 350,000 Problem Set 5 Solutons 1 MIT s consderng buldng a new car park near Kendall Square. o unversty funds are avalable (overhead rates are under pressure and the new faclty would have to pay for tself from

More information

8 Algorithm for Binary Searching in Trees

8 Algorithm for Binary Searching in Trees 8 Algorthm for Bnary Searchng n Trees In ths secton we present our algorthm for bnary searchng n trees. A crucal observaton employed by the algorthm s that ths problem can be effcently solved when the

More information

An Analysis of Central Processor Scheduling in Multiprogrammed Computer Systems

An Analysis of Central Processor Scheduling in Multiprogrammed Computer Systems STAN-CS-73-355 I SU-SE-73-013 An Analyss of Central Processor Schedulng n Multprogrammed Computer Systems (Dgest Edton) by Thomas G. Prce October 1972 Techncal Report No. 57 Reproducton n whole or n part

More information

Implementation of Deutsch's Algorithm Using Mathcad

Implementation of Deutsch's Algorithm Using Mathcad Implementaton of Deutsch's Algorthm Usng Mathcad Frank Roux The followng s a Mathcad mplementaton of Davd Deutsch's quantum computer prototype as presented on pages - n "Machnes, Logc and Quantum Physcs"

More information

Learning the Best K-th Channel for QoS Provisioning in Cognitive Networks

Learning the Best K-th Channel for QoS Provisioning in Cognitive Networks 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

CHAPTER 14 MORE ABOUT REGRESSION

CHAPTER 14 MORE ABOUT REGRESSION CHAPTER 14 MORE ABOUT REGRESSION We learned n Chapter 5 that often a straght lne descrbes the pattern of a relatonshp between two quanttatve varables. For nstance, n Example 5.1 we explored the relatonshp

More information

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence 1 st Internatonal Symposum on Imprecse Probabltes and Ther Applcatons, Ghent, Belgum, 29 June 2 July 1999 How Sets of Coherent Probabltes May Serve as Models for Degrees of Incoherence Mar J. Schervsh

More information

DEGREES OF EQUIVALENCE IN A KEY COMPARISON 1 Thang H. L., Nguyen D. D. Vietnam Metrology Institute, Address: 8 Hoang Quoc Viet, Hanoi, Vietnam

DEGREES OF EQUIVALENCE IN A KEY COMPARISON 1 Thang H. L., Nguyen D. D. Vietnam Metrology Institute, Address: 8 Hoang Quoc Viet, Hanoi, Vietnam DEGREES OF EQUIVALECE I A EY COMPARISO Thang H. L., guyen D. D. Vetnam Metrology Insttute, Aress: 8 Hoang Quoc Vet, Hano, Vetnam Abstract: In an nterlaboratory key comparson, a ata analyss proceure for

More information

1. Measuring association using correlation and regression

1. Measuring association using correlation and regression How to measure assocaton I: Correlaton. 1. Measurng assocaton usng correlaton and regresson We often would lke to know how one varable, such as a mother's weght, s related to another varable, such as a

More information

Project Networks With Mixed-Time Constraints

Project Networks With Mixed-Time Constraints Project Networs Wth Mxed-Tme Constrants L Caccetta and B Wattananon Western Australan Centre of Excellence n Industral Optmsaton (WACEIO) Curtn Unversty of Technology GPO Box U1987 Perth Western Australa

More information

SPECIALIZED DAY TRADING - A NEW VIEW ON AN OLD GAME

SPECIALIZED DAY TRADING - A NEW VIEW ON AN OLD GAME August 7 - August 12, 2006 n Baden-Baden, Germany SPECIALIZED DAY TRADING - A NEW VIEW ON AN OLD GAME Vladmr Šmovć 1, and Vladmr Šmovć 2, PhD 1 Faculty of Electrcal Engneerng and Computng, Unska 3, 10000

More information

Multiple-Period Attribution: Residuals and Compounding

Multiple-Period Attribution: Residuals and Compounding Multple-Perod Attrbuton: Resduals and Compoundng Our revewer gave these authors full marks for dealng wth an ssue that performance measurers and vendors often regard as propretary nformaton. In 1994, Dens

More information

Analysis of Premium Liabilities for Australian Lines of Business

Analysis of Premium Liabilities for Australian Lines of Business Summary of Analyss of Premum Labltes for Australan Lnes of Busness Emly Tao Honours Research Paper, The Unversty of Melbourne Emly Tao Acknowledgements I am grateful to the Australan Prudental Regulaton

More information

ActiveClean: Interactive Data Cleaning While Learning Convex Loss Models

ActiveClean: Interactive Data Cleaning While Learning Convex Loss Models ActveClean: Interactve Data Cleanng Whle Learnng Convex Loss Models Sanjay Krshnan, Jannan Wang, Eugene Wu, Mchael J. Frankln, Ken Goldberg UC Berkeley, Columba Unversty {sanjaykrshnan, jnwang, frankln,

More information

On-Line Fault Detection in Wind Turbine Transmission System using Adaptive Filter and Robust Statistical Features

On-Line Fault Detection in Wind Turbine Transmission System using Adaptive Filter and Robust Statistical Features On-Lne Fault Detecton n Wnd Turbne Transmsson System usng Adaptve Flter and Robust Statstcal Features Ruoyu L Remote Dagnostcs Center SKF USA Inc. 3443 N. Sam Houston Pkwy., Houston TX 77086 Emal: ruoyu.l@skf.com

More information

A Secure Password-Authenticated Key Agreement Using Smart Cards

A Secure Password-Authenticated Key Agreement Using Smart Cards A Secure Password-Authentcated Key Agreement Usng Smart Cards Ka Chan 1, Wen-Chung Kuo 2 and Jn-Chou Cheng 3 1 Department of Computer and Informaton Scence, R.O.C. Mltary Academy, Kaohsung 83059, Tawan,

More information

RESEARCH ON DUAL-SHAKER SINE VIBRATION CONTROL. Yaoqi FENG 1, Hanping QIU 1. China Academy of Space Technology (CAST) yaoqi.feng@yahoo.

RESEARCH ON DUAL-SHAKER SINE VIBRATION CONTROL. Yaoqi FENG 1, Hanping QIU 1. China Academy of Space Technology (CAST) yaoqi.feng@yahoo. ICSV4 Carns Australa 9- July, 007 RESEARCH ON DUAL-SHAKER SINE VIBRATION CONTROL Yaoq FENG, Hanpng QIU Dynamc Test Laboratory, BISEE Chna Academy of Space Technology (CAST) yaoq.feng@yahoo.com Abstract

More information

Staff Paper. Farm Savings Accounts: Examining Income Variability, Eligibility, and Benefits. Brent Gloy, Eddy LaDue, and Charles Cuykendall

Staff Paper. Farm Savings Accounts: Examining Income Variability, Eligibility, and Benefits. Brent Gloy, Eddy LaDue, and Charles Cuykendall SP 2005-02 August 2005 Staff Paper Department of Appled Economcs and Management Cornell Unversty, Ithaca, New York 14853-7801 USA Farm Savngs Accounts: Examnng Income Varablty, Elgblty, and Benefts Brent

More information

Multi-Period Resource Allocation for Estimating Project Costs in Competitive Bidding

Multi-Period Resource Allocation for Estimating Project Costs in Competitive Bidding Department of Industral Engneerng and Management Techncall Report No. 2014-6 Mult-Perod Resource Allocaton for Estmatng Project Costs n Compettve dng Yuch Takano, Nobuak Ish, and Masaak Murak September,

More information

A Performance Analysis of View Maintenance Techniques for Data Warehouses

A Performance Analysis of View Maintenance Techniques for Data Warehouses A Performance Analyss of Vew Mantenance Technques for Data Warehouses Xng Wang Dell Computer Corporaton Round Roc, Texas Le Gruenwald The nversty of Olahoma School of Computer Scence orman, OK 739 Guangtao

More information

Probabilistic Latent Semantic User Segmentation for Behavioral Targeted Advertising*

Probabilistic Latent Semantic User Segmentation for Behavioral Targeted Advertising* Probablstc Latent Semantc User Segmentaton for Behavoral Targeted Advertsng* Xaohu Wu 1,2, Jun Yan 2, Nng Lu 2, Shucheng Yan 3, Yng Chen 1, Zheng Chen 2 1 Department of Computer Scence Bejng Insttute of

More information

The Greedy Method. Introduction. 0/1 Knapsack Problem

The Greedy Method. Introduction. 0/1 Knapsack Problem The Greedy Method Introducton We have completed data structures. We now are gong to look at algorthm desgn methods. Often we are lookng at optmzaton problems whose performance s exponental. For an optmzaton

More information

Evaluating credit risk models: A critique and a new proposal

Evaluating credit risk models: A critique and a new proposal Evaluatng credt rsk models: A crtque and a new proposal Hergen Frerchs* Gunter Löffler Unversty of Frankfurt (Man) February 14, 2001 Abstract Evaluatng the qualty of credt portfolo rsk models s an mportant

More information

Extending Probabilistic Dynamic Epistemic Logic

Extending Probabilistic Dynamic Epistemic Logic Extendng Probablstc Dynamc Epstemc Logc Joshua Sack May 29, 2008 Probablty Space Defnton A probablty space s a tuple (S, A, µ), where 1 S s a set called the sample space. 2 A P(S) s a σ-algebra: a set

More information

1.1 The University may award Higher Doctorate degrees as specified from time-to-time in UPR AS11 1.

1.1 The University may award Higher Doctorate degrees as specified from time-to-time in UPR AS11 1. HIGHER DOCTORATE DEGREES SUMMARY OF PRINCIPAL CHANGES General changes None Secton 3.2 Refer to text (Amendments to verson 03.0, UPR AS02 are shown n talcs.) 1 INTRODUCTION 1.1 The Unversty may award Hgher

More information

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services An Evaluaton of the Extended Logstc, Smple Logstc, and Gompertz Models for Forecastng Short Lfecycle Products and Servces Charles V. Trappey a,1, Hsn-yng Wu b a Professor (Management Scence), Natonal Chao

More information

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble

More information

Using Multi-objective Metaheuristics to Solve the Software Project Scheduling Problem

Using Multi-objective Metaheuristics to Solve the Software Project Scheduling Problem Usng Mult-obectve Metaheurstcs to Solve the Software Proect Schedulng Problem Francsco Chcano Unversty of Málaga, Span chcano@lcc.uma.es Francsco Luna Unversty of Málaga, Span flv@lcc.uma.es Enrque Alba

More information

Data Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network *

Data Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 24, 819-840 (2008) Data Broadcast on a Mult-System Heterogeneous Overlayed Wreless Network * Department of Computer Scence Natonal Chao Tung Unversty Hsnchu,

More information

Statistical Methods to Develop Rating Models

Statistical Methods to Develop Rating Models Statstcal Methods to Develop Ratng Models [Evelyn Hayden and Danel Porath, Österrechsche Natonalbank and Unversty of Appled Scences at Manz] Source: The Basel II Rsk Parameters Estmaton, Valdaton, and

More information

Vasicek s Model of Distribution of Losses in a Large, Homogeneous Portfolio

Vasicek s Model of Distribution of Losses in a Large, Homogeneous Portfolio Vascek s Model of Dstrbuton of Losses n a Large, Homogeneous Portfolo Stephen M Schaefer London Busness School Credt Rsk Electve Summer 2012 Vascek s Model Important method for calculatng dstrbuton of

More information

Estimating the Development Effort of Web Projects in Chile

Estimating the Development Effort of Web Projects in Chile Estmatng the Development Effort of Web Projects n Chle Sergo F. Ochoa Computer Scences Department Unversty of Chle (56 2) 678-4364 sochoa@dcc.uchle.cl M. Cecla Bastarrca Computer Scences Department Unversty

More information

Lecture 2: Single Layer Perceptrons Kevin Swingler

Lecture 2: Single Layer Perceptrons Kevin Swingler Lecture 2: Sngle Layer Perceptrons Kevn Sngler kms@cs.str.ac.uk Recap: McCulloch-Ptts Neuron Ths vastly smplfed model of real neurons s also knon as a Threshold Logc Unt: W 2 A Y 3 n W n. A set of synapses

More information

SPEE Recommended Evaluation Practice #6 Definition of Decline Curve Parameters Background:

SPEE Recommended Evaluation Practice #6 Definition of Decline Curve Parameters Background: SPEE Recommended Evaluaton Practce #6 efnton of eclne Curve Parameters Background: The producton hstores of ol and gas wells can be analyzed to estmate reserves and future ol and gas producton rates and

More information

Simple Interest Loans (Section 5.1) :

Simple Interest Loans (Section 5.1) : Chapter 5 Fnance The frst part of ths revew wll explan the dfferent nterest and nvestment equatons you learned n secton 5.1 through 5.4 of your textbook and go through several examples. The second part

More information

PEER REVIEWER RECOMMENDATION IN ONLINE SOCIAL LEARNING CONTEXT: INTEGRATING INFORMATION OF LEARNERS AND SUBMISSIONS

PEER REVIEWER RECOMMENDATION IN ONLINE SOCIAL LEARNING CONTEXT: INTEGRATING INFORMATION OF LEARNERS AND SUBMISSIONS PEER REVIEWER RECOMMENDATION IN ONLINE SOCIAL LEARNING CONTEXT: INTEGRATING INFORMATION OF LEARNERS AND SUBMISSIONS Yunhong Xu, Faculty of Management and Economcs, Kunmng Unversty of Scence and Technology,

More information

Prediction of Disability Frequencies in Life Insurance

Prediction of Disability Frequencies in Life Insurance Predcton of Dsablty Frequences n Lfe Insurance Bernhard Köng Fran Weber Maro V. Wüthrch October 28, 2011 Abstract For the predcton of dsablty frequences, not only the observed, but also the ncurred but

More information

BERNSTEIN POLYNOMIALS

BERNSTEIN POLYNOMIALS On-Lne Geometrc Modelng Notes BERNSTEIN POLYNOMIALS Kenneth I. Joy Vsualzaton and Graphcs Research Group Department of Computer Scence Unversty of Calforna, Davs Overvew Polynomals are ncredbly useful

More information

Predicting Software Development Project Outcomes *

Predicting Software Development Project Outcomes * Predctng Software Development Project Outcomes * Rosna Weber, Mchael Waller, June Verner, Wllam Evanco College of Informaton Scence & Technology, Drexel Unversty 3141 Chestnut Street Phladelpha, PA 19104

More information

Credit Limit Optimization (CLO) for Credit Cards

Credit Limit Optimization (CLO) for Credit Cards Credt Lmt Optmzaton (CLO) for Credt Cards Vay S. Desa CSCC IX, Ednburgh September 8, 2005 Copyrght 2003, SAS Insttute Inc. All rghts reserved. SAS Propretary Agenda Background Tradtonal approaches to credt

More information

Risk-based Fatigue Estimate of Deep Water Risers -- Course Project for EM388F: Fracture Mechanics, Spring 2008

Risk-based Fatigue Estimate of Deep Water Risers -- Course Project for EM388F: Fracture Mechanics, Spring 2008 Rsk-based Fatgue Estmate of Deep Water Rsers -- Course Project for EM388F: Fracture Mechancs, Sprng 2008 Chen Sh Department of Cvl, Archtectural, and Envronmental Engneerng The Unversty of Texas at Austn

More information

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy Fnancal Tme Seres Analyss Patrck McSharry patrck@mcsharry.net www.mcsharry.net Trnty Term 2014 Mathematcal Insttute Unversty of Oxford Course outlne 1. Data analyss, probablty, correlatons, vsualsaton

More information

A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña

A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña Proceedngs of the 2008 Wnter Smulaton Conference S. J. Mason, R. R. Hll, L. Mönch, O. Rose, T. Jefferson, J. W. Fowler eds. A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION

More information

Calculating the high frequency transmission line parameters of power cables

Calculating the high frequency transmission line parameters of power cables < ' Calculatng the hgh frequency transmsson lne parameters of power cables Authors: Dr. John Dcknson, Laboratory Servces Manager, N 0 RW E B Communcatons Mr. Peter J. Ncholson, Project Assgnment Manager,

More information

How To Find The Dsablty Frequency Of A Clam

How To Find The Dsablty Frequency Of A Clam 1 Predcton of Dsablty Frequences n Lfe Insurance Bernhard Köng 1, Fran Weber 1, Maro V. Wüthrch 2 Abstract: For the predcton of dsablty frequences, not only the observed, but also the ncurred but not yet

More information

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm Document Clusterng Analyss Based on Hybrd PSO+K-means Algorthm Xaohu Cu, Thomas E. Potok Appled Software Engneerng Research Group, Computatonal Scences and Engneerng Dvson, Oak Rdge Natonal Laboratory,

More information

Hallucinating Multiple Occluded CCTV Face Images of Different Resolutions

Hallucinating Multiple Occluded CCTV Face Images of Different Resolutions In Proc. IEEE Internatonal Conference on Advanced Vdeo and Sgnal based Survellance (AVSS 05), September 2005 Hallucnatng Multple Occluded CCTV Face Images of Dfferent Resolutons Ku Ja Shaogang Gong Computer

More information

Brigid Mullany, Ph.D University of North Carolina, Charlotte

Brigid Mullany, Ph.D University of North Carolina, Charlotte Evaluaton And Comparson Of The Dfferent Standards Used To Defne The Postonal Accuracy And Repeatablty Of Numercally Controlled Machnng Center Axes Brgd Mullany, Ph.D Unversty of North Carolna, Charlotte

More information

Robust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School

Robust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School Robust Desgn of Publc Storage Warehouses Yemng (Yale) Gong EMLYON Busness School Rene de Koster Rotterdam school of management, Erasmus Unversty Abstract We apply robust optmzaton and revenue management

More information

Joint Scheduling of Processing and Shuffle Phases in MapReduce Systems

Joint Scheduling of Processing and Shuffle Phases in MapReduce Systems Jont Schedulng of Processng and Shuffle Phases n MapReduce Systems Fangfe Chen, Mural Kodalam, T. V. Lakshman Department of Computer Scence and Engneerng, The Penn State Unversty Bell Laboratores, Alcatel-Lucent

More information

Dynamic Pricing for Smart Grid with Reinforcement Learning

Dynamic Pricing for Smart Grid with Reinforcement Learning Dynamc Prcng for Smart Grd wth Renforcement Learnng Byung-Gook Km, Yu Zhang, Mhaela van der Schaar, and Jang-Won Lee Samsung Electroncs, Suwon, Korea Department of Electrcal Engneerng, UCLA, Los Angeles,

More information

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression Novel Methodology of Workng Captal Management for Large Publc Constructons by Usng Fuzzy S-curve Regresson Cheng-Wu Chen, Morrs H. L. Wang and Tng-Ya Hseh Department of Cvl Engneerng, Natonal Central Unversty,

More information

Risk Model of Long-Term Production Scheduling in Open Pit Gold Mining

Risk Model of Long-Term Production Scheduling in Open Pit Gold Mining Rsk Model of Long-Term Producton Schedulng n Open Pt Gold Mnng R Halatchev 1 and P Lever 2 ABSTRACT Open pt gold mnng s an mportant sector of the Australan mnng ndustry. It uses large amounts of nvestments,

More information












