Identifying Citing Sentences in Research Papers Using Supervised Learning


Kazunari Sugiyama, Department of Computer Science, National University of Singapore, Singapore (sugiyama@comp.nus.edu.sg)
Tarun Kumar, Indian Institute of Information Technology, Allahabad, India (tar.ta@gmal.com)
Min-Yen Kan, Department of Computer Science, National University of Singapore, Singapore (kanmy@comp.nus.edu.sg)
Ramesh C. Tripathi, Indian Institute of Information Technology, Allahabad, India (rctripathi@iiita.ac.in)

Abstract

Researchers have largely focused on analyzing citation links from one scholarly work to another. The sentences that carry these citations, which we call citing sentences, are an important part of the narrative in a research article. If we can automatically identify such sentences, we can devise an editor that suggests when a particular piece of text needs to be backed up with a citation. In this paper, we propose a method for identifying citing sentences by constructing a classifier using supervised learning. Our experiments show that simple language features such as proper nouns and the labels of the previous and next sentences are effective for identifying citing sentences.

Keywords: information retrieval; digital library; discourse processing; citation analysis

I. INTRODUCTION

When we write research papers or articles, we often make references to previous works in our own research field. Citations serve various purposes: as evidence for claims and as an acknowledgment of others' work, among other functions. We term sentences that contain such references citing sentences. These statements are usually followed by a pointer to the full reference located at the end of a paper in a Reference or Bibliography section. Our aim in this work is to identify whether a sentence in a paper needs a citation or not. For example, consider the following two sentences:

Sentence 1: We want to build a system which can help in finding a person's job easily.
Sentence 2: The HSQL project was led by the Swedish State Bureau with participants from Sweden, Denmark, Finland, and Norway.

Of these two statements, Sentence 2 needs a citation because it refers to previous work, namely the HSQL project. On the other hand, most people would agree that Sentence 1 does not need a citation because it does not describe previous work. In this paper, we propose a method (footnote 1) for identifying sentences that require citations, such as Sentence 2 above. While citing sentences are often trivially marked with a citation marker (e.g., "[5]" or "(Brown, 1990)"), an important distinction in our problem is that we consider detecting such a sentence when such markers are not present. We show that our approach, which constructs a supervised classifier from simple features, achieves a high level of accuracy.

Footnote 1: This work is supported by a Media Development Authority (MDA) grant "Interactive Media Search", R

Although other works have analyzed citation information [1, 2], to the best of our knowledge the work presented here is the first to identify citing sentences using natural language analysis. Such a module is useful in a smart authoring environment, which can help authors by suggesting whether statements made in the paper draft need citations or not. Such a system takes a research paper as input and identifies the statements that need a citation. It can also be used while a paper is being written, by suggesting to authors whether the current statement needs a citation.

This paper is organized as follows. In Section II, we review related work on analyzing citations in research papers and briefly describe the two classifiers we use in our experiments. In Section III, we describe our approach to distinguishing between sentences that require citations and those that do not. In Section IV, we present the experimental results for evaluating our approach. Finally, we conclude the paper with a summary and directions for future work in Section V.

II. RELATED WORK

We first review related work on scholarly citation analysis, and then survey the fundamental background of the two state-of-the-art classifiers, maximum entropy and support vector machines, that we employ later in our experiments.

A. Citation Analysis of Research Papers

Citation information has been used for information retrieval since the early stages of this field. As far as we know, there are two types of research in the citation analysis of research papers: (1) citation count to evaluate the impact of scientific papers, and (2) citation context analysis.

Citation count is widely used in evaluating the importance of a paper because it is strongly correlated with academic document impact [3]. The Thomson Scientific ISI Impact Factor (ISI IF) is the representative approach using citation count [4]; it factors citation count with a moving window to calculate the impact of certain publication venues. The advantages of citation count are (i) its simplicity in computation and (ii) its proven track record in deployment in scientometric applications. However, citation count has a well-known limitation: citing papers with high impact and ones with low impact are treated equally in standard citation count.

In order to overcome this problem, many recent works have employed the notion of PageRank [5] to better weight and control for the influence of papers of differing impact [6, 7, 8, 9, 10].

Citation information is statistical in nature, and many researchers have focused on this characteristic. For example, Kessler [11] proposed the notion of bibliographic coupling, where two documents are said to be coupled if they share one or more references. Small [12] proposed a complementary method, termed co-citation analysis, where the similarity between documents A and B is measured by the number of documents that cite both A and B. In addition, researchers have also focused on the potential usefulness of the text associated with citations in specific applications, such as text summarization [13, 14], thesaurus construction [15], and information retrieval [16, 17].

B. Maximum Entropy (ME)

The framework of maximum entropy [18] has already been widely used for a variety of natural language tasks such as prepositional phrase attachment [19], language modeling [20, 21], part-of-speech tagging [22], and text segmentation [23]. Maximum entropy has been shown to be an effective and competitive algorithm in these domains.

Statistical modeling constructs a model that best accounts for some training data. Specifically, for a given empirical probability distribution $\tilde{p}$, a model $p$ is built to result in a distribution as close to $\tilde{p}$ as possible. Given a set of training data, there are numerous ways to choose a model $p$ that accounts for the data. It can be shown that the probability distribution defined by Equation (1) is the one that is closest to $\tilde{p}$ in the sense of Kullback-Leibler divergence, when subjected to a set of feature constraints:

$$p(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{i=1}^{k} \lambda_i f_i(x, y) \right), \qquad (1)$$

where $p(y \mid x)$ denotes the conditional probability of predicting a class $y$ on seeing the context $x$, $f_i(x, y)$ $(i = 1, \ldots, k)$ are feature functions, and $\lambda_i$ $(i = 1, \ldots, k)$ are the weighting parameters for $f_i(x, y)$.

$k$ is the number of features and $Z(x)$ is a normalization factor that ensures that the $p(y \mid x)$ scores sum to one and reflect true probabilities. This maximum entropy model represents evidence with binary functions known as contextual predicates, in the form:

$$f_{cp, y'}(x, y) = \begin{cases} 1 & \text{if } y = y' \text{ and } cp(x) = \text{true}, \\ 0 & \text{otherwise,} \end{cases}$$

where $cp$ is the contextual predicate that maps a context $x$ to {true, false} and $y'$ is the outcome it is paired with.

The human expert can choose arbitrary feature functions in order to reflect the characteristics of the problem domain as faithfully as possible. The ability to freely incorporate problem-specific knowledge in terms of feature functions gives ME models an advantage over other learning paradigms, which often suffer from strong feature independence assumptions (e.g., in the case of the naïve Bayes classifier).

Once a set of features is chosen by the human expert, the corresponding maximum entropy model can be constructed by adding features as constraints to the model and iteratively adjusting the weights of these features automatically to best reflect the training data. Formally, we require that:

$$E_{\tilde{p}}\langle f_i \rangle = E_{p}\langle f_i \rangle,$$

where $E_{\tilde{p}}\langle f_i \rangle = \sum_{x, y} \tilde{p}(x, y) f_i(x, y)$ is the empirical expectation and $E_{p}\langle f_i \rangle$ is the expectation with respect to the model distribution $p$. Among all the models subjected to these constraints, a unique solution exists that preserves the uncertainty in the original constraints and does not add any extra bias to the solution: this is the maximum entropy solution obtained by the training procedure. Given an exponential model with $n$ features and a set of training data (the empirical distribution), we need to find the associated weight for each of the $n$ features to maximize the model's log-likelihood:

$$L(p) = \sum_{x, y} \tilde{p}(x, y) \log p(y \mid x).$$

It is important to select an optimal model subjected to the given constraints from the log-linear family. There are three well-known algorithms for estimating the parameters of ME models of the form of Equation (1): Generalized Iterative Scaling [24], Improved Iterative Scaling [25], and Limited Memory Variable Metric [26].

C. Support Vector Machine (SVM)

The Support Vector Machine (SVM) [27] has many desirable qualities that make it one of the most popular algorithms. It not only has a solid theoretical foundation, but also classifies more accurately than most other algorithms in many applications such as Web page classification and bioinformatics tasks.

Given a training set of instance-label pairs $(x_i, y_i)$ $(i = 1, \ldots, l)$, where $x_i \in R^n$ is a training vector and $y_i \in \{1, -1\}$ is its class label, an SVM finds a linear separating hyperplane with the maximal margin as a solution to the following optimization problem:

$$\min_{w, b, \xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i \quad \text{subject to} \quad y_i \left( w^T \phi(x_i) + b \right) \geq 1 - \xi_i, \; \xi_i \geq 0. \qquad (2)$$

As the original problem may not be linearly separable, $x_i$ can be mapped into a higher dimensional space by a function $\phi$. The SVM then finds a linear separating hyperplane with the maximal margin in this higher dimensional space. $C > 0$ is the penalty parameter of the error term. Interestingly, the kernel function $K(x_i, x_j) \equiv \phi(x_i)^T \phi(x_j)$ can take differing forms: linear, polynomial, radial basis, and sigmoid functions are often used. The SVM depends on the data only through the kernel function, so if a surrogate method that yields the function values can be given, the explicit inner product in the mapped space need not be calculated, greatly reducing computational cost.
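Both models in this section reduce to a few lines of arithmetic. The following is a minimal sketch, not the toolkits used in the paper: `maxent_prob` evaluates Equation (1) for a given set of weights, `predicate_feature` builds a binary contextual-predicate feature, and the four kernel functions follow the commonly used parameterizations. All function names, the toy predicate, and the default values of `gamma`, `coef0`, and `degree` are assumptions made for illustration.

```python
import math

def maxent_prob(weights, features, x, labels):
    """Equation (1): p(y | x) = exp(sum_i lambda_i * f_i(x, y)) / Z(x)."""
    unnormalized = {
        y: math.exp(sum(w * f(x, y) for w, f in zip(weights, features)))
        for y in labels
    }
    z = sum(unnormalized.values())  # Z(x): makes the scores sum to one
    return {y: score / z for y, score in unnormalized.items()}

def predicate_feature(cp, target):
    """Binary feature f_{cp,y'}: 1 when y == y' and cp(x) is true, else 0."""
    return lambda x, y: 1.0 if (y == target and cp(x)) else 0.0

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    return dot(u, v)

def polynomial_kernel(u, v, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree

def rbf_kernel(u, v, gamma=0.5):
    # Radial basis function: depends only on the squared distance between u and v.
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)

def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)

# Toy usage with one hypothetical predicate and a hand-picked weight.
has_proper_noun = predicate_feature(lambda x: x["has_proper_noun"], "citing")
probs = maxent_prob([1.0], [has_proper_noun],
                    {"has_proper_noun": True}, ["citing", "non-citing"])
```

In practice the weights would come from one of the estimation algorithms cited above rather than being set by hand, and the kernel values would be consumed by an SVM solver rather than computed one pair at a time.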

Figure 1. Overview of our system.

III. PROPOSED METHOD

Figure 1 illustrates our proposed system, which consists of the following four parts:

(1) Constructing proper training and test data sets,
(2) Extracting the appropriate features from the data,
(3) Constructing the classifier,
(4) Classifying sentences as citing or non-citing.

We detail each of these steps individually.

(1) Constructing proper training and test data sets

For our experiments, we utilize the Association for Computational Linguistics' standard research article corpus, the ACL Anthology Reference Corpus (ACL ARC), discussed in Section IV. We first remove the Reference section and stop words (footnote 2) from each paper. We define a sentence that contains citing information as a positive instance, and a sentence that does not as a negative instance. Our idea is to use sentences that have existing citations as training data, by removing the citation marker. If a citation marker is found via heuristic rules, we remove it. We then divide our dataset into ten equal-sized parts, to be used as cross validation folds. As we perform 10-fold cross validation, we divide the whole data set into 90% training data and 10% test data, and repeat the experimental process below 10 times to obtain our final evaluation results.

Footnote 2: List of 571 words obtained from ftp://ftp.cs.cornell.edu/pub/smart/english.stop

(2) Extracting the appropriate features from the data

In this module, we extract features from each sentence in order to construct the classifier later. The features we extract are as follows:

a) Unigram - Unigram features are the words contained in the sentences. After removing the stop words from a sentence, each remaining word serves as a binary feature, turned on for that individual sentence. Unigrams are an important class of features, as certain types of words tend to appear more frequently in citing or non-citing sentences. For example, the sentence "The classification model is trained" can be represented with the unigrams "classification", "model", and "trained" ("the" and "is" are stop words).
b) Bigram - By combining two adjacent words in sentences, we create bigram features. Bigram features capture the effect of two independent words in combination, which can help classification. Using the same example as above, we extract the bigrams "classification model" and "model trained" (similarly, "the" and "is" are stop words).

c) Proper Nouns - These are nouns that give the names of people, locations, systems, and organizations. Based on the presence of various types of proper nouns, the respective binary features are set. This feature also plays a significant role in detecting citing sentences, as from experience we know that such sentences often refer to particular scholars, their developed systems, or institutions.

d) Previous and Next Sentence - We can also include information about the classifications of neighboring sentences, that is, their citation or non-citation status. For example, if the previous sentence is a citing sentence, the following sentence may continue to discuss the same work and would be less likely to contain an additional citation.

e) Position - The positional feature gives information about the part of the document in which the sentence appears. To implement this feature, we divide the document into six equal parts: one part for the first sixth of the document, one part for the second sixth, and so on until the final sixth. We turn on one of the six binary features based on which part of the document the sentence appears in. These features are important, as statements appearing in certain sections exhibit markedly different probabilities for citation. For example, sentences in the middle or end of a research paper are likely to discuss the authors' own work or the evaluation and are less probable citation areas, as compared to the beginning of the article, where authors often discuss and credit prior work.

f) Orthographic - This set of features checks for miscellaneous formatting characteristics, including specific orthographies used in the sentence. For example, a sentence containing numbers or single capital letters may be more indicative of a citing sentence, as these may present comparative results or author initials from cited works.

(3) Constructing the classifier

Using the training data set described in (1), we construct two supervised classifiers based on two publicly available implementations of supervised learning frameworks: maximum entropy (ME) (footnote 3) [18] and support vector machine (SVM) (footnote 4) [27], as described in Section II. Both methodologies were chosen as they efficiently handle a large number of non-independent features, common to many natural language tasks.
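The feature classes a) to f) from step (2) can be sketched as a single function that maps a sentence to a set of binary features. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop list is a tiny stand-in for the 571-word SMART list, the proper-noun test is a crude capitalization heuristic in place of a real tagger, and the feature names and the `extract_features` signature are hypothetical.

```python
import re

# Tiny illustrative stop list; the paper uses the 571-word SMART list instead.
STOP_WORDS = {"the", "is", "a", "an", "of", "in", "and", "to", "by", "was", "with", "from"}

def tokenize(sentence):
    return re.findall(r"[A-Za-z0-9]+", sentence)

def extract_features(sentence, index, total, prev_label=None, next_label=None):
    """Return the set of binary features that are turned on for one sentence.

    index/total give the sentence's position in the document (0-based index).
    """
    tokens = tokenize(sentence)
    content = [t.lower() for t in tokens if t.lower() not in STOP_WORDS]
    feats = set()

    # a) Unigram: every remaining word after stop word removal.
    feats.update("uni=" + w for w in content)
    # b) Bigram: adjacent pairs of the remaining words.
    feats.update("bi=" + a + "_" + b for a, b in zip(content, content[1:]))
    # c) Proper noun (assumed heuristic): a capitalized token after the first one.
    if any(t[0].isupper() for t in tokens[1:]):
        feats.add("has_proper_noun")
    # d) Labels of the previous and next sentences, when known.
    if prev_label is not None:
        feats.add("prev=" + prev_label)
    if next_label is not None:
        feats.add("next=" + next_label)
    # e) Position: which sixth of the document the sentence falls in (0..5).
    feats.add("pos=%d" % min(5, index * 6 // total))
    # f) Orthographic: numbers and single capital letters.
    if any(t.isdigit() for t in tokens):
        feats.add("has_number")
    if any(len(t) == 1 and t.isupper() for t in tokens):
        feats.add("has_single_capital")
    return feats

example = extract_features("The classification model is trained", index=0, total=12)
```

For the example sentence this yields the unigrams and bigrams listed in a) and b) plus a single position feature; a sentence such as Sentence 2 from the Introduction would additionally turn on the proper-noun feature.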
In addition, our task is a binary classification problem: whether a sentence is a citing sentence or not. SVMs often bring superior results in binary classification tasks.

(4) Classifying sentences as citing or non-citing

Given the classifiers trained in (3), we can then apply the trained models to new, unseen sentences. We employ these trained models and assess their performance using accuracy as our evaluation measure.

IV. EXPERIMENTS

A. Experimental Data

We used the ACL Anthology Reference Corpus (ACL ARC) (footnote 5) [28]. The ACL ARC is constructed from a significant subset of the ACL Anthology (footnote 6), which is a digital archive of conference and journal papers in natural language processing and computational linguistics. The ACL ARC consists of 10,921 articles from the February 2007 snapshot of the ACL Anthology. Using the ACL ARC, we extracted features from each of the 955,755 sentences included in the 10,921 articles. Using regular expression pattern matching to find citation markers, we identified 112,533 sentences as citing sentences (positive instances), and post-processed them to remove the citation marker. We deemed the remaining 843,242 sentences non-citing sentences (negative instances). In our experiments, the aim is to identify whether each sentence is a citing sentence or not.

B. Experimental Results

Using each of the individual feature classes from a) to f) described in Section III, we constructed both SVM and ME classifiers. We evaluated our models using simple accuracy, defined as follows:

$$\text{Accuracy} = \frac{\text{Number of correct classifications}}{\text{Total number of test cases}},$$

where "correct classifications" means that the learned model predicts the same class as the original class of the test case.

1) Classification Accuracy by Cross Validation

We first conducted experiments using 10-fold cross validation for both ME and SVM. For the SVM experiments, we used the default settings of the LIBSVM package, setting the kernel to the radial basis function and the value of C in Equation (2) to 1.0. Table 1 shows the experimental results obtained using classifiers constructed with both ME and SVM.
For comparison, we also constructed an integrated classifier which uses all six features, labeled as "All" in Table 1.

Table 1. Accuracy obtained by ME and SVM.

Feature                          | Accuracy (ME) | Accuracy (SVM)
(1) Unigram                      |               |
(2) Bigram                       |               |
(3) Proper Noun                  |               |
(4) Previous and Next Sentence   |               |
(5) Position                     |               |
(6) Orthographic                 |               |
(7) All [(1) - (6)]              |               |

Footnote 3: Maximum Entropy Modeling Toolkit (Version ),
Footnote 4: LIBSVM (Version 2.89),
Footnote 5: Version ,
Footnote 6:
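The experimental protocol in Section IV (regular-expression labeling, ten folds, and the accuracy measure) can be sketched as follows. This is a rough illustration only: the marker pattern covers just the two styles mentioned in the Introduction, "[5]" and "(Brown, 1990)", and is an assumption rather than the rule set actually used; the interleaved fold assignment and all function names are likewise hypothetical.

```python
import re

# Assumed marker patterns: numeric brackets such as [5] or [1, 2], and
# author-year parentheses such as (Brown, 1990).
MARKER = re.compile(
    r"\s*(\[\d+(?:\s*,\s*\d+)*\]|\([A-Z][A-Za-z\-]+(?: et al\.)?,\s*\d{4}\))"
)

def label_and_strip(sentence):
    """Return (sentence with markers removed, True if any marker was found)."""
    stripped, count = MARKER.subn("", sentence)
    return stripped, count > 0

def k_folds(items, k=10):
    """Yield (train, test) splits; fold i tests on every k-th item starting at i."""
    for i in range(k):
        test = items[i::k]
        train = [item for j, item in enumerate(items) if j % k != i]
        yield train, test

def accuracy(predicted, gold):
    """Number of correct classifications divided by the number of test cases."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

positive = label_and_strip("PageRank [5] weights citing papers.")
negative = label_and_strip("We want to build a system.")
```

With ten folds, each split holds out roughly 10% of the sentences for testing and trains on the remaining 90%, matching the proportions described in Section III.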

2) Classification Accuracy on Different Sizes of Training Data

For most learning algorithms, the size of the training data affects the classification accuracy. Therefore, we conducted experiments to empirically assess how classification performance changes when the training data is a subset, from 10% to 90%, of the full training data. Figures 2 and 3 (both shown at the same scale) give the experimental results obtained by ME and SVM, respectively.

Figure 2. Classification accuracy obtained by ME.

Figure 3. Classification accuracy obtained by SVM.

3) Classification Accuracy with Different Values of C in SVM

Finally, we conducted a series of experiments in tuning the SVM performance. In the SVM framework, the value of C in Equation (2), the penalty on the error term that sets the SVM's tolerance for misclassifications around the separating hyperplane, also affects classification accuracy. Thus, we conducted experiments to find the classification accuracy obtained with different values of C. Figure 4 shows the accuracy obtained by using SVM with different values of C.

Figure 4. Classification accuracy obtained by SVM with different values of C.

Table 2. Optimal values of C and their accuracy.

Feature                          | Optimal Value of C | Accuracy
(1) Unigram                      |                    |
(2) Bigram                       |                    |
(3) Proper Noun                  |                    |
(4) Previous and Next Sentence   |                    |
(5) Position                     |                    |
(6) Orthographic                 |                    |
(7) All [(1) - (6)]              |                    |

C. Discussion

According to Table 1, in the first set of experiments on cross validation, both ME and SVM achieved an accuracy greater than [...]. We observed only a small difference in accuracy (0.002 to 0.024) between these classifiers, which suggests that accuracy on this kind of task depends little on the choice of classifier. In particular, simple features such as Proper Noun and the context of Previous and Next Sentence bring the best results (0.882) among them. Interestingly, the Bigram feature is not as effective as the other features we used. As bigram features are often very sparse, it may be difficult to construct an accurate classifier with so few overlapping features.
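The sparsity argument can be made concrete with a toy count. The sketch below, which uses a hypothetical three-sentence corpus (stop words already removed) and hypothetical helper names, measures what fraction of distinct n-grams occur only once; features seen once in training rarely recur at test time.

```python
from collections import Counter

def ngram_counts(sentences, n):
    """Count every n-gram of adjacent tokens across tokenized sentences."""
    counts = Counter()
    for tokens in sentences:
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts

def singleton_fraction(counts):
    """Fraction of distinct n-grams that were seen exactly once."""
    return sum(1 for c in counts.values() if c == 1) / len(counts)

# Hypothetical toy corpus, for illustration only.
corpus = [
    ["classification", "model", "trained"],
    ["model", "trained", "corpus"],
    ["classification", "corpus", "evaluated"],
]
unigram_sparsity = singleton_fraction(ngram_counts(corpus, 1))
bigram_sparsity = singleton_fraction(ngram_counts(corpus, 2))
```

On this toy corpus one of five distinct unigrams is a singleton, against four of five distinct bigrams. The numbers say nothing about the ACL ARC itself; they only illustrate why bigram features overlap less between sentences.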
By varying the size of the training data, we obtain slight variations in performance in both the SVM and ME frameworks. According to Figure 2, ME exhibits more performance variation, where the accuracy of features such as Unigram, Bigram, and All is influenced by the size of the training data. In other words, the larger the training data, the more accurate the classification results. According to Figure 3, the SVM framework shows less variation. When the data size was slightly reduced (80-90% of the original), a slight improvement in accuracy is observed. For example, in Previous and Next Sentence, while the accuracy at 10% of training data is 0.872, the accuracy at 90% of training data is [...]. Moreover, in All, while the accuracy at 10% of training data is 0.875, the accuracy at 90% of training data is [...]. In future work, we may investigate this peculiarity in more detail.

Finally, according to Figure 4, when different values of the error term penalty C were used in the SVM framework, we observed that the best accuracy is obtained when the value of C is set slightly lower than 1.0 for each feature. The optimal values of C and the accuracy obtained with each feature are shown in Table 2. Together with the first experimental results, these results show that the Proper Noun feature and the context of Previous and Next Sentence feature bring the best accuracy in both ME and SVM (0.882 with C=0.9). Surprisingly, the composite classifiers that use all feature classes underperform classifiers that are trained only on these two sources of data.

V. CONCLUSION

We have described a method for identifying citing sentences by constructing a classifier using supervised learning approaches with simple features extracted from research papers. Experimental results showed that both proper nouns and the contextual classification of the previous and next sentences are effective features for training accurate models in both SVM and ME frameworks. In future work, we plan to build an editor that will help authors write a research paper by advising them when statements in their draft need a citation.

REFERENCES

[1] S. Lawrence, C. L. Giles, and K. Bollacker: Digital Libraries and Autonomous Citation Indexing, IEEE Computer, 32(6): 67-71.
[2] I. G. Councill, C. L. Giles, and M.-Y. Kan: ParsCit: An Open-Source CRF Reference String Parsing Package, In Proc. of the 6th International Conference on Language Resources and Evaluation Conference (LREC08).
[3] F. Narin: Evaluative Bibliometrics: The Use of Publication and Citation Analysis in the Evaluation of Scientific Activity, Computer Horizons, Cherry Hill, NJ.
[4] E. Garfield: Citation Indexing: Its Theory and Application in Science, Technology, and Humanities, John Wiley and Sons, NY.
[5] L. Page, S. Brin, R. Motwani, and T. Winograd: The PageRank Citation Ranking: Bringing Order to the Web, Stanford Digital Library Technologies Project, SIDL-WP.
[6] J. Bollen, M. A. Rodriguez, and H. Van De Sompel: Journal Status, Scientometrics, 69(3).
[7] Y. Sun and C. L. Giles: Popularity Weighted Ranking for Academic Digital Libraries, In Proc. of the 29th European Conference on Information Retrieval (ECIR 2007).
[8] M. Krapivin and M. Marchese: Focused PageRank in Scientific Papers Ranking, In Proc. of the 11th International Conference on Asian Digital Libraries (ICADL 2008), Lecture Notes in Computer Science (LNCS), Vol. 5362.
[9] N. Ma, J. Guan, and Y. Zhao: Bringing PageRank to the Citation Analysis, Information Processing and Management, 44(2).
[10] H. Sayyadi and L. Getoor: FutureRank: Ranking Scientific Articles by Predicting their Future PageRank, In Proc. of the 9th SIAM International Conference on Data Mining.
[11] M. M. Kessler: Bibliographic Coupling Between Scientific Papers, American Documentation, 14(1): 10-25.
[12] H. Small: Co-Citation in the Scientific Literature: A New Measure of the Relationship Between Two Documents, Journal of the American Society of Information Science, 24(4).
[13] S. Teufel, A. Siddharthan, and D. Tidhar: Automatic Classification of Citation Function, In Proc. of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006).
[14] V. Qazvinian and D. R. Radev: Scientific Paper Summarization Using Citation Summary Networks, In Proc. of the 22nd International Conference on Computational Linguistics (Coling2008).
[15] J. Schneider: Verification of Bibliometric Methods' Applicability for Thesaurus Construction, PhD thesis, Royal School of Library and Information Studies.
[16] A. Ritchie, S. Teufel, and S. Robertson: Using Terms from Citations for IR: Some First Results, In Proc. of the 29th European Conference on Information Retrieval (ECIR 2007).
[17] A. Ritchie, S. Robertson, and S. Teufel: Comparing Citation Contexts for Information Retrieval, In Proc. of the 17th International Conference on Information and Knowledge Management (CIKM'08).
[18] A. L. Berger, S. A. Della Pietra, and V. J. Della Pietra: A Maximum Entropy Approach to Natural Language Processing, Computational Linguistics, 22(1): 39-71.
[19] A. Ratnaparkhi, J. Reynar, and S. Roukos: A Maximum Entropy Model for Prepositional Phrase Attachment, In Proc. of the ARPA Human Language Technology Workshop.
[20] R. Rosenfeld: Adaptive Statistical Language Modeling: A Maximum Entropy Approach, PhD thesis, Carnegie Mellon University.
[21] S. F. Chen and R. Rosenfeld: A Gaussian Prior for Smoothing Maximum Entropy Models, Technical Report CMU-CS, Carnegie Mellon University.
[22] A. Ratnaparkhi: A Maximum Entropy Model for Part-Of-Speech Tagging, In Proc. of the Conference on Empirical Methods in Natural Language Processing.
[23] D. Beeferman, A. Berger, and J. Lafferty: Statistical Models For Text Segmentation, Machine Learning, 34(1-3).
[24] J. N. Darroch and D. Ratcliff: Generalized Iterative Scaling for Log-Linear Models, The Annals of Mathematical Statistics, 43(5).
[25] S. D. Pietra, V. J. Della Pietra, and J. D. Lafferty: Inducing Features of Random Fields, IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4).
[26] R. Malouf: A Comparison of Algorithms for Maximum Entropy Parameter Estimation, In Proc. of the 6th Conference on Natural Language Learning (CoNLL-2002), pages 49-55.
[27] V. Vapnik: The Nature of Statistical Learning Theory, Springer, NY.
[28] S. Bird, R. Dale, B. J. Dorr, B. Gibson, M. T. Joseph, M.-Y. Kan, D. Lee, B. Powley, D. R. Radev, and Y. F. Tan: The ACL Anthology Reference Corpus: A Reference Dataset for Bibliographic Research in Computational Linguistics, In Proc. of the 6th International Conference on Language Resources and Evaluation Conference (LREC08), 2008.


Mining Multiple Large Data Sources The Internatonal Arab Journal of Informaton Technology, Vol. 7, No. 3, July 2 24 Mnng Multple Large Data Sources Anmesh Adhkar, Pralhad Ramachandrarao 2, Bhanu Prasad 3, and Jhml Adhkar 4 Department of

More information

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP)

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP) 6.3 / -- Communcaton Networks II (Görg) SS20 -- www.comnets.un-bremen.de Communcaton Networks II Contents. Fundamentals of probablty theory 2. Emergence of communcaton traffc 3. Stochastc & Markovan Processes

More information

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by 6 CHAPTER 8 COMPLEX VECTOR SPACES 5. Fnd the kernel of the lnear transformaton gven n Exercse 5. In Exercses 55 and 56, fnd the mage of v, for the ndcated composton, where and are gven by the followng

More information

v a 1 b 1 i, a 2 b 2 i,..., a n b n i.

v a 1 b 1 i, a 2 b 2 i,..., a n b n i. SECTION 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS 455 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS All the vector spaces we have studed thus far n the text are real vector spaces snce the scalars are

More information

Rank Based Clustering For Document Retrieval From Biomedical Databases

Rank Based Clustering For Document Retrieval From Biomedical Databases Jayanth Mancassamy et al /Internatonal Journal on Computer Scence and Engneerng Vol.1(2), 2009, 111-115 Rank Based Clusterng For Document Retreval From Bomedcal Databases Jayanth Mancassamy Department

More information

A study on the ability of Support Vector Regression and Neural Networks to Forecast Basic Time Series Patterns

A study on the ability of Support Vector Regression and Neural Networks to Forecast Basic Time Series Patterns A study on the ablty of Support Vector Regresson and Neural Networks to Forecast Basc Tme Seres Patterns Sven F. Crone, Jose Guajardo 2, and Rchard Weber 2 Lancaster Unversty, Department of Management

More information

SPEE Recommended Evaluation Practice #6 Definition of Decline Curve Parameters Background:

SPEE Recommended Evaluation Practice #6 Definition of Decline Curve Parameters Background: SPEE Recommended Evaluaton Practce #6 efnton of eclne Curve Parameters Background: The producton hstores of ol and gas wells can be analyzed to estmate reserves and future ol and gas producton rates and

More information

Searching for Interacting Features for Spam Filtering

Searching for Interacting Features for Spam Filtering Searchng for Interactng Features for Spam Flterng Chuanlang Chen 1, Yun-Chao Gong 2, Rongfang Be 1,, and X. Z. Gao 3 1 Department of Computer Scence, Bejng Normal Unversty, Bejng 100875, Chna 2 Software

More information

Project Networks With Mixed-Time Constraints

Project Networks With Mixed-Time Constraints Project Networs Wth Mxed-Tme Constrants L Caccetta and B Wattananon Western Australan Centre of Excellence n Industral Optmsaton (WACEIO) Curtn Unversty of Technology GPO Box U1987 Perth Western Australa

More information

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications CMSC828G Prncples of Data Mnng Lecture #9 Today s Readng: HMS, chapter 9 Today s Lecture: Descrptve Modelng Clusterng Algorthms Descrptve Models model presents the man features of the data, a global summary

More information

On the Optimal Control of a Cascade of Hydro-Electric Power Stations

On the Optimal Control of a Cascade of Hydro-Electric Power Stations On the Optmal Control of a Cascade of Hydro-Electrc Power Statons M.C.M. Guedes a, A.F. Rbero a, G.V. Smrnov b and S. Vlela c a Department of Mathematcs, School of Scences, Unversty of Porto, Portugal;

More information

The OC Curve of Attribute Acceptance Plans

The OC Curve of Attribute Acceptance Plans The OC Curve of Attrbute Acceptance Plans The Operatng Characterstc (OC) curve descrbes the probablty of acceptng a lot as a functon of the lot s qualty. Fgure 1 shows a typcal OC Curve. 10 8 6 4 1 3 4

More information

Title Language Model for Information Retrieval

Title Language Model for Information Retrieval Ttle Language Model for Informaton Retreval Rong Jn Language Technologes Insttute School of Computer Scence Carnege Mellon Unversty Alex G. Hauptmann Computer Scence Department School of Computer Scence

More information

How To Calculate The Accountng Perod Of Nequalty

How To Calculate The Accountng Perod Of Nequalty Inequalty and The Accountng Perod Quentn Wodon and Shlomo Ytzha World Ban and Hebrew Unversty September Abstract Income nequalty typcally declnes wth the length of tme taen nto account for measurement.

More information

Web Spam Detection Using Machine Learning in Specific Domain Features

Web Spam Detection Using Machine Learning in Specific Domain Features Journal of Informaton Assurance and Securty 3 (2008) 220-229 Web Spam Detecton Usng Machne Learnng n Specfc Doman Features Hassan Najadat 1, Ismal Hmed 2 Department of Computer Informaton Systems Faculty

More information

Fast Fuzzy Clustering of Web Page Collections

Fast Fuzzy Clustering of Web Page Collections Fast Fuzzy Clusterng of Web Page Collectons Chrstan Borgelt and Andreas Nürnberger Dept. of Knowledge Processng and Language Engneerng Otto-von-Guercke-Unversty of Magdeburg Unverstätsplatz, D-396 Magdeburg,

More information

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble

More information

NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION

NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION NEURO-FUZZY INFERENE SYSTEM FOR E-OMMERE WEBSITE EVALUATION Huan Lu, School of Software, Harbn Unversty of Scence and Technology, Harbn, hna Faculty of Appled Mathematcs and omputer Scence, Belarusan State

More information

THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION

THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION Internatonal Journal of Electronc Busness Management, Vol. 3, No. 4, pp. 30-30 (2005) 30 THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION Yu-Mn Chang *, Yu-Cheh

More information

Statistical Methods to Develop Rating Models

Statistical Methods to Develop Rating Models Statstcal Methods to Develop Ratng Models [Evelyn Hayden and Danel Porath, Österrechsche Natonalbank and Unversty of Appled Scences at Manz] Source: The Basel II Rsk Parameters Estmaton, Valdaton, and

More information

ECE544NA Final Project: Robust Machine Learning Hardware via Classifier Ensemble

ECE544NA Final Project: Robust Machine Learning Hardware via Classifier Ensemble 1 ECE544NA Fnal Project: Robust Machne Learnng Hardware va Classfer Ensemble Sa Zhang, szhang12@llnos.edu Dept. of Electr. & Comput. Eng., Unv. of Illnos at Urbana-Champagn, Urbana, IL, USA Abstract In

More information

Predicting Software Development Project Outcomes *

Predicting Software Development Project Outcomes * Predctng Software Development Project Outcomes * Rosna Weber, Mchael Waller, June Verner, Wllam Evanco College of Informaton Scence & Technology, Drexel Unversty 3141 Chestnut Street Phladelpha, PA 19104

More information

Web Object Indexing Using Domain Knowledge *

Web Object Indexing Using Domain Knowledge * Web Object Indexng Usng Doman Knowledge * Muyuan Wang Department of Automaton Tsnghua Unversty Bejng 100084, Chna (86-10)51774518 Zhwe L, Le Lu, We-Yng Ma Mcrosoft Research Asa Sgma Center, Hadan Dstrct

More information

Learning from Multiple Outlooks

Learning from Multiple Outlooks Learnng from Multple Outlooks Maayan Harel Department of Electrcal Engneerng, Technon, Hafa, Israel She Mannor Department of Electrcal Engneerng, Technon, Hafa, Israel maayanga@tx.technon.ac.l she@ee.technon.ac.l

More information

An Analysis of Central Processor Scheduling in Multiprogrammed Computer Systems

An Analysis of Central Processor Scheduling in Multiprogrammed Computer Systems STAN-CS-73-355 I SU-SE-73-013 An Analyss of Central Processor Schedulng n Multprogrammed Computer Systems (Dgest Edton) by Thomas G. Prce October 1972 Techncal Report No. 57 Reproducton n whole or n part

More information

Product Quality and Safety Incident Information Tracking Based on Web

Product Quality and Safety Incident Information Tracking Based on Web Product Qualty and Safety Incdent Informaton Trackng Based on Web News 1 Yuexang Yang, 2 Correspondng Author Yyang Wang, 2 Shan Yu, 2 Jng Q, 1 Hual Ca 1 Chna Natonal Insttute of Standardzaton, Beng 100088,

More information

BERNSTEIN POLYNOMIALS

BERNSTEIN POLYNOMIALS On-Lne Geometrc Modelng Notes BERNSTEIN POLYNOMIALS Kenneth I. Joy Vsualzaton and Graphcs Research Group Department of Computer Scence Unversty of Calforna, Davs Overvew Polynomals are ncredbly useful

More information

Study on Model of Risks Assessment of Standard Operation in Rural Power Network

Study on Model of Risks Assessment of Standard Operation in Rural Power Network Study on Model of Rsks Assessment of Standard Operaton n Rural Power Network Qngj L 1, Tao Yang 2 1 Qngj L, College of Informaton and Electrcal Engneerng, Shenyang Agrculture Unversty, Shenyang 110866,

More information

Dynamic Resource Allocation for MapReduce with Partitioning Skew

Dynamic Resource Allocation for MapReduce with Partitioning Skew Ths artcle has been accepted for publcaton n a future ssue of ths journal, but has not been fully edted. Content may change pror to fnal publcaton. Ctaton nformaton: DOI 1.119/TC.216.253286, IEEE Transactons

More information

Performance Management and Evaluation Research to University Students

Performance Management and Evaluation Research to University Students 631 A publcaton of CHEMICAL ENGINEERING TRANSACTIONS VOL. 46, 2015 Guest Edtors: Peyu Ren, Yancang L, Hupng Song Copyrght 2015, AIDIC Servz S.r.l., ISBN 978-88-95608-37-2; ISSN 2283-9216 The Italan Assocaton

More information

Performance Analysis and Coding Strategy of ECOC SVMs

Performance Analysis and Coding Strategy of ECOC SVMs Internatonal Journal of Grd and Dstrbuted Computng Vol.7, No. (04), pp.67-76 http://dx.do.org/0.457/jgdc.04.7..07 Performance Analyss and Codng Strategy of ECOC SVMs Zhgang Yan, and Yuanxuan Yang, School

More information

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING Matthew J. Lberatore, Department of Management and Operatons, Vllanova Unversty, Vllanova, PA 19085, 610-519-4390,

More information

FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES

FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES Zuzanna BRO EK-MUCHA, Grzegorz ZADORA, 2 Insttute of Forensc Research, Cracow, Poland 2 Faculty of Chemstry, Jagellonan

More information

Logistic Regression. Steve Kroon

Logistic Regression. Steve Kroon Logstc Regresson Steve Kroon Course notes sectons: 24.3-24.4 Dsclamer: these notes do not explctly ndcate whether values are vectors or scalars, but expects the reader to dscern ths from the context. Scenaro

More information

Financial market forecasting using a two-step kernel learning method for the support vector regression

Financial market forecasting using a two-step kernel learning method for the support vector regression Ann Oper Res (2010) 174: 103 120 DOI 10.1007/s10479-008-0357-7 Fnancal market forecastng usng a two-step kernel learnng method for the support vector regresson L Wang J Zhu Publshed onlne: 28 May 2008

More information

Brigid Mullany, Ph.D University of North Carolina, Charlotte

Brigid Mullany, Ph.D University of North Carolina, Charlotte Evaluaton And Comparson Of The Dfferent Standards Used To Defne The Postonal Accuracy And Repeatablty Of Numercally Controlled Machnng Center Axes Brgd Mullany, Ph.D Unversty of North Carolna, Charlotte

More information

Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending

Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending Proceedngs of 2012 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 25 (2012) (2012) IACSIT Press, Sngapore Bayesan Network Based Causal Relatonshp Identfcaton and Fundng Success

More information

8 Algorithm for Binary Searching in Trees

8 Algorithm for Binary Searching in Trees 8 Algorthm for Bnary Searchng n Trees In ths secton we present our algorthm for bnary searchng n trees. A crucal observaton employed by the algorthm s that ths problem can be effcently solved when the

More information

320 The Internatonal Arab Journal of Informaton Technology, Vol. 5, No. 3, July 2008 Comparsons Between Data Clusterng Algorthms Osama Abu Abbas Computer Scence Department, Yarmouk Unversty, Jordan Abstract:

More information

APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT

APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT Toshhko Oda (1), Kochro Iwaoka (2) (1), (2) Infrastructure Systems Busness Unt, Panasonc System Networks Co., Ltd. Saedo-cho

More information

L10: Linear discriminants analysis

L10: Linear discriminants analysis L0: Lnear dscrmnants analyss Lnear dscrmnant analyss, two classes Lnear dscrmnant analyss, C classes LDA vs. PCA Lmtatons of LDA Varants of LDA Other dmensonalty reducton methods CSCE 666 Pattern Analyss

More information

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm Document Clusterng Analyss Based on Hybrd PSO+K-means Algorthm Xaohu Cu, Thomas E. Potok Appled Software Engneerng Research Group, Computatonal Scences and Engneerng Dvson, Oak Rdge Natonal Laboratory,

More information

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic Lagrange Multplers as Quanttatve Indcators n Economcs Ivan Mezník Insttute of Informatcs, Faculty of Busness and Management, Brno Unversty of TechnologCzech Republc Abstract The quanttatve role of Lagrange

More information

A PROBABILITY-MAPPING ALGORITHM FOR CALIBRATING THE POSTERIOR PROBABILITIES: A DIRECT MARKETING APPLICATION

A PROBABILITY-MAPPING ALGORITHM FOR CALIBRATING THE POSTERIOR PROBABILITIES: A DIRECT MARKETING APPLICATION Document de traval du LEM 2011-06 A PROBABILITY-MAPPIG ALGORITHM FOR CALIBRATIG THE POSTERIOR PROBABILITIES: A DIRECT MARKETIG APPLICATIO Krstof Coussement *, Wouter Bucknx ** * IESEG School of Management

More information

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ). REVIEW OF RISK MANAGEMENT CONCEPTS LOSS DISTRIBUTIONS AND INSURANCE Loss and nsurance: When someone s subject to the rsk of ncurrng a fnancal loss, the loss s generally modeled usng a random varable or

More information

Stochastic Protocol Modeling for Anomaly Based Network Intrusion Detection

Stochastic Protocol Modeling for Anomaly Based Network Intrusion Detection Stochastc Protocol Modelng for Anomaly Based Network Intruson Detecton Juan M. Estevez-Tapador, Pedro Garca-Teodoro, and Jesus E. Daz-Verdejo Department of Electroncs and Computer Technology Unversty of

More information

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services An Evaluaton of the Extended Logstc, Smple Logstc, and Gompertz Models for Forecastng Short Lfecycle Products and Servces Charles V. Trappey a,1, Hsn-yng Wu b a Professor (Management Scence), Natonal Chao

More information

Robust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School

Robust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School Robust Desgn of Publc Storage Warehouses Yemng (Yale) Gong EMLYON Busness School Rene de Koster Rotterdam school of management, Erasmus Unversty Abstract We apply robust optmzaton and revenue management

More information

Realistic Image Synthesis

Realistic Image Synthesis Realstc Image Synthess - Combned Samplng and Path Tracng - Phlpp Slusallek Karol Myszkowsk Vncent Pegoraro Overvew: Today Combned Samplng (Multple Importance Samplng) Renderng and Measurng Equaton Random

More information

A heuristic task deployment approach for load balancing

A heuristic task deployment approach for load balancing Xu Gaochao, Dong Yunmeng, Fu Xaodog, Dng Yan, Lu Peng, Zhao Ja Abstract A heurstc task deployment approach for load balancng Gaochao Xu, Yunmeng Dong, Xaodong Fu, Yan Dng, Peng Lu, Ja Zhao * College of

More information

Statistical algorithms in Review Manager 5

Statistical algorithms in Review Manager 5 Statstcal algorthms n Reve Manager 5 Jonathan J Deeks and Julan PT Hggns on behalf of the Statstcal Methods Group of The Cochrane Collaboraton August 00 Data structure Consder a meta-analyss of k studes

More information

Exhaustive Regression. An Exploration of Regression-Based Data Mining Techniques Using Super Computation

Exhaustive Regression. An Exploration of Regression-Based Data Mining Techniques Using Super Computation Exhaustve Regresson An Exploraton of Regresson-Based Data Mnng Technques Usng Super Computaton Antony Daves, Ph.D. Assocate Professor of Economcs Duquesne Unversty Pttsburgh, PA 58 Research Fellow The

More information

An Inductive Fuzzy Classification Approach applied to Individual Marketing

An Inductive Fuzzy Classification Approach applied to Individual Marketing An Inductve Fuzzy Classfcaton Approach appled to Indvdual Marketng Mchael Kaufmann, Andreas Meer Abstract A data mnng methodology for an nductve fuzzy classfcaton s ntroduced. The nducton step s based

More information

Semantic Link Analysis for Finding Answer Experts *

Semantic Link Analysis for Finding Answer Experts * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 28, 51-65 (2012) Semantc Lnk Analyss for Fndng Answer Experts * YAO LU 1,2,3, XIAOJUN QUAN 2, JINGSHENG LEI 4, XINGLIANG NI 1,2,3, WENYIN LIU 2,3 AND YINLONG

More information

Active Learning for Interactive Visualization

Active Learning for Interactive Visualization Actve Learnng for Interactve Vsualzaton Tomoharu Iwata Nel Houlsby Zoubn Ghahraman Unversty of Cambrdge Unversty of Cambrdge Unversty of Cambrdge Abstract Many automatc vsualzaton methods have been. However,

More information

Multiclass sparse logistic regression for classification of multiple cancer types using gene expression data

Multiclass sparse logistic regression for classification of multiple cancer types using gene expression data Computatonal Statstcs & Data Analyss 51 (26) 1643 1655 www.elsever.com/locate/csda Multclass sparse logstc regresson for classfcaton of multple cancer types usng gene expresson data Yongda Km a,, Sunghoon

More information

DEFINING %COMPLETE IN MICROSOFT PROJECT

DEFINING %COMPLETE IN MICROSOFT PROJECT CelersSystems DEFINING %COMPLETE IN MICROSOFT PROJECT PREPARED BY James E Aksel, PMP, PMI-SP, MVP For Addtonal Informaton about Earned Value Management Systems and reportng, please contact: CelersSystems,

More information

SVM Tutorial: Classification, Regression, and Ranking

SVM Tutorial: Classification, Regression, and Ranking SVM Tutoral: Classfcaton, Regresson, and Rankng Hwanjo Yu and Sungchul Km 1 Introducton Support Vector Machnes(SVMs) have been extensvely researched n the data mnng and machne learnng communtes for the

More information

Detecting Credit Card Fraud using Periodic Features

Detecting Credit Card Fraud using Periodic Features Detectng Credt Card Fraud usng Perodc Features Alejandro Correa Bahnsen, Djamla Aouada, Aleksandar Stojanovc and Björn Ottersten Interdscplnary Centre for Securty, Relablty and Trust Unversty of Luxembourg,

More information

Efficient Project Portfolio as a tool for Enterprise Risk Management

Efficient Project Portfolio as a tool for Enterprise Risk Management Effcent Proect Portfolo as a tool for Enterprse Rsk Management Valentn O. Nkonov Ural State Techncal Unversty Growth Traectory Consultng Company January 5, 27 Effcent Proect Portfolo as a tool for Enterprse

More information

How To Analyze News From A News Report

How To Analyze News From A News Report , pp. 385-396 http://dx.do.org/10.14257/jmue.2014.9.11.37 Topc Sentment Analyss n Chnese News Ouyang Chunpng, Zhou Wen +, Yu Yng, Lu Zhmng and Yang Xaohua School of Computer Scence and Technology, Unversty

More information

Support Vector Machine Model for Currency Crisis Discrimination. Arindam Chaudhuri 1. Abstract

Support Vector Machine Model for Currency Crisis Discrimination. Arindam Chaudhuri 1. Abstract Support Vector Machne Model for Currency Crss Dscrmnaton Arndam Chaudhur Abstract Support Vector Machne (SVM) s powerful classfcaton technque based on the dea of structural rsk mnmzaton. Use of kernel

More information

A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña

A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña Proceedngs of the 2008 Wnter Smulaton Conference S. J. Mason, R. R. Hll, L. Mönch, O. Rose, T. Jefferson, J. W. Fowler eds. A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION

More information

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(7):1884-1889 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A hybrd global optmzaton algorthm based on parallel

More information

Risk-based Fatigue Estimate of Deep Water Risers -- Course Project for EM388F: Fracture Mechanics, Spring 2008

Risk-based Fatigue Estimate of Deep Water Risers -- Course Project for EM388F: Fracture Mechanics, Spring 2008 Rsk-based Fatgue Estmate of Deep Water Rsers -- Course Project for EM388F: Fracture Mechancs, Sprng 2008 Chen Sh Department of Cvl, Archtectural, and Envronmental Engneerng The Unversty of Texas at Austn

More information

Assessing Student Learning Through Keyword Density Analysis of Online Class Messages

Assessing Student Learning Through Keyword Density Analysis of Online Class Messages Assessng Student Learnng Through Keyword Densty Analyss of Onlne Class Messages Xn Chen New Jersey Insttute of Technology xc7@njt.edu Brook Wu New Jersey Insttute of Technology wu@njt.edu ABSTRACT Ths

More information

A machine vision approach for detecting and inspecting circular parts

A machine vision approach for detecting and inspecting circular parts A machne vson approach for detectng and nspectng crcular parts Du-Mng Tsa Machne Vson Lab. Department of Industral Engneerng and Management Yuan-Ze Unversty, Chung-L, Tawan, R.O.C. E-mal: edmtsa@saturn.yzu.edu.tw

More information

IMPACT ANALYSIS OF A CELLULAR PHONE

IMPACT ANALYSIS OF A CELLULAR PHONE 4 th ASA & μeta Internatonal Conference IMPACT AALYSIS OF A CELLULAR PHOE We Lu, 2 Hongy L Bejng FEAonlne Engneerng Co.,Ltd. Bejng, Chna ABSTRACT Drop test smulaton plays an mportant role n nvestgatng

More information

Planning for Marketing Campaigns

Planning for Marketing Campaigns Plannng for Marketng Campagns Qang Yang and Hong Cheng Department of Computer Scence Hong Kong Unversty of Scence and Technology Clearwater Bay, Kowloon, Hong Kong, Chna (qyang, csch)@cs.ust.hk Abstract

More information

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S S C H E D A E I N F O R M A T I C A E VOLUME 0 0 On Mean Squared Error of Herarchcal Estmator Stans law Brodowsk Faculty of Physcs, Astronomy, and Appled Computer Scence, Jagellonan Unversty, Reymonta

More information

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek HE DISRIBUION OF LOAN PORFOLIO VALUE * Oldrch Alfons Vascek he amount of captal necessary to support a portfolo of debt securtes depends on the probablty dstrbuton of the portfolo loss. Consder a portfolo

More information

Human behaviour analysis and event recognition at a point of sale

Human behaviour analysis and event recognition at a point of sale Human behavour analyss and event recognton at a pont of sale R. Scre MIRANE S.A.S. Cenon, France scre@labr.fr H. Ncolas LaBRI, Unversty of Bordeaux Talence, France ncolas@labr.fr Abstract Ths paper presents

More information

Recurrence. 1 Definitions and main statements

Recurrence. 1 Definitions and main statements Recurrence 1 Defntons and man statements Let X n, n = 0, 1, 2,... be a MC wth the state space S = (1, 2,...), transton probabltes p j = P {X n+1 = j X n = }, and the transton matrx P = (p j ),j S def.

More information

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson Statstcs for Psychosocal Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson (LCR) What s t and when do we use t? Recall the standard latent class model

More information

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence 1 st Internatonal Symposum on Imprecse Probabltes and Ther Applcatons, Ghent, Belgum, 29 June 2 July 1999 How Sets of Coherent Probabltes May Serve as Models for Degrees of Incoherence Mar J. Schervsh

More information

Support vector domain description

Support vector domain description Pattern Recognton Letters 20 (1999) 1191±1199 www.elsever.nl/locate/patrec Support vector doman descrpton Davd M.J. Tax *,1, Robert P.W. Dun Pattern Recognton Group, Faculty of Appled Scence, Delft Unversty

More information