Identifying Citing Sentences in Research Papers Using Supervised Learning




Kazunari Sugiyama (Department of Computer Science, National University of Singapore, Singapore; sugiyama@comp.nus.edu.sg)
Tarun Kumar (Indian Institute of Information Technology, Allahabad, India; tar.iiita@gmail.com)
Min-Yen Kan (Department of Computer Science, National University of Singapore, Singapore; kanmy@comp.nus.edu.sg)
Ramesh C. Tripathi (Indian Institute of Information Technology, Allahabad, India; rctripathi@iiita.ac.in)

Abstract: Researchers have largely focused on analyzing citation links from one scholarly work to another, but citing sentences are also an important part of the narrative in a research article. If we can automatically identify such sentences, we can devise an editor that suggests whether a particular piece of text needs to be backed up with a citation. In this paper, we propose a method for identifying citing sentences by constructing a classifier using supervised learning. Our experiments show that simple language features, such as proper nouns and the labels of the previous and next sentences, are effective for identifying citing sentences.

Keywords: information retrieval; digital library; discourse processing; citation analysis

I. INTRODUCTION

When we write research papers or articles, we often make references to previous work in our own research field. Citations serve various purposes: as evidence for claims and as acknowledgment of others' work, among other functions. We term sentences that contain such references "citing sentences". These statements are usually followed by a pointer to the full reference located at the end of the paper in a References or Bibliography section. Our aim in this work is to identify whether a sentence in a paper needs a citation or not. For example, consider the following two sentences:

Sentence 1: "We want to build a system which can help in finding a person's job easily."

Sentence 2: "The HSQL project was led by the Swedish State Bureau with participants from Sweden, Denmark, Finland, and Norway."

Sentence 2 needs a citation because it refers to previous work, namely the HSQL project. On the other hand, most people would agree that Sentence 1 does not need a citation because it contains no description of previous work. In this paper, we propose a method for identifying sentences that require citations, such as Sentence 2 above (footnote 1). While citing sentences are often trivially marked with a citation marker (e.g., "[5]" or "(Brown, 1990)"), an important distinction in our problem is that we aim to detect such sentences when these markers are not present. We show that our approach, which constructs a supervised classifier from simple features, achieves a high level of accuracy.

Footnote 1: This work is supported by a Media Development Authority (MDA) grant, "Interactive Media Search", R-252-000-325-279.

With respect to prior work on the analysis of citation information [1, 2], to the best of our knowledge, the work presented here is the first to identify citing sentences using natural language analysis. Such a module is useful in a smart authoring environment that helps authors by suggesting whether statements made in a paper draft need citations. Such a system takes a research paper as input and identifies the statements that need citations. It can also be used while a paper is being written, suggesting to authors whether the current statement needs a citation.

This paper is organized as follows. In Section II, we review related work on analyzing citations in research papers and briefly describe the two classifiers we use in our experiments.
In Section III, we describe our approach to distinguishing between sentences that require citations and those that do not. In Section IV, we present the experimental results evaluating our approach. Finally, we conclude the paper with a summary and directions for future work in Section V.

II. RELATED WORK

We first review related work on scholarly citation analysis, and then survey the fundamental background of the two state-of-the-art classifiers, maximum entropy and support vector machines, that we employ later in our experiments.

A. Citation Analysis of Research Papers

Citation information has been used for information retrieval since the early stages of the field. As far as we know, there are two types of research in the field of citation analysis of research papers: (1) citation counting, to evaluate the impact of scientific papers, and (2) citation context analysis. Citation count is widely used in evaluating the importance of a paper because it is strongly correlated with academic document impact [3]. The Thomson Scientific ISI Impact Factor (ISI IF) is the representative approach using citation counts [4]; it aggregates citation counts over a moving window to calculate the impact of particular publication venues. The advantages of citation counting are (i) its simplicity of computation and (ii) its proven track record of deployment in scientometric applications. However, citation counting has a well-known limitation: citing papers with high impact and ones with low impact are treated equally in the standard citation count.

In order to overcome this problem, many recent works have employed the notion of PageRank [5] to better weight and control for the influence of papers of differing impact [6, 7, 8, 9, 10]. Citation information is statistical in nature, and many researchers have focused on this characteristic. For example, Kessler [11] proposed the notion of bibliographic coupling, where two documents are said to be coupled if they share one or more references. Small [12] proposed a complementary method, termed co-citation analysis, where the similarity between documents A and B is measured by the number of documents that cite both A and B. In addition, researchers have also focused on the potential usefulness of the text associated with citations in specific applications, such as text summarization [13, 14], thesaurus construction [15], and information retrieval [16, 17].

B. Maximum Entropy (ME)

The framework of maximum entropy [18] has been widely used for a variety of natural language tasks such as prepositional phrase attachment [19], language modeling [20, 21], part-of-speech tagging [22], and text segmentation [23], and has been shown to be an effective and competitive algorithm in these domains. Statistical modeling constructs a model that best accounts for some training data: for a given empirical probability distribution $\tilde{p}$, a model $p$ is built whose distribution is as close to $\tilde{p}$ as possible. Given a set of training data, there are numerous ways to choose such a model. It can be shown that the probability distribution defined by Equation (1) is the one closest to $\tilde{p}$ in the sense of Kullback-Leibler divergence, when subjected to a set of feature constraints:

$$P(y \mid x) = \frac{1}{Z(x)} \exp\left(\sum_{i=1}^{k} \lambda_i f_i(x, y)\right), \qquad (1)$$

where $P(y \mid x)$ denotes the conditional probability of predicting class $y$ on seeing the context $x$, the $f_i(x, y)$ $(i = 1, \ldots, k)$ are feature functions, the $\lambda_i$ $(i = 1, \ldots, k)$ are the weighting parameters for the $f_i(x, y)$, $k$ is the number of features, and $Z(x)$ is a normalization factor ensuring that the $P(y \mid x)$ scores sum to one and reflect true probabilities.

The maximum entropy model represents evidence with binary functions known as contextual predicates, of the form

$$f_{cp,\, y'}(x, y) = \begin{cases} 1 & \text{if } y = y' \text{ and } cp(x) = \text{true}, \\ 0 & \text{otherwise}, \end{cases}$$

where $cp$ is a contextual predicate that maps a context $x$ to $\{\text{true}, \text{false}\}$. The human expert can choose arbitrary feature functions in order to reflect the characteristics of the problem domain as faithfully as possible. This ability to freely incorporate problem-specific knowledge through feature functions gives ME models an advantage over learning paradigms that suffer from strong feature independence assumptions (e.g., the naive Bayes classifier).

Once a set of features is chosen by the human expert, the corresponding maximum entropy model can be constructed by adding the features as constraints to the model and iteratively adjusting their weights to best reflect the training data. Formally, we require that

$$E_{\tilde{p}}\langle f_i \rangle = E_{p}\langle f_i \rangle,$$

where $E_{\tilde{p}}\langle f_i \rangle = \sum_{x, y} \tilde{p}(x, y) f_i(x, y)$ is the empirical expectation of $f_i$ and $E_{p}\langle f_i \rangle$ is its expectation with respect to the model distribution $p$. Among all the models subject to these constraints, a unique solution exists that preserves the uncertainty in the original constraints and adds no extra bias; this is the maximum entropy solution obtained by the training procedure. Given an exponential model with $k$ features and a set of training data (an empirical distribution), we need to find the associated weight for each feature so as to maximize the model's log-likelihood:

$$L(p) = \sum_{x, y} \tilde{p}(x, y) \log p(y \mid x).$$

It is important to select the optimal model subject to the given constraints from this log-linear family. Three well-known iterative scaling algorithms are specially designed to estimate the parameters of ME models of the form of Equation (1): Generalized Iterative Scaling [24], Improved Iterative Scaling [25], and the Limited Memory Variable Metric method [26].
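To make Equation (1) concrete, the following is a minimal sketch of how a maximum entropy model turns contextual predicates and learned weights into class probabilities. It is not the authors' implementation (the paper uses the Maximum Entropy Modeling Toolkit; see Section IV); the two predicates and the lambda weights are hypothetical values chosen purely for illustration.

```python
import math

LABELS = ("citing", "non-citing")

def features(x, y):
    """Binary contextual-predicate features f_{cp,y'}(x, y): each fires (value 1)
    only when its predicate cp(x) holds AND the candidate label y equals y'."""
    has_proper_noun = any(tok[0].isupper() for tok in x.split()[1:])  # crude cp(x)
    return {
        ("has_proper_noun", "citing"):
            1.0 if (y == "citing" and has_proper_noun) else 0.0,
        ("starts_with_we", "non-citing"):
            1.0 if (y == "non-citing" and x.lower().startswith("we ")) else 0.0,
    }

# Hypothetical learned weights lambda_i; in practice these come from GIS/IIS/LMVM.
LAMBDAS = {("has_proper_noun", "citing"): 1.2,
           ("starts_with_we", "non-citing"): 0.7}

def p_y_given_x(x):
    # Unnormalized score exp(sum_i lambda_i * f_i(x, y)) for each candidate label y.
    scores = {y: math.exp(sum(LAMBDAS[name] * val
                              for name, val in features(x, y).items()))
              for y in LABELS}
    z = sum(scores.values())  # Z(x): normalizer so the probabilities sum to one
    return {y: score / z for y, score in scores.items()}

print(p_y_given_x("The HSQL project was led by the Swedish State Bureau."))
```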
C. Support Vector Machine (SVM)

The Support Vector Machine (SVM) [27] has many desirable qualities that make it one of the most popular classification algorithms. It not only has a solid theoretical foundation, but also classifies more accurately than most other algorithms in many applications, such as Web page classification and bioinformatics tasks. Given a training set of instance-label pairs $(x_i, y_i)$ $(i = 1, \ldots, l)$, where $x_i \in \mathbb{R}^n$ is a training vector and $y_i \in \{-1, 1\}$ is its class label, an SVM finds a linear separating hyperplane with maximal margin as the solution to the following optimization problem:

$$\min_{w,\, b,\, \xi} \; \frac{1}{2} w^{T} w + C \sum_{i=1}^{l} \xi_i \quad \text{subject to} \quad y_i \left(w^{T} \phi(x_i) + b\right) \geq 1 - \xi_i, \;\; \xi_i \geq 0. \qquad (2)$$

As the original problem may not be linearly separable, each $x_i$ can be mapped into a higher dimensional space by a function $\phi$; the SVM then finds a maximal-margin linear separating hyperplane in this higher dimensional space. $C > 0$ is the penalty parameter of the error term. The kernel function $K(x_i, x_j) = \phi(x_i)^{T} \phi(x_j)$ can take differing forms; linear, polynomial, radial basis, and sigmoid functions are often used. The SVM depends on the data only through the kernel function, so if a method that yields the kernel values directly is available, the explicit mapping $\phi$ need not be computed, greatly reducing computational cost.
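As a concrete counterpart to Equation (2), here is a minimal sketch using scikit-learn's SVC class, which wraps LIBSVM (the implementation used in Section IV). The toy feature vectors and labels are invented for illustration; the C value and RBF kernel match the defaults reported later.

```python
from sklearn.svm import SVC

# Toy binary feature vectors x_i and labels y_i in {-1, +1}, invented for illustration.
X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
y = [1, 1, -1, -1]

# C penalizes the slack variables xi_i in Equation (2); kernel="rbf" selects the
# radial basis function K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2).
clf = SVC(C=1.0, kernel="rbf")
clf.fit(X, y)

print(clf.predict([[1, 0, 0]]))  # classify an unseen feature vector
```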

Figure 1. Overview of our system.

III. PROPOSED METHOD

Figure 1 illustrates our proposed system, which consists of the following four parts: (1) constructing proper training and test data sets; (2) extracting appropriate features from the data; (3) constructing the classifier; and (4) classifying sentences as citing or non-citing. We detail each of these steps below.

(1) Constructing proper training and test data sets

For our experiments, we utilize the Association for Computational Linguistics' standard research article corpus, the ACL Anthology Reference Corpus (ACL ARC), discussed in Section IV. We first remove the References section and stop words (footnote 2) from each paper. We define a sentence that contains citing information as a positive instance, and a sentence that does not as a negative instance. Our idea is to use sentences that have existing citations as training data, with the citation markers removed: if a citation marker is found via heuristic rules, we remove it. We then divide our dataset into ten equal-sized parts to be used as cross-validation folds. As we perform 10-fold cross-validation, each run uses 90% of the data for training and 10% for testing, and we repeat the experimental process below ten times to obtain our final evaluation results.

Footnote 2: A list of 571 words obtained from ftp://ftp.cs.cornell.edu/pub/smart/english.stop
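As a minimal sketch of step (1)'s labeling and marker removal: the two regular expressions below, for numeric ("[5]") and author-year ("(Brown, 1990)") markers, are illustrative guesses, since the paper does not spell out its heuristic rules at this level of detail.

```python
import re

# Illustrative citation-marker patterns: numeric style "[5]" / "[1, 2]" and
# author-year style "(Brown, 1990)". The paper's actual heuristics may differ.
MARKER = re.compile(
    r"\[\d+(?:\s*,\s*\d+)*\]"
    r"|\([A-Z][A-Za-z-]+(?:\s+et\s+al\.)?,\s*\d{4}[a-z]?\)"
)

def label_and_strip(sentence):
    """Label a sentence by the presence of a citation marker, then delete the
    marker so the trained classifier cannot simply memorize it."""
    label = "citing" if MARKER.search(sentence) else "non-citing"
    return label, MARKER.sub("", sentence).strip()

print(label_and_strip("Co-citation analysis was proposed by Small [12]."))
print(label_and_strip("We want to build a system for finding jobs."))
```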

(2) Extracting the appropriate features from the data

In this module, we extract features from each sentence in order to construct the classifier later. The features we extract are as follows (a minimal sketch of the first two feature classes appears at the end of this section):

a) Unigram. Unigram features are the words contained in the sentences. After stop words are removed, each word appearing in a sentence serves as a binary feature, turned on for that sentence. Unigrams are an important class of features, as certain types of words tend to appear more frequently in citing (or non-citing) sentences. For example, the sentence "The classification model is trained" can be represented by the unigrams "classification", "model", and "trained" ("the" and "is" are stop words), among others.

b) Bigram. By combining two adjacent words in a sentence, we create bigram features. Bigram features capture the effect of two otherwise independent words in combination, which helps classification significantly. Using the same example as above, we extract the bigrams "classification model" and "model trained" (again, "the" and "is" are stop words), among others.

c) Proper Nouns. These are nouns that give the names of people, locations, systems, and organizations. Based on the presence of the various types of proper nouns, the respective binary features are set. This feature class also plays a significant role in detecting citing sentences, as we know from experience that such sentences often refer to particular scholars, their systems, or their institutions.

d) Previous and Next Sentence. We can also include information about the classifications of neighboring sentences, i.e., their citing/non-citing status. For example, if the previous sentence is a citing sentence, the following sentence may continue to discuss the same work and would be less likely to contain an additional citation.

e) Position. The positional feature gives information about the part of the document in which the sentence appears. To implement this feature, we divide the document into six equal parts: one for the first 1/6th of the document, one for the second 1/6th, and so on through the sixth 1/6th. We turn on one of six binary features based on the part of the document in which the sentence appears. These features are important, as statements appearing in certain sections exhibit markedly different probabilities of citation. For example, sentences in the middle or end of a research paper are likely to discuss the authors' own work or the evaluation and are less probable citation areas, as compared to the beginning of the article, where authors often discuss and credit prior work.

f) Orthographic. This set of features checks for miscellaneous formatting characteristics, including specific orthography used in the sentence. For example, a sentence containing numbers or single capital letters may be more indicative of a citing sentence, as it may present comparative results or author initials from cited works.

(3) Constructing the classifier

Using the training data set described in (1), we construct two supervised classifiers based on publicly available implementations of supervised learning frameworks: maximum entropy (ME; footnote 3) [18] and the support vector machine (SVM; footnote 4) [27], as described in Section II. Both methodologies were chosen because they efficiently handle the large numbers of non-independent features common to many natural language tasks. In addition, our task is a binary classification problem (whether a sentence is a citing sentence or not), and SVMs often yield superior results on binary classification tasks.

(4) Classifying sentences as citing or non-citing

Given the classifiers trained in (3), we can then apply the trained models to new, unseen sentences. We assess the performance of the models using accuracy as our evaluation measure.
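The sketch promised above illustrates the unigram and bigram feature classes of (2)(a) and (2)(b). The representation is binary, matching the description, and the tiny STOP_WORDS set is only a stand-in for the 571-word SMART stop list of footnote 2.

```python
# A minimal sketch of the unigram (a) and bigram (b) binary features. STOP_WORDS
# is a tiny stand-in for the 571-word SMART stop list cited in footnote 2.
STOP_WORDS = {"the", "is", "a", "an", "of", "and"}

def unigram_bigram_features(sentence):
    tokens = [t.lower().strip(".,;") for t in sentence.split()]
    content = [t for t in tokens if t and t not in STOP_WORDS]
    feats = {("uni", w): 1 for w in content}               # binary unigram features
    feats.update({("bi", a, b): 1                          # bigrams over adjacent
                  for a, b in zip(content, content[1:])})  # content words
    return feats

# The paper's running example yields unigrams "classification", "model", "trained"
# and bigrams ("classification", "model"), ("model", "trained").
print(unigram_bigram_features("The classification model is trained"))
```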
IV. EXPERIMENTS

A. Experimental Data

We used the ACL Anthology Reference Corpus (ACL ARC; footnote 5) [28]. The ACL ARC is constructed from a significant subset of the ACL Anthology (footnote 6), a digital archive of conference and journal papers in natural language processing and computational linguistics. The ACL ARC consists of 10,921 articles from the February 2007 snapshot of the ACL Anthology. Using the ACL ARC, we extracted features from each of the 955,755 sentences included in the 10,921 articles. Using regular expression pattern matching to find citation markers, we identified 112,533 sentences as citing sentences (positive instances) and post-processed them to remove the citation markers. We deemed the remaining 843,242 sentences non-citing sentences (negative instances). In our experiments, the aim is to identify whether each sentence is a citing sentence or not.

B. Experimental Results

Using each of the individual feature classes a) to f) described in Section III, we constructed both SVM and ME classifiers. We evaluated our models using simple accuracy, defined as

Accuracy = (Number of correct classifications) / (Total number of test cases),

where a correct classification means that the learned model predicts the same class as the original class of the test case.

1) Classification Accuracy by Cross Validation

We first conducted experiments using 10-fold cross validation for both ME and SVM. For the SVM experiments, we used the default settings of the LIBSVM package, setting the kernel to the radial basis function and the value of C in Equation (2) to 1.0. Table 1 shows the experimental results obtained using the classifiers constructed with both ME and SVM. For comparison, we also constructed an integrated classifier that uses all six feature classes, labeled "All" in Table 1.

Table 1. Accuracy obtained by ME and SVM.

  Feature                          Accuracy (ME)   Accuracy (SVM)
  (1) Unigram                      0.876           0.879
  (2) Bigram                       0.827           0.851
  (3) Proper Noun                  0.882           0.882
  (4) Previous and Next Sentence   0.882           0.882
  (5) Position                     0.875           0.877
  (6) Orthographic                 0.878           0.880
  (7) All [(1)-(6)]                0.876           0.878

Footnote 3: Maximum Entropy Modeling Toolkit (Version 20041229), http://homepages.inf.ed.ac.uk/lzhang10/maxent_toolkit.html
Footnote 4: LIBSVM (Version 2.89), http://www.csie.ntu.edu.tw/~cjlin/libsvm/
Footnote 5: Version 20080325, http://acl-arc.comp.nus.edu.sg/
Footnote 6: http://www.aclweb.org/anthology/
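A minimal sketch of the 10-fold protocol above, using scikit-learn's KFold and SVC (again standing in for LIBSVM with its default RBF kernel and C = 1.0); the feature matrix and labels are random toy data, not the ACL ARC features.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 30)).astype(float)  # toy binary feature matrix
y = rng.integers(0, 2, size=200)                      # toy citing/non-citing labels

accuracies = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    clf = SVC(C=1.0, kernel="rbf")                    # LIBSVM-style defaults
    clf.fit(X[train_idx], y[train_idx])
    # Accuracy = (number of correct classifications) / (total number of test cases)
    accuracies.append(float((clf.predict(X[test_idx]) == y[test_idx]).mean()))

print(f"mean 10-fold accuracy: {np.mean(accuracies):.3f}")
```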

Figure 2. Classification accuracy obtained by ME.

Figure 3. Classification accuracy obtained by SVM.

Figure 4. Classification accuracy obtained by SVM with different values of C.

Table 2. Optimal values of C and their accuracy.

  Feature                          Optimal value of C   Accuracy
  (1) Unigram                      0.8                  0.879
  (2) Bigram                       0.8                  0.851
  (3) Proper Noun                  0.9                  0.882
  (4) Previous and Next Sentence   0.9                  0.882
  (5) Position                     0.9                  0.877
  (6) Orthographic                 0.9                  0.880
  (7) All [(1)-(6)]                0.9                  0.878

2) Classification Accuracy for Different Sizes of Training Data

For most learning algorithms, the size of the training data affects classification accuracy. We therefore conducted experiments to empirically assess how classification performance changes when the training set is a subset of the full training data, from 10% to 90%. Figures 2 and 3 (both shown at the same scale) give the experimental results obtained by ME and SVM, respectively.

3) Classification Accuracy for Different Values of C in the SVM

Finally, we conducted a series of experiments tuning the SVM's performance. In the SVM framework, the value of C in Equation (2), which governs the SVM's tolerance for misclassifications when fitting the separating hyperplane, also affects classification accuracy. We therefore conducted experiments to find the classification accuracy obtained with different values of C. Figure 4 shows the accuracy obtained by the SVM with different values of C.
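As a minimal sketch of this C sweep (toy data again; the grid brackets the 0.8-0.9 optima reported in Table 2):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 30)).astype(float)  # toy feature matrix
y = rng.integers(0, 2, size=200)                      # toy labels

# Retrain at several values of the error penalty C and compare 10-fold accuracy.
for C in (0.6, 0.7, 0.8, 0.9, 1.0):
    acc = cross_val_score(SVC(C=C, kernel="rbf"), X, y, cv=10, scoring="accuracy").mean()
    print(f"C = {C:.1f}: mean accuracy = {acc:.3f}")
```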

C. Discussion

According to Table 1, in the first set of cross-validation experiments, both ME and SVM achieved accuracies greater than 0.82, with only a small difference in accuracy (0.002 to 0.024) between the two classifiers. We therefore find that accuracy on this kind of task does not depend strongly on the choice of classifier. In particular, simple features such as Proper Noun and the Previous and Next Sentence context give the best results (0.882) among the feature classes. Interestingly, the Bigram feature is not as effective as the others we used: as bigram features are often very sparse, it may be difficult to construct an accurate classifier from such rarely overlapping features.

By varying the size of the training data, we obtain slight variations in performance in both the SVM and ME frameworks. According to Figure 2, ME exhibits more performance variation: the accuracy of features such as Unigram, Bigram, and All is influenced by the size of the training data. In other words, the larger the training data, the more accurate the classification results. According to Figure 3, the SVM framework shows less variation. When the data size was slightly reduced (to 80-90% of the original), a slight improvement in accuracy was observed. For example, for Previous and Next Sentence, the accuracy with 10% of the training data is 0.872, while the accuracy with 90% is 0.882. Moreover, for All, the accuracy with 10% of the training data is 0.875, while the accuracy with 90% is 0.880. In future work, we may investigate this peculiarity in more detail.

Finally, according to Figure 4, when different values of the error-term penalty C were used in the SVM framework, we observed that the best accuracy is obtained when C is set slightly lower than 1.0 for each feature class. The optimal values of C and the accuracy obtained for each feature class are shown in Table 2. Together with the first set of experimental results, these results show that the Proper Noun and Previous and Next Sentence features yield the best accuracy in both ME and SVM (0.882, with C = 0.9 for the SVM). Surprisingly, the composite classifiers that use all feature classes underperform classifiers trained only on these two sources of information.

V. CONCLUSION

We have described a method for identifying citing sentences by constructing classifiers using supervised learning approaches with simple features extracted from research papers. Experimental results showed that both proper nouns and the contextual classification of the previous and next sentences are effective features for training accurate models in both the SVM and ME frameworks. In future work, we plan to build an editor that will help authors write research papers by advising them when statements in their drafts need citations.

REFERENCES

[1] S. Lawrence, C. L. Giles, and K. Bollacker: Digital Libraries and Autonomous Citation Indexing, IEEE Computer, 32(6):67-71, 1999.
[2] I. G. Councill, C. L. Giles, and M.-Y. Kan: ParsCit: An Open-Source CRF Reference String Parsing Package, In Proc. of the 6th International Conference on Language Resources and Evaluation (LREC 08), pages 661-667, 2008.
[3] F. Narin: Evaluative Bibliometrics: The Use of Publication and Citation Analysis in the Evaluation of Scientific Activity, Computer Horizons, Cherry Hill, NJ, 1976.
[4] E. Garfield: Citation Indexing: Its Theory and Application in Science, Technology, and Humanities, John Wiley and Sons, NY, 1979.
[5] L. Page, S. Brin, R. Motwani, and T. Winograd: The PageRank Citation Ranking: Bringing Order to the Web, Stanford Digital Library Technologies Project, SIDL-WP-1999-0120, 1998.
[6] J. Bollen, M. A. Rodriguez, and H. Van De Sompel: Journal Status, Scientometrics, 69(3):669-687, 2006.
[7] Y. Sun and C. L. Giles: Popularity Weighted Ranking for Academic Digital Libraries, In Proc. of the 29th European Conference on Information Retrieval (ECIR 2007), pages 605-612, 2007.
[8] M. Krapivin and M. Marchese: Focused PageRank in Scientific Papers Ranking, In Proc. of the 11th International Conference on Asian Digital Libraries (ICADL 2008), Lecture Notes in Computer Science (LNCS), Vol. 5362, pages 144-153, 2008.
[9] N. Ma, J. Guan, and Y. Zhao: Bringing PageRank to the Citation Analysis, Information Processing and Management, 44(2):800-810, 2008.
[10] H. Sayyadi and L. Getoor: FutureRank: Ranking Scientific Articles by Predicting their Future PageRank, In Proc. of the 9th SIAM International Conference on Data Mining, pages 533-544, 2009.
[11] M. M. Kessler: Bibliographic Coupling Between Scientific Papers, American Documentation, 14(1):10-25, 1963.
[12] H. Small: Co-Citation in the Scientific Literature: A New Measure of the Relationship Between Two Documents, Journal of the American Society for Information Science, 24(4):265-269, 1973.
[13] S. Teufel, A. Siddharthan, and D. Tidhar: Automatic Classification of Citation Function, In Proc.
of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006), pages 103-110, 2006.
[14] V. Qazvinian and D. R. Radev: Scientific Paper Summarization Using Citation Summary Networks, In Proc. of the 22nd International Conference on Computational Linguistics (Coling 2008), pages 689-696, 2008.
[15] J. Schneider: Verification of Bibliometric Methods' Applicability for Thesaurus Construction, PhD thesis, Royal School of Library and Information Studies, 1994.
[16] A. Ritchie, S. Teufel, and S. Robertson: Using Terms from Citations for IR: Some First Results, In Proc. of the 29th European Conference on Information Retrieval (ECIR 2007), pages 211-221, 2007.
[17] A. Ritchie, S. Robertson, and S. Teufel: Comparing Citation Contexts for Information Retrieval, In Proc. of the 17th International Conference on Information and Knowledge Management (CIKM '08), pages 213-222, 2008.
[18] A. L. Berger, S. A. Della Pietra, and V. J. Della Pietra: A Maximum Entropy Approach to Natural Language Processing, Computational Linguistics, 22(1):39-71, 1996.
[19] A. Ratnaparkhi, J. Reynar, and S. Roukos: A Maximum Entropy Model for Prepositional Phrase Attachment, In Proc. of the ARPA Human Language Technology Workshop, pages 250-255, 1994.
[20] R. Rosenfeld: Adaptive Statistical Language Modeling: A Maximum Entropy Approach, PhD thesis, Carnegie Mellon University, 1994.
[21] S. F. Chen and R. Rosenfeld: A Gaussian Prior for Smoothing Maximum Entropy Models, Technical Report CMU-CS-99-108, Carnegie Mellon University, 1999.
[22] A. Ratnaparkhi: A Maximum Entropy Model for Part-Of-Speech Tagging, In Proc. of the Conference on Empirical Methods in Natural Language Processing, pages 133-142, 1996.
[23] D. Beeferman, A. Berger, and J. Lafferty: Statistical Models for Text Segmentation, Machine Learning, 34(1-3):177-210, 1999.
[24] J. N. Darroch and D. Ratcliff: Generalized Iterative Scaling for Log-Linear Models, The Annals of Mathematical Statistics, 43(5):1470-1480, 1972.
[25] S. Della Pietra, V. J. Della Pietra, and J. D. Lafferty: Inducing Features of Random Fields, IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):380-393, 1997.
[26] R. Malouf: A Comparison of Algorithms for Maximum Entropy Parameter Estimation, In Proc. of the 6th Conference on Natural Language Learning (CoNLL-2002), pages 49-55, 2002.
[27] V. Vapnik: The Nature of Statistical Learning Theory, Springer, NY, 1995.
[28] S. Bird, R. Dale, B. J. Dorr, B. Gibson, M. T. Joseph, M.-Y. Kan, D. Lee, B. Powley, D. R. Radev, and Y. F. Tan: The ACL Anthology Reference Corpus: A Reference Dataset for Bibliographic Research in Computational Linguistics, In Proc. of the 6th International Conference on Language Resources and Evaluation (LREC 08), pages 1755-1759, 2008.