Linguistic Knowledge-driven Approach to Chinese Comparative Elements Extraction
|
|
|
- Lilian Osborne
- 10 years ago
- Views:
Transcription
1 Linguistic Knowledge-driven Approach to Chinese Comparative Elements Extraction Minjun Park Dept. of Chinese Language and Literature Peking University Beijing, , China Yulin Yuan Dept. of Chinese Language and Literature Peking University Beijing, , China Abstract The BI ( 比 )-structure, which highlights a contrasting characteristic between two items, is the key comparative sentence structure in Chinese. In this paper, we explore the methods of extracting the 6 constituents of the BI-structure. Previous studies are often restricted to probabilistic classification methods, where the feature used hardly embodies linguistic knowledge, therefore unintuitive. As an alternative, we propose the use of two linguistic knowledge-driven approaches, namely the POS chunking-based and TBL-based methods. The first model effectively captures grammatical restrictions over POS sequential patterns. The second model set up on new and lesser templates performs better than Brill s (1995). Experimental results show that the proposed models are simple and effective methods for Chinese comparative element extraction task. 1 Introduction Comparison is the most representative figure of evaluation. Much of evaluative information is now available in the web, and comparative sentences prevail in Chinese web texts in increasing numbers. A significant amount of research has been conducted on automatic identification of Chinese comparative sentence and its semantic elements. However, the techniques proposed in earlier works are mostly based on statistical classification method. Due to the opaque nature of stochastic features, it is often difficult to comprehend what linguistic aspects are applied to the model. In this paper, we present a detailed analysis on linguistic behavior of the BI ( 比 )-structure, which is the key comparative structure in Chinese. The application of the two rule-based approaches suggested in this paper are different from previous models in that they fully use syntactic and lexical features which are intrinsic to the structure. This paper first presents a brief literature review on the subject. The target of extraction task, i.e. Comparative Elements (CE) is then defined before demonstration of the two proposed approaches, i.e. POS chunking-based and TBLbased extraction models. Finally, we discuss the experiment s results and present our conclusion. 2 Related Work The research on comparative sentence has been a main concern from the beginning of modern Chinese linguistics research. The different types of Chinese comparative sentences were first mentioned in Mashi Wentong (1898) and their classification was elaborated later by Chinese grammarians such as Lü (1942), Ding (1961) and Liu (1983). Following by their preliminary work, a series of research focused on defining syntactic and semantic structure of the Chinese comparative sentence was conducted. Li (1986) demonstrates the Chinese BI-structure simplifying rules. Shao (1990) investigates the rule of replacing and omitting elements in Chinese comparative structure. On the other hand, Studies in Natural Language Processing mainly dealt with the identification of comparative sentence and its elements. Based on Jindal and Liu s research (2006) on comparative sentences in English, Huang(2008) and Song(2009) made a stochastic classifier based on SVMs and CRFs to tackle the Chinese comparative sentence identification and element extraction task. Besides, many models were also suggested in the fifth Chinese Opinion Analysis Evaluation (COAE2013) track. Zhou(2014) and Li(2013) made use of pattern matching technique, and Wei(2013) proposed a rule-based decision making approach based on CRF sequential tag- 79 Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing (SIGHAN-8), pages 79 85, Beijing, China, July 30-31, c 2015 Association for Computational Linguistics and Asian Federation of Natural Language Processing
2 ging. Despite all these models, their performance has not shown satisfying results 1. The identification of Chinese comparative elements especially still remains as a big challenge. A TBL-based approach, which showed good performance in Korean (Yang and Ko, 2011), would be an alternative to usual methods. 3 Task Description 3.1 Comparative Elements (CE) We refer to Comparative Elements (CE) as entities and attributes which directly occur within comparative sentences. We defined 6 CEs as below. e.g. 新 飞 度 车 身 结 构 的 刚 度 比 前 代 提 高 了 164% The solidity of XinFeiDu s body structure has increased 164% than the previous design. 新 飞 度 刚 度 比 前 代 提 高 164% XinFeiDu solidity BI previous design increased 164% SUB DIM BI OBJ RES EXT CE (label) Subject Entity (SUB) Comparative Marker (BI) Object Entity(OBJ) Dimension (DIM) Comparative Result (RES) Comparative Extent (EXT) Definition An element of comparison, i.e. topic of the sentence. Comparative sentence marker, which is BI( 比 ) in Chinese Bi-structure 2. An entity that is being compared to. It is often the complement of Biprepositional phrase. Shared property of entities being compared. The relation between entities being compared. It is often the syntactic head of comparative predicate. Relative difference in degree or quantity between entities in terms of DIM. Table 1: Comparative Elements (CE) in BI-structure Our task is to automatically extract these 6 CEs from the sentences. Note that these elements cannot simply be determined by syntactic criteria. They are involved with semantic category to some extent, but we do not use additional semantic features such as semantic role labels or lexical taxonomies in this paper. 1 In COAE 2013 Task 2 (Chinese Comparative Element Identification), the best performance F1-score was 0.35 (Tan et al. 2013:25). 2 There are other comparative markers such as 比 不 过, 不 如, 优 于 which are sometimes combined with RES morphologically. However, BI( 比 ) is the only comparative marker in the scope of this paper Corpus The corpus used in this experiment consists of 1,036 Chinese BI-structure sentences, coming from the open dataset of COAE 2013 Task 2 (Tan et al. 2013). The sentences are a collection of customer reviews and opinions from different Chinese websites pertaining to cars and electronics. 1) Preprocessing: We first conduct word segmentation and POS tagging by using ICTCLAS 3. Second, we had to manually revise to avoid any errors because of the informal language used on the web. Three annotators were appointed to revise typos. In addition, 3,000 word-size domain-specific lexicons 4 are also utilized to guarantee the quality of word segmentation and POS tagging. 2) CE labeling: The 6 types of Comparative Elements (CE) in the 1,036 sentences were manually annotated with the corresponding CE labels of Table 1. This task was done by three trained annotators of Chinese linguistics major. Their work was double-checked by one another, and any inconsistencies between annotators were discussed before reaching an agreement. The annotated corpus was then transformed to IOB format. 4 Two methods of Comparative Elements Extraction We now present two different proposed techniques. Model 1 uses basic part-of-speech chunking-based method and Model 2 employs Transformation-Based Error-Driven Learning (TBL) (Brill, 1995) for identifying CEs. 4.1 POS chunking-based CE extraction 4 elements of CEs, i.e. BI, OBJ, RES and EXT, form a regular sequential pattern across the sentences. First, OBJ generally occurs as complements of BI-prepositional phrase, which is mostly a noun phrase. Second, RES and EXT usually form predicates, modified by the BIprepositional phrase, i.e. [ [ 比 OBJ] prep [RES EXT] pred ]. Noticing this pattern, we can define chunk patterns with regular expressions as below. Punctuation as delimiter, the sentence is divided into small clauses If the clause contains 比 /p, the following chunk rules are applied to create chunks Acquired from Sougou s open database.
3 Rule1. BI:{<P><.*>*} Rule2. RES:<.*>}{<V> <A[DN]*> <D> Rule3. RES:<D> <V> <VSHI> <VYOU> <A[DN]*>{}<V> <A[DN]*> <D> Rule4. RES:<.*>{}<.*>*?<UDE1>[^<W J>][^$] Label the items in the first chunk as BI and OBJ, and label the second chunk as RES. For RES chunks, Rule 5 is applied to chunk EXT. Rule 5. EXT:<A> <V[N]*> <Y>{(<MQ> < M> <Q> <RY> <X> <D>)<.*>*} Table 2: chunking-based CE extraction process 5 We now give a step-by-step illustration of the actual extraction process of Table 2. Step 1 Detecting BI ( 比 )-Chunk For clauses containing the comparative marker BI ( 比 ), we create a chunk that begins with it (Rule 1). We call it BI ( 比 )-Chunk. Step 2 Splitting into Two Phrases BI ( 比 )-Chunk can be divided into two chunks, i.e. BI-prepositional phrase and predicate because they belong to very distinctive syntactic categories. The former cannot appear independently, usually taking a noun as its object. The latter functions as the predicate, and is mostly an adjective or a verb 6. Therefore, we designed Rule 2 to split them. Step 3 Merging Incorrectly Separated Predicates In most cases, however, the predicate is a complex phrasal structure. Therefore, Rule 2 incorrectly splits chunk that should have not been sep- 5 For specifying chunk rules intuitively, we directly quote NLTK s description of chunking operator (Bird et al. 2009). (1) <T> represents for any token tagged with T. (2) {<pattern >} represents for Chunk Rule, which means creating a chunk with the given regex pattern within curly braces. (3) <pattern1>}{<pattern2> represents for Split Rule, which means splitting a chunk into two chunks based on the specified pattern. (4) <pattern1>{}<pattern2> represents for Merge Rule, which means merging two chunks together based on the specified pattern. 6 Strictly speaking, verb and adjective are also able to occur in BI- prepositional phrase. Such a case will be handled in Step 4. arated. To solve this, we employ Rule 3 to merge incorrectly divided elements of the predicate group. Step 4 Dealing with DE ( 的 )-Structure In Chinese, DE ( 的 ) is often used to mark modification 7. It can be attached to various types of syntactic categories and modify the following word. DE ( 的 )-structure can be simplified as [ [ XP 的 ] NP ]. When a verb or adjective phrase takes the position of XP, the same error as in Step 3 occurs. To tackle this problem, we use Rule 4 that enunciates the unity of modifying elements occurring at the position of XP. Note that DE ( 的 ) is not necessarily restricted to modification marker. When occurring at the end of the sentence, it simply marks a subjective tone. Rule 4 makes use of punctuation tag (WJ) to discern this modal particle of DE ( 的 ) from modification marker. Step 5 CE Labeling After successfully extracting the two chunks (BIprepositional phrase and predicate) following the above mentioned 4 steps, we label each item in these chunks with BI, OBJ and RES tags. Step 6 (optional) EXT Identification Comparative Extent (EXT) usually begins with numerals, following the head of the predicate. Rule 5 detects possible EXTs in RES chunk. 4.2 TBL-based CE extraction The advantage of using the POS chunking-based method is that it allows direct capture of linguistic information. However, (a) it requires painstaking process of manual rule construction; (b) 7 It may be an inadequate way of defining DE ( 的 ) because of its flexible and diverse nature. Exceptional cases will be discussed in 5.1. See Zhu(1961) for further details. 81
4 and an error in any step could damage the performance of the whole CE extraction process Transformation-Based Learning We tested an automated learning method, known as Transformation-Based Error-Driven Learning (TBL) (Brill, 1995). The basic idea of TBL is learning from mistakes. First, the researcher may apply an initial-state annotator to the training corpus. Second, the set of user-defined templates are then used to form candidate rules. Third, each of the rules is in turn applied to the training corpus. At the same time, the net improvement of the rule is calculated and recorded for evaluation of candidate rules. Throughout the training, the process of deriving rules, scoring and selecting rules and applying them is iterated, creating an ordered sequence of transformation rules. Figure 1: Learning Procedure of TBL Modified from Brill(1995), Ramshaw and Marcus(1999) TBL has a well-known advantage, i.e. perspicuity of linguistically meaningful rules. Different from the POS chunking-based model, the TBLbased model is more robust and possibly captures useful information that may not be noticed by the human engineer (Brill, 1995:552). Therefore, the use of TBL allows us to capture some otherwise ignored CE patterns Preliminary TBL-based CE Tagger We treated CE extraction task as a tagging problem. Each token in training data is given an initial tag by ICTCLAS tagger. The TBL learner with user-defined templates is then trained on the training data. Consequently, we obtain the TBLbased CE tagger as a CE extraction model. The templates play a key role in this model because the TBL-learner uses these templates to generate possible rules, which directly affect the overall performance. In order to see which type of feature contributes more to the TBL-based model accuracy rate, we divide the original template proposed in Brill (1995:553, 556) into three template subsets: (a) tag features template; (b) lexical features template; (c) both features template. Type of features # of templates # of candidate rules Accuracy (%) (a) Tag 11 21, (b) Lexicon 15 59, (c) Tag & Lexicon 26 81, Table 3: CE Extraction Accuracy based on different feature templates Since the result of (c) is the best, we take it as a standard template. Table 4 shows evaluation results using TBL-based model with 26 templates (c). We regard this score as our baseline performance. SUB DIM BI OBJ RES EXT Pre Rec F Table 4: The result of baseline system (%) Search for Optimal Template We now present how we obtained our proposed TBL-based CE extraction model. The baseline system is based on a relatively large amount of rules and templates. Therefore, the reduction of rules is preferable for efficient application of TBL. According to Brill (1995: 560), although the accuracy of TBL-based tagger increases with the number of transformation rules, its marginal effect dramatically decreases, and leads to computational cost. We found that 200~300 rules are desirable for our CE detecting task. As for templates, we achieved the best performance when using tag and lexicon sequences within a radius of 3 tokens as features. For every token in BI-structure, change tag a to tag b when: 1. The current word is w. W 0 2. One of the three preceding words T -3,-2,-1 is tagged z. 3. One of the three following words T 1, 2, 3 is tagged z. 4. The word two after is tagged z. T 2 5. The word two before is tagged z. T The following word is tagged z. T 1 7. The preceding word is tagged z. T The preceding word is w. W The current word is w, and the W 0 & T -1 preceding tag is t. 10. The preceding tag is t, the current T -1 &T 0 &W 1 tag is t 2 and the following word is w. Table 5: 10 proposed templates 82
5 Based on the above-mentioned templates, the TBL-based model generates a set of possible candidate rules. Table 6 lists 10 transformation rules of highest score. Pass Old tag Context New tag 1 P W 0 = 比 BI 2 A T -3,-2,-1 = BI RES 3 NZ T -3,-2,-1 = BI OBJ 4 N T -3,-2,-1 = BI OBJ 5 M T -3,-2,-1 = RES EXT 6 A T -3,-2,-1 = OBJ RES 7 N T 1, 2, 3 = BI DIM 8 NZ T 1, 2, 3 = BI SUB 9 UDE1 T -3,-2,-1 = BI OBJ 10 X T -3,-2,-1 = BI OBJ Table 6: 10 Rules of highest score With the rules given above, the model takes example (1a) as an input, and applies the rules in order of , producing the CE-tagged result of example (1b). (1a) E1-471/nz,/wd 声 音 /n 比 /p AS4752/nz 的 /ude1 好 /a 多 /m 了 /y 哦 /e (1b) E1-471/sub,/wd 声 音 /dim 比 /bi AS4752/obj 的 /obj 好 /res 多 /ext 了 /y 哦 /e In addition, Examples (2-5) below illustrate some of the linguistically meaningful transformation rules that the TBL model based on 10 templates (Table 5) has captured. (2) VYOU->RES if Word: 比 /bi 同 价 位 /obj 机 型 /obj 更 /d 有 /res 分 量 /res /wj BI same-price model more have amount (This product) has more amounts for the same price. (3) EXT->RES if Word: 比 /bi 捷 达 /obj 的 /obj 要 /v 多 /res 得 /ude3 多 /ext /wj BI Jetta DE should more DE much (A car model s something) should be much more than a Jetta s. In (2-3), 更 is equivalent to more in English, and 要 conveys subjective meaning of difference in degree. The proposed model makes use of them as an RES marker because they frequently occur before RES. (4) RZ->OBJ if Word: 诺 基 亚 /sub 5230/sub 同 等 /b 价 位 /n 下 /f 比 /bi 其 它 /obj 手 机 /obj 都 /d 好 /res Nokia 5230 is even better than the equivalent class of other cellphones. (5) RES->EXT if Word: Pos:RES@[-1] 比 /bi 老 /obj 天 籁 /obj 的 /ude1 油 漆 /dim 硬 /res 好 /ext 多 /ext /wj (A car model s coating) is much stronger than the coatings of Teana. In (4), the pronoun following BI 其 它 is likely to be a constituent of OBJ. 好 is very likely to be a degree complement, i.e. EXT, if RES precedes it. Instead of functioning as RES, it stresses the degree of RES as shown in example (5). 5 Results 5.1 Result of POS Chunking-based Model The overall performance of the chunking-based CE extraction model (Section 4.1) is as follow. BI OBJ RES EXT Precision Recall F-score Table 7: The results of chunking-based CE extraction (%) The CE mining process of chunking-based model is based on simplistic grammatical assumptions: (a) Only nominal elements serve as the complement of BI-prepositional phrase; (b) Predicates can be a word or a group of words (phrase) that are adjectives or verbs; (c) Within the BI ( 比 )- Chunk, the elements occurring before the modifier marker DE ( 的 ) are all regarded as modifier. These assumptions, of course, are somewhat over-generalized, and do not fit in many real cases 8. However, the 5 Rules applied based on these assumptions show a fair performance in Table 7 when applied to a limited scope of BI-Chunk. 5.2 Result of TBL-based Model All evaluations of TBL-based model in this paper are based on a 5-fold cross validation. The proposed TBL-based model with 10 templates shows the results below. SUB DIM BI OBJ RES EXT Pre Rec F Table 8: The results of TBL-based CE extraction (%) Guided by our new templates (Table 5), the model first locates comparative marker BI ( 比 ), then searches the surroundings for the tag/lexical features while gradually narrowing its scope. As a result, the 10 templates enable an effective detection of elusive CE instances such as those in example (2-7). 8 Under many circumstances in Chinese, a noun (or noun phrase) can also serve as predicate; Transferred-designation( 转 指 ) XP 的 construction can also act as subject other than as a modifier. 83
6 Model SUB DIM BI OBJ RES EXT POS chunking-based TBL-based (baseline, Templates) TBL-based (proposed, 10 Templates) Table 9: Comparison between models (f-score, %) Table 9 compares the scores of two CE mining methods, the POS chunking-based and the TBLbased approach. Compared to the TBL-based model, the POS chunking-based model is unable to extract SUB and DIM. Because these two elements frequently occur outside a BIprepositional phrase, it is hard to capture their irregular occurrence positions in the sentence. In contrast, the TBL-based model is able to detect SUB and DIM. However, their identification rate is relatively low. Nevertheless, our proposed TBL-based model outperforms the baseline system by using much smaller templates. It shows we found a simple and more expressive set of rule templates. Moreover, the proposed TBL-based model achieved an increase of 21% for RES and 14% for EXT f-score in comparison with the POS chunking-based model. This improvement mainly benefits from the proper use of both tag and lexical information. (6) 市 场 /nz 中 /f 比 /p 它 /rr 靓 /a 的 /ude1 产 品 /n 很 /d 少 /a /wj There are very few products prettier than that one in the market. (a) 市 场 /n 中 /f 比 /bi 它 /obj 靓 /obj 的 /ude1 产 品 /obj 很 /res 少 /res /WJ (b) RES->A if Word: 市 场 /n 中 /f 比 /bi 它 /obj 靓 /res 的 /ude1 产 品 /n 很 /d 少 /a /WJ (7) 花 冠 /nz 比 /p 伊 兰 特 /nz 贵 /a 近 /a 3 万 /m Corollas are more expensive than Elantras by nearly 30 thousand RMB. (a) 花 冠 /nz 比 /bi 伊 兰 特 /obj 贵 /res 近 /res 3 万 /ext (b) RES->EXT if Word: 花 冠 /nz 比 /bi 伊 兰 特 /obj 贵 /res 近 /ext 3 万 /ext As for examples (6-7), the POS chunking-based model (Section 4.1) incorrectly identifies 少, 近 as RES. As we can see in example (6a), the POS chunking-based model wrongly identifies 很 少 as RES because the BI-chunk 比 它 靓 occurs in front of the modification marker 的. In (7a), the model mistook 近 for RES because it cannot discern 近 from 贵 only with the tag information. In contrast, TBL-based model makes a correct decision of (6b) and (7b) based on lexical information. 6 Conclusion In order to make the best use of meaningful features in linguistic context, we have proposed the use of two rule-based methods for Chinese comparative element (CE) extraction. The POS chunking-based model performs well with basic Chinese grammatical rules. We then use the TBL-based method to extract other linguistic patterns that the first model can hardly detect. Results showed that our TBL-based model achieved higher score than Brill s (1995), demonstrating that our new 10 templates can effectively extract the distinct features of Chinese BI-structure as shown in examples (1-7). Chinese comparative element mining involves techniques of various domains including coreference resolution, named entity recognition and parsing. However, the linguistic features used in this paper are limited to instances of regular (type-3) grammars. In our future work, we plan to investigate some feasible Chinese linguistic features on the level of context-free grammars. References Bird, Steven, Edward Loper and Ewan Klein Natural Language Processing with Python. O Reilly Media Inc. Brill, Eric A Simple Rule-Based Part of Speech Tagger. In Proceedings of the workshop on Speech and Natural Language, pp Brill, Eric Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging. Computational Linguistics, 21(4): Ding, Shengshu Xiandai Hanyu Yufa Jianghua. Beijing: Commercial Press. Huang, Xiaojiang et al Learning to Identify Chinese Comparative Sentences. Journal of Chinese Information Processing, 22(5): Jindal, Nitin and Bing Liu Identifying Comparative Sentences in Text Documents. In Proceedings of SIGIR 06, pp Jindal, Nitin and Bing Liu Mining Comparative Sentences and Relations. In AAAI (22):
7 Li, Linding Hanyu Jufa Juxing. Beijing: Commercial Press, pp Li, Yan et al PRIS_COAE at COAE 2013 Track. In Proceedings of the fifth Chinese Opinion Analysis Evaluation, pp Liu, Yuehua Shiyong Xiandai Hanyu Yufa. Beijing: Foreign Language Teaching and Research Press, pp Lü, Shuxiang Zhongguo Wenfa Yaolüe. Beijing: Commercial Press, pp Ma, JianZhong Mashi Wentong. Beijing: Commercial Press, pp Ramshaw, Lance. A. and Mitchell P. Marcus Text Chunking Using Transformation-based Learning. In Natural language processing using very large corpora, pp Springer Netherlands. Shao, Jingmin Biziju Tihuan Guilü Chuyi. Zhongguo yuwen, vol.6. Song, Rui et al Chinese Comparative Sentences Identification and Comparative Relations Extraction. Journal of Chinese Information Processing, 23(2): Tan, Songbo et al Overview of Chinese Opinion Analysis Evaluation In Proceedings of the fifth Chinese Opinion Analysis Evaluation, pp Wei, Xianhui et al DUTIR: Method Research of Sentiment Analysis and Elements Extraction of Chinese Short Text. In Proceedings of the fifth Chinese Opinion Analysis Evaluation, pp Yang, Seon and Youngjoong Ko Extracting Comparative Entities and Predicates from Text Using Comparative Type Classification. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp Zhou, Hongzhao et al Chinese Comparative Sentences Identification and Comparative Elements Extraction Based on Semantic Classification. Journal of Chinese Information Processing, 28(3): Zhu, Dexi Shuo De. Zhongguo yuwen, 12:1-15. Appendix ICTCLAS Part-of-Speech Tags Tag Part-of-speech A Adjective 形 容 词 AD Adverbial adjective 副 形 词 AN Nominal adjective 名 形 词 D Adverb 副 词 E Exclamative particle 叹 词 M Numeral 数 词 N Noun 名 词 NZ Proper noun 专 有 名 词 P Preposition 介 词 Q Classifier 量 词 RY Wh-pronoun 疑 问 代 词 UDE1 De 的 VSHI Shi 是 VN Gerund 名 动 词 VYOU You 有 WJ Period 句 号 X Character 字 符 Y Modal particle 语 气 词 85
Brill s rule-based PoS tagger
Beáta Megyesi Department of Linguistics University of Stockholm Extract from D-level thesis (section 3) Brill s rule-based PoS tagger Beáta Megyesi Eric Brill introduced a PoS tagger in 1992 that was based
Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features
, pp.273-280 http://dx.doi.org/10.14257/ijdta.2015.8.4.27 Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features Lirong Qiu School of Information Engineering, MinzuUniversity of
Comma checking in Danish Daniel Hardt Copenhagen Business School & Villanova University
Comma checking in Danish Daniel Hardt Copenhagen Business School & Villanova University 1. Introduction This paper describes research in using the Brill tagger (Brill 94,95) to learn to identify incorrect
Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words
, pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic by Sigrún Helgadóttir Abstract This paper gives the results of an experiment concerned with training three different taggers on tagged
Identifying Prepositional Phrases in Chinese Patent Texts with. Rule-based and CRF Methods
Identifying Prepositional Phrases in Chinese Patent Texts with Rule-based and CRF Methods Hongzheng Li and Yaohong Jin Institute of Chinese Information Processing, Beijing Normal University 19, Xinjiekou
PoS-tagging Italian texts with CORISTagger
PoS-tagging Italian texts with CORISTagger Fabio Tamburini DSLO, University of Bologna, Italy [email protected] Abstract. This paper presents an evolution of CORISTagger [1], an high-performance
Micro blogs Oriented Word Segmentation System
Micro blogs Oriented Word Segmentation System Yijia Liu, Meishan Zhang, Wanxiang Che, Ting Liu, Yihe Deng Research Center for Social Computing and Information Retrieval Harbin Institute of Technology,
Customer Intentions Analysis of Twitter Based on Semantic Patterns
Customer Intentions Analysis of Twitter Based on Semantic Patterns Mohamed Hamroun [email protected] Mohamed Salah Gouider [email protected] Lamjed Ben Said [email protected] ABSTRACT
CINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test
CINTIL-PropBank I. Basic Information 1.1. Corpus information The CINTIL-PropBank (Branco et al., 2012) is a set of sentences annotated with their constituency structure and semantic role tags, composed
Context Grammar and POS Tagging
Context Grammar and POS Tagging Shian-jung Dick Chen Don Loritz New Technology and Research New Technology and Research LexisNexis LexisNexis Ohio, 45342 Ohio, 45342 [email protected] [email protected]
Symbiosis of Evolutionary Techniques and Statistical Natural Language Processing
1 Symbiosis of Evolutionary Techniques and Statistical Natural Language Processing Lourdes Araujo Dpto. Sistemas Informáticos y Programación, Univ. Complutense, Madrid 28040, SPAIN (email: [email protected])
Web Information Mining and Decision Support Platform for the Modern Service Industry
Web Information Mining and Decision Support Platform for the Modern Service Industry Binyang Li 1,2, Lanjun Zhou 2,3, Zhongyu Wei 2,3, Kam-fai Wong 2,3,4, Ruifeng Xu 5, Yunqing Xia 6 1 Dept. of Information
Parsing Software Requirements with an Ontology-based Semantic Role Labeler
Parsing Software Requirements with an Ontology-based Semantic Role Labeler Michael Roth University of Edinburgh [email protected] Ewan Klein University of Edinburgh [email protected] Abstract Software
Chinese Open Relation Extraction for Knowledge Acquisition
Chinese Open Relation Extraction for Knowledge Acquisition Yuen-Hsien Tseng 1, Lung-Hao Lee 1,2, Shu-Yen Lin 1, Bo-Shun Liao 1, Mei-Jun Liu 1, Hsin-Hsi Chen 2, Oren Etzioni 3, Anthony Fader 4 1 Information
Open Domain Information Extraction. Günter Neumann, DFKI, 2012
Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for
POSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition
POSBIOTM-NER: A Machine Learning Approach for Bio-Named Entity Recognition Yu Song, Eunji Yi, Eunju Kim, Gary Geunbae Lee, Department of CSE, POSTECH, Pohang, Korea 790-784 Soo-Jun Park Bioinformatics
PyCantonese: Cantonese linguistic research in the age of big data
PyCantonese: Cantonese linguistic research in the age of big data Jackson L. Lee University of Chicago http://jacksonllee.com Childhood Bilingualism Research Center, CUHK September 15, 2015 Grammar versus
UNKNOWN WORDS ANALYSIS IN POS TAGGING OF SINHALA LANGUAGE
UNKNOWN WORDS ANALYSIS IN POS TAGGING OF SINHALA LANGUAGE A.J.P.M.P. Jayaweera #1, N.G.J. Dias *2 # Virtusa Pvt. Ltd. No 752, Dr. Danister De Silva Mawatha, Colombo 09, Sri Lanka * Department of Statistics
Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines
, 22-24 October, 2014, San Francisco, USA Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines Baosheng Yin, Wei Wang, Ruixue Lu, Yang Yang Abstract With the increasing
Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System
Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System Athira P. M., Sreeja M. and P. C. Reghuraj Department of Computer Science and Engineering, Government Engineering
Customizing an English-Korean Machine Translation System for Patent Translation *
Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,
ETL Ensembles for Chunking, NER and SRL
ETL Ensembles for Chunking, NER and SRL Cícero N. dos Santos 1, Ruy L. Milidiú 2, Carlos E. M. Crestana 2, and Eraldo R. Fernandes 2,3 1 Mestrado em Informática Aplicada MIA Universidade de Fortaleza UNIFOR
TREC 2003 Question Answering Track at CAS-ICT
TREC 2003 Question Answering Track at CAS-ICT Yi Chang, Hongbo Xu, Shuo Bai Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China [email protected] http://www.ict.ac.cn/
Modern Natural Language Interfaces to Databases: Composing Statistical Parsing with Semantic Tractability
Modern Natural Language Interfaces to Databases: Composing Statistical Parsing with Semantic Tractability Ana-Maria Popescu Alex Armanasu Oren Etzioni University of Washington David Ko {amp, alexarm, etzioni,
How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.
Svetlana Sokolova President and CEO of PROMT, PhD. How the Computer Translates Machine translation is a special field of computer application where almost everyone believes that he/she is a specialist.
A Method for Automatic De-identification of Medical Records
A Method for Automatic De-identification of Medical Records Arya Tafvizi MIT CSAIL Cambridge, MA 0239, USA [email protected] Maciej Pacula MIT CSAIL Cambridge, MA 0239, USA [email protected] Abstract
Word Completion and Prediction in Hebrew
Experiments with Language Models for בס"ד Word Completion and Prediction in Hebrew 1 Yaakov HaCohen-Kerner, Asaf Applebaum, Jacob Bitterman Department of Computer Science Jerusalem College of Technology
Automating Legal Research through Data Mining
Automating Legal Research through Data Mining M.F.M Firdhous, Faculty of Information Technology, University of Moratuwa, Moratuwa, Sri Lanka. [email protected] Abstract The term legal research generally
Natural Language to Relational Query by Using Parsing Compiler
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
Microblog Sentiment Analysis with Emoticon Space Model
Microblog Sentiment Analysis with Emoticon Space Model Fei Jiang, Yiqun Liu, Huanbo Luan, Min Zhang, and Shaoping Ma State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory
Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems
Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems cation systems. For example, NLP could be used in Question Answering (QA) systems to understand users natural
Chunk Parsing. Steven Bird Ewan Klein Edward Loper. University of Melbourne, AUSTRALIA. University of Edinburgh, UK. University of Pennsylvania, USA
Chunk Parsing Steven Bird Ewan Klein Edward Loper University of Melbourne, AUSTRALIA University of Edinburgh, UK University of Pennsylvania, USA March 1, 2012 chunk parsing: efficient and robust approach
Special Topics in Computer Science
Special Topics in Computer Science NLP in a Nutshell CS492B Spring Semester 2009 Jong C. Park Computer Science Department Korea Advanced Institute of Science and Technology INTRODUCTION Jong C. Park, CS
Automated Extraction of Security Policies from Natural-Language Software Documents
Automated Extraction of Security Policies from Natural-Language Software Documents Xusheng Xiao 1 Amit Paradkar 2 Suresh Thummalapenta 3 Tao Xie 1 1 Dept. of Computer Science, North Carolina State University,
Transition-Based Dependency Parsing with Long Distance Collocations
Transition-Based Dependency Parsing with Long Distance Collocations Chenxi Zhu, Xipeng Qiu (B), and Xuanjing Huang Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science,
Terminology Extraction from Log Files
Terminology Extraction from Log Files Hassan Saneifar 1,2, Stéphane Bonniol 2, Anne Laurent 1, Pascal Poncelet 1, and Mathieu Roche 1 1 LIRMM - Université Montpellier 2 - CNRS 161 rue Ada, 34392 Montpellier
Mining Opinion Features in Customer Reviews
Mining Opinion Features in Customer Reviews Minqing Hu and Bing Liu Department of Computer Science University of Illinois at Chicago 851 South Morgan Street Chicago, IL 60607-7053 {mhu1, liub}@cs.uic.edu
Towards a RB-SMT Hybrid System for Translating Patent Claims Results and Perspectives
Towards a RB-SMT Hybrid System for Translating Patent Claims Results and Perspectives Ramona Enache and Adam Slaski Department of Computer Science and Engineering Chalmers University of Technology and
Natural Language Database Interface for the Community Based Monitoring System *
Natural Language Database Interface for the Community Based Monitoring System * Krissanne Kaye Garcia, Ma. Angelica Lumain, Jose Antonio Wong, Jhovee Gerard Yap, Charibeth Cheng De La Salle University
Twitter Stock Bot. John Matthew Fong The University of Texas at Austin [email protected]
Twitter Stock Bot John Matthew Fong The University of Texas at Austin [email protected] Hassaan Markhiani The University of Texas at Austin [email protected] Abstract The stock market is influenced
Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach.
Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach. Pranali Chilekar 1, Swati Ubale 2, Pragati Sonkambale 3, Reema Panarkar 4, Gopal Upadhye 5 1 2 3 4 5
A Survey on Product Aspect Ranking Techniques
A Survey on Product Aspect Ranking Techniques Ancy. J. S, Nisha. J.R P.G. Scholar, Dept. of C.S.E., Marian Engineering College, Kerala University, Trivandrum, India. Asst. Professor, Dept. of C.S.E., Marian
How to make Ontologies self-building from Wiki-Texts
How to make Ontologies self-building from Wiki-Texts Bastian HAARMANN, Frederike GOTTSMANN, and Ulrich SCHADE Fraunhofer Institute for Communication, Information Processing & Ergonomics Neuenahrer Str.
31 Case Studies: Java Natural Language Tools Available on the Web
31 Case Studies: Java Natural Language Tools Available on the Web Chapter Objectives Chapter Contents This chapter provides a number of sources for open source and free atural language understanding software
Learning Translation Rules from Bilingual English Filipino Corpus
Proceedings of PACLIC 19, the 19 th Asia-Pacific Conference on Language, Information and Computation. Learning Translation s from Bilingual English Filipino Corpus Michelle Wendy Tan, Raymond Joseph Ang,
Automated Extraction of Vulnerability Information for Home Computer Security
Automated Extraction of Vulnerability Information for Home Computer Security Sachini Weerawardhana, Subhojeet Mukherjee, Indrajit Ray, and Adele Howe Computer Science Department, Colorado State University,
A Comparative Study on Sentiment Classification and Ranking on Product Reviews
A Comparative Study on Sentiment Classification and Ranking on Product Reviews C.EMELDA Research Scholar, PG and Research Department of Computer Science, Nehru Memorial College, Putthanampatti, Bharathidasan
Semantic annotation of requirements for automatic UML class diagram generation
www.ijcsi.org 259 Semantic annotation of requirements for automatic UML class diagram generation Soumaya Amdouni 1, Wahiba Ben Abdessalem Karaa 2 and Sondes Bouabid 3 1 University of tunis High Institute
Interactive Dynamic Information Extraction
Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken
Optimization of Internet Search based on Noun Phrases and Clustering Techniques
Optimization of Internet Search based on Noun Phrases and Clustering Techniques R. Subhashini Research Scholar, Sathyabama University, Chennai-119, India V. Jawahar Senthil Kumar Assistant Professor, Anna
L130: Chapter 5d. Dr. Shannon Bischoff. Dr. Shannon Bischoff () L130: Chapter 5d 1 / 25
L130: Chapter 5d Dr. Shannon Bischoff Dr. Shannon Bischoff () L130: Chapter 5d 1 / 25 Outline 1 Syntax 2 Clauses 3 Constituents Dr. Shannon Bischoff () L130: Chapter 5d 2 / 25 Outline Last time... Verbs...
POS Tagging 1. POS Tagging. Rule-based taggers Statistical taggers Hybrid approaches
POS Tagging 1 POS Tagging Rule-based taggers Statistical taggers Hybrid approaches POS Tagging 1 POS Tagging 2 Words taken isolatedly are ambiguous regarding its POS Yo bajo con el hombre bajo a PP AQ
Research on Sentiment Classification of Chinese Micro Blog Based on
Research on Sentiment Classification of Chinese Micro Blog Based on Machine Learning School of Economics and Management, Shenyang Ligong University, Shenyang, 110159, China E-mail: [email protected] Abstract
Sentiment analysis of Twitter microblogging posts. Jasmina Smailović Jožef Stefan Institute Department of Knowledge Technologies
Sentiment analysis of Twitter microblogging posts Jasmina Smailović Jožef Stefan Institute Department of Knowledge Technologies Introduction Popularity of microblogging services Twitter microblogging posts
The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2
2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of
ARABIC PERSON NAMES RECOGNITION BY USING A RULE BASED APPROACH
Journal of Computer Science 9 (7): 922-927, 2013 ISSN: 1549-3636 2013 doi:10.3844/jcssp.2013.922.927 Published Online 9 (7) 2013 (http://www.thescipub.com/jcs.toc) ARABIC PERSON NAMES RECOGNITION BY USING
PTE Academic Preparation Course Outline
PTE Academic Preparation Course Outline August 2011 V2 Pearson Education Ltd 2011. No part of this publication may be reproduced without the prior permission of Pearson Education Ltd. Introduction The
Correlation: ELLIS. English language Learning and Instruction System. and the TOEFL. Test Of English as a Foreign Language
Correlation: English language Learning and Instruction System and the TOEFL Test Of English as a Foreign Language Structure (Grammar) A major aspect of the ability to succeed on the TOEFL examination is
Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or
TOOL OF THE INTELLIGENCE ECONOMIC: RECOGNITION FUNCTION OF REVIEWS CRITICS. Extraction and linguistic analysis of sentiments
TOOL OF THE INTELLIGENCE ECONOMIC: RECOGNITION FUNCTION OF REVIEWS CRITICS. Extraction and linguistic analysis of sentiments Grzegorz Dziczkowski, Katarzyna Wegrzyn-Wolska Ecole Superieur d Ingenieurs
Overview of MT techniques. Malek Boualem (FT)
Overview of MT techniques Malek Boualem (FT) This section presents an standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,
Semantic Analysis of Natural Language Queries Using Domain Ontology for Information Access from Database
I.J. Intelligent Systems and Applications, 2013, 12, 81-90 Published Online November 2013 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijisa.2013.12.07 Semantic Analysis of Natural Language Queries
Blog Post Extraction Using Title Finding
Blog Post Extraction Using Title Finding Linhai Song 1, 2, Xueqi Cheng 1, Yan Guo 1, Bo Wu 1, 2, Yu Wang 1, 2 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 2 Graduate School
An online semi automated POS tagger for. Presented By: Pallav Kumar Dutta Indian Institute of Technology Guwahati
An online semi automated POS tagger for Assamese Presented By: Pallav Kumar Dutta Indian Institute of Technology Guwahati Topics Overview, Existing Taggers & Methods Basic requirements of POS Tagger Our
VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter
VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,
Why language is hard. And what Linguistics has to say about it. Natalia Silveira Participation code: eagles
Why language is hard And what Linguistics has to say about it Natalia Silveira Participation code: eagles Christopher Natalia Silveira Manning Language processing is so easy for humans that it is like
Teaching of Chinese grammars for Hungarian students and study of teaching-grammars
SHS Web of Conferences 24, 02010 (2016) DOI: 10.1051/ shsconf/20162402010 C Owned by the authors, published by EDP Sciences, 2016 Teaching of Chinese grammars for Hungarian students and study of teaching-grammars
Building a Question Classifier for a TREC-Style Question Answering System
Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given
SVM Based Learning System For Information Extraction
SVM Based Learning System For Information Extraction Yaoyong Li, Kalina Bontcheva, and Hamish Cunningham Department of Computer Science, The University of Sheffield, Sheffield, S1 4DP, UK {yaoyong,kalina,hamish}@dcs.shef.ac.uk
Domain Knowledge Extracting in a Chinese Natural Language Interface to Databases: NChiql
Domain Knowledge Extracting in a Chinese Natural Language Interface to Databases: NChiql Xiaofeng Meng 1,2, Yong Zhou 1, and Shan Wang 1 1 College of Information, Renmin University of China, Beijing 100872
SENTIMENT EXTRACTION FROM NATURAL AUDIO STREAMS. Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen
SENTIMENT EXTRACTION FROM NATURAL AUDIO STREAMS Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen Center for Robust Speech Systems (CRSS), Eric Jonsson School of Engineering, The University of Texas
Terminology Extraction from Log Files
Terminology Extraction from Log Files Hassan Saneifar, Stéphane Bonniol, Anne Laurent, Pascal Poncelet, Mathieu Roche To cite this version: Hassan Saneifar, Stéphane Bonniol, Anne Laurent, Pascal Poncelet,
Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis
Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis Yue Dai, Ernest Arendarenko, Tuomo Kakkonen, Ding Liao School of Computing University of Eastern Finland {yvedai,
POS Tagsets and POS Tagging. Definition. Tokenization. Tagset Design. Automatic POS Tagging Bigram tagging. Maximum Likelihood Estimation 1 / 23
POS Def. Part of Speech POS POS L645 POS = Assigning word class information to words Dept. of Linguistics, Indiana University Fall 2009 ex: the man bought a book determiner noun verb determiner noun 1
An Approach to Handle Idioms and Phrasal Verbs in English-Tamil Machine Translation System
An Approach to Handle Idioms and Phrasal Verbs in English-Tamil Machine Translation System Thiruumeni P G, Anand Kumar M Computational Engineering & Networking, Amrita Vishwa Vidyapeetham, Coimbatore,
Selected Topics in Applied Machine Learning: An integrating view on data analysis and learning algorithms
Selected Topics in Applied Machine Learning: An integrating view on data analysis and learning algorithms ESSLLI 2015 Barcelona, Spain http://ufal.mff.cuni.cz/esslli2015 Barbora Hladká [email protected]
From Terminology Extraction to Terminology Validation: An Approach Adapted to Log Files
Journal of Universal Computer Science, vol. 21, no. 4 (2015), 604-635 submitted: 22/11/12, accepted: 26/3/15, appeared: 1/4/15 J.UCS From Terminology Extraction to Terminology Validation: An Approach Adapted
ONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS
ONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS Divyanshu Chandola 1, Aditya Garg 2, Ankit Maurya 3, Amit Kushwaha 4 1 Student, Department of Information Technology, ABES Engineering College, Uttar Pradesh,
Improving Word-Based Predictive Text Entry with Transformation-Based Learning
Improving Word-Based Predictive Text Entry with Transformation-Based Learning David J. Brooks and Mark G. Lee School of Computer Science University of Birmingham Birmingham, B15 2TT, UK d.j.brooks, [email protected]
Annotation Guidelines for Dutch-English Word Alignment
Annotation Guidelines for Dutch-English Word Alignment version 1.0 LT3 Technical Report LT3 10-01 Lieve Macken LT3 Language and Translation Technology Team Faculty of Translation Studies University College
Presented to The Federal Big Data Working Group Meetup On 07 June 2014 By Chuck Rehberg, CTO Semantic Insights a Division of Trigent Software
Semantic Research using Natural Language Processing at Scale; A continued look behind the scenes of Semantic Insights Research Assistant and Research Librarian Presented to The Federal Big Data Working
Role of Text Mining in Business Intelligence
Role of Text Mining in Business Intelligence Palak Gupta 1, Barkha Narang 2 Abstract This paper includes the combined study of business intelligence and text mining of uncertain data. The data that is
D2.4: Two trained semantic decoders for the Appointment Scheduling task
D2.4: Two trained semantic decoders for the Appointment Scheduling task James Henderson, François Mairesse, Lonneke van der Plas, Paola Merlo Distribution: Public CLASSiC Computational Learning in Adaptive
Extraction of Legal Definitions from a Japanese Statutory Corpus Toward Construction of a Legal Term Ontology
Extraction of Legal Definitions from a Japanese Statutory Corpus Toward Construction of a Legal Term Ontology Makoto Nakamura, Yasuhiro Ogawa, Katsuhiko Toyama Japan Legal Information Institute, Graduate
A Survey on Product Aspect Ranking
A Survey on Product Aspect Ranking Charushila Patil 1, Prof. P. M. Chawan 2, Priyamvada Chauhan 3, Sonali Wankhede 4 M. Tech Student, Department of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra,
Timeline (1) Text Mining 2004-2005 Master TKI. Timeline (2) Timeline (3) Overview. What is Text Mining?
Text Mining 2004-2005 Master TKI Antal van den Bosch en Walter Daelemans http://ilk.uvt.nl/~antalb/textmining/ Dinsdag, 10.45-12.30, SZ33 Timeline (1) [1 februari 2005] Introductie (WD) [15 februari 2005]
Cross-Lingual Concern Analysis from Multilingual Weblog Articles
Cross-Lingual Concern Analysis from Multilingual Weblog Articles Tomohiro Fukuhara RACE (Research into Artifacts), The University of Tokyo 5-1-5 Kashiwanoha, Kashiwa, Chiba JAPAN http://www.race.u-tokyo.ac.jp/~fukuhara/
Text Generation for Abstractive Summarization
Text Generation for Abstractive Summarization Pierre-Etienne Genest, Guy Lapalme RALI-DIRO Université de Montréal P.O. Box 6128, Succ. Centre-Ville Montréal, Québec Canada, H3C 3J7 {genestpe,lapalme}@iro.umontreal.ca
