Sentence Simplification and Automatic Syntactic Analysis
|
|
- Christal Francis
- 3 years ago
- Views:
Transcription
1 Evaluation of Sentence Simplification by Machine Learning and Handcrafted Rules ATraNoS WP4-15 Erik F. Tjong Kim Sang CNTS - Language Technology Group University of Antwerp erikt@uia.ua.ac.be Abstract This progress report describes the Antwerp approach to sentence simplification in the framework of the ATraNoS project with a focus on the evaluation of the sentence simplification systems we have built. We describe earlier work in sentence simplification and explain our two approaches to sentence reduction: one based on machine learning while the other one relies on handcrafted deletion rules. We compare the performances of the systems and then discuss the applied evaluation method 1. 1 Introduction There exists a large body of work on text summarization but the largest part of this concerns extraction approaches in which a subset of the sentences in a text is proposed as a summary of the text. Most of the work on sentence simplification (also called sentence reduction, sentence condensation and sentence compression) relies on an automatic syntactic analysis after which handcrafted rules, sometimes with learned probabilities, remove optional branches in the syntactic trees. For evaluation, most systems rely on human judges which examine the grammaticality and the information content of the reduced sentence although some authors also propose automatic evaluation methods which are shown to correlate with the decisions made by the judges. Knight and Marcu (2000) compare two machine learners, a noisy channel model and decision trees, on compressing sentences by learning to map syntactic trees to other trees. After training on a corpus of about 1000 sentences, both systems performed significantly better than a baseline based on word bigram occurrences, according to a blind evaluation by human judges. Hori et al. (2002) compute various word and word relation scores, and build a compressed spoken sentence by finding an optimal set of words with dynamic programming. 2004). 1 This report is part of a paper submitted for the proceedings of CLIN-2003 (Tjong Kim Sang et al., 1
2 for each pair of transcriptions and subtitles remove punctuation and capitalization link every word to copies and synonyms remove duplicate links based on longest chains remove remaining crossing links link multi-word synonyms link neighboring unlinked words Table 1: High-level pseudo-code of the word alignment algorithm They show that this approach performs better than randomly choosing words from the original sentence, according to an automatic evaluation. Most other work on sentence condensation involves (shallow) parsing, sometimes with additional lexical and morphological tools, together with rule-based reduction, occasionally with statistical scores like tf idf (Carroll et al., 1998; Chandrasekar and Srinivas, 1997; Grefenstette, 1998; Jing, 2000; Kiyota and Kurohashi, 2001; Riezler et al., 2003). An interesting approach to sentence compression is by paraphrasing, that is replacing phrases by shorter phrases with the same meaning. Shinyama et al. (2002) outline how paraphrases can be derived automatically from related documents. The application of sentence simplification for automatically generating subtitles for TV programs is mentioned in Robert-Ribes et al. (1999) but to our knowledge there has been no practical study linking the two. 2 Data We have defined sentence simplification as a word classification task. In order to convert a transcribed sentence to a subtitle, four basic word actions can be taken: word copy, word deletion, word replacement and word insertion. In comparison with a subtitle, the words from the transcribed sentence can be copied, deleted or replaced by another word. Furthermore, words that do not appear in the transcription can be inserted in the subtitle. Since the subtitles are often quite similar to the transcription, the word copy action is the most frequent action. Here is an example from the corpus: de politici vinden de euro natuurlijk/delete een goeie/goede zaak In this sentence, two words need to be changed. The adverb natuurlijk needs to be deleted and the adjective goeie needs to be replaced by goede. In order to find out what word actions are required to transform the transcription to the subtitles, the sentence-level alignment of the parallel ATraNoS corpus (WP4-03) is insufficient. We need alignment at word level. Automatically retrieving word-to-word links is more difficult than obtaining sentence alignments. Subtitle words may occur in a different position than in the transcription and this makes it hard to match corresponding words. Generating a rough alignment and correcting this manually requires too much work. 2
3 sentences words copied deleted replaced CR train 9, ,519 93,506 22,986 9, % development 1,345 15,577 11,660 2,767 1, % test 1,314 15,605 11,737 2,836 1, % Table 2: Size of the three parts of the selected VRT news broadcast data measured in number of sentences and number of words. The final four columns show the relation between the transcriptions and the subtitles: the number of copied words (75%), the number of deleted words (18%), the number of replaced words (7%) and the average character compression rate: the percentage of remaining characters in the subtitles. In order to get some useful material, we have performed an automatic word alignment and discarded all sentences in which the alignment could be suspected to have been unsuccessful. Word alignment started with removing all punctuation signs and converting capital characters to lower case, thus simulating the future input of automatically transcribed spoken text. This cleanup action also made word alignment easier because there were inconsistencies in the punctuation and capitalization between the transcriptions and the subtitles. An overview of the other actions performed by the word alignment algorithm can be found in Table 1. After performing the word alignment, we have selected sentence pairs from the news corpus that contained sentences which shared at least half of their words or which contained at most three different words. This restriction was chosen in order to avoid including pairs in the training data that were difficult to learn from. Pairs of copied sentences have also been excluded from the material since those were uninteresting from a sentence simplification point of view. Of the parallel corpus, we have only used texts of the VRT news part because we wanted to create training material with a more-or-less consistent content. The data selection method has resulted in a collection of 12,535 sentences (156,701 words). We have divided this in a training part (9,876 sentences, 125,519 words), a development part (1,345 sentences, 15,577 words) and a test part (1,314 sentences, 15,605 words, see Table 2). Each word in the three data sets was specified by 48 features: the six features word, lemma, part-of-speech tag, chunk tag, relation tag and person name tag for the word and its six nearest neighbors as well the classes of the six nearest neighbors (estimated for the development and test data). Person name tags have been generated by a basic named entity tagger. For each sentence we have computed the compression rate: the number of characters in the subtitles divided by those in the transcripts. This compression rate will be a target for the machine learners. 3 Methods We have applied four different methods to the data. The first is a baseline system which simply generated the most frequent output class for each word. The second is the memorybased tagger MBT (Daelemans et al., 2003a). The third is the memory-based learner 3
4 TiMBL (Daelemans et al., 2003b) and the fourth is a list of hand-crafted deletion rules. In this section we will describe the third and fourth method in more detail. Memory-based learning involves storing training data items and assigning classes to test items which correspond with training items that are similar. There are different memorybased learning algorithms and different ways for computing item similarity. We have used the default settings of TiMBL: the ib1 algorithm with gain ratio feature weighting (Daelemans et al., 2003b). TiMBL has an important advantage over MBT: it allows data items with more than one feature while MBT is restricted to items with one feature. We have applied three extensions to the memory-based learner. The first is feature selection which we have implemented with bidirectional hill-climbing (Caruana and Freitag, 1994). It starts with evaluating each individual feature and continues with evaluating all pairs containing the best individual. This process is continued until adding or removing a single feature to the current set does not result in an increased performance. Feature selection is necessary because the chosen machine learner is sensitive to feature choice: it might perform better with a subset of features than with the complete set. The second extension to the memory-based learner which we employed was classifier stacking. We want to use of the classes of the neighbor words as features for the current word. In order to obtain these we ran preliminary experiments and used the generated output classes as context class features in the next experiments. The third extension was the use of trigram output classes (Van den Bosch et al., 2004). Rather than predicting the action for the current word, we predict the previous, current and next action. This allows the neighboring words to take part in deciding what action to take for the current word and this improves performance. The decision which action to perform is made by a majority vote in which ties are solved by choosing the action predicted by the focus word. The approach of the baseline and the two machine learners is to find the best action classes for each word in one step. The sentence simplification method based on hand-crafted deletion processes the data in two steps. First, it finds all words and phrases in the data that are candidates for deletion and then it selects phrases for deletion until the required compression rate is met. We believe that the first step could have been made with a learner as well but until now we have not been able to define a learning task for deleting all optional material from a sentence (note that we only have training examples in which a small part is deleted). We have defined different deletion rules, among others for removing adverbs, adjectives, first names, prepositional phrases, phrases between commas or brackets, relative clauses, numbers and time phrases. In order to avoid tuning the rules on the test data, we have written them based on other news text (the Dutch NOS Teletext). As an example, we show a simplified version rule for adverb deletion: if (currentpos == BW and not currentword is sentencefinal and not previousverb in baseverbs and not currentword in negations) then mark currentword as deletion candidate 4
5 average CR CR met Precision Recall F β=1 Baseline 94.9% 22.0% 51.0% 10.2% 17.0 MBT 90.3% 33.2% 38.9% 16.7% 23.4 TiMBL 86.7% 38.2% 32.4% 20.2% 24.9 Rules 69.4% 87.7% 26.0% 25.8% 25.9 TiMBL+Rules 70.6% 81.1% 25.7% 28.5% 27.0 Table 3: Sentence reduction performances of the memory-based tagger MBT, the memorybased learner TiMBL, the hand-crafted deletion rules and a combination of TiMBL and the rules, measured on the test data set by the average percentage of remaining characters (compression rate: CR), target: 81.8%), the percentage of sentences for which this compression rate was met, and precision, recall and F β=1 computed for word deletions and word replacements. The baseline system predicts the most frequent action for each word. After identifying all candidates for deletion, a selection of these phrases will be removed. The selection process starts with removing the shortest phrases, measured in number of words. If phrases have the same length, preference is given to phrases closer to the end of the sentence. The selection process continues until the required compression rate is met or until there are no more phrases left to delete. 4 Experiments and results We have performed feature selection for both MBT and TiMBL while training with the available training data and testing with the development data. Words, lemmas, part-ofspeech tags and chunk tags were selected for the memory-based learner but the relation information, person name information as well as the classes of the neighboring words were deemed unuseful. Performance was measured in F β=1 rate, the harmonic mean of precision and recall, for deletion and replacement actions. The feature sets corresponding with the best performances, wwfwaa,hnfaa for MBT and w 2 2 l 1 1 p 2 3 c 0 for TiMBL, have been used for processing the test data. An overview of the performances on this data set can be found in Table 3. As can be seen from the low baseline F β=1 rate, this is a difficult task. All approaches performed significantly better than the baseline with respect to F β=1 rates (p<0.01 according to a bootstrap sampling test (Noreen, 1989)). TiMBL was significantly better than MBT (p<0.05). The hand-crafted deletion rules did better than TiMBL, both with respect to the F β=1 rate (not significant, p 0.15) and the percentage of sentences that met the required compression rate (significant, p<0.01). Note that contrary to the deletion rules, neither MBT nor TiMBL had access to the required compression rates. The fact that the hand-crafted rules outperformed the two learners is interesting since the rules perform only part of the task (word deletion) while the learners also predict word replacements. We have created a basic combined system that chooses the deletions predicted by the rules and the replacements predicted by TiMBL. This system obtained a slightly 5
6 higher F β=1 rate than the handcrafted deletion rules (p 0.10, Table 3). In comparison with the rules, the combined system did not compress as many sentences sufficiently. In some cases word replacement introduced longer words and no extra post-processing has been performed to undo the effects of this. 5 Evaluation We have chosen to evaluate the output of the learners by examining the deleted and replaced words and comparing them with a gold standard. This will not always be fair for the learner, as the following example shows: O: in afwachting komt er nieuw overleg en met de landbouw en met de milieubeschermers en in de regering G: in afwachting komt er overleg met landbouw en milieubeschermers en in de regering R: er komt overleg en met de landbouw en met de milieubeschermers en in de regering The first sentence (O) is the original transcribed sentence, with the target words for deletion marked in italics, the second is the associated subtitle (Gold) and the third is the output of the rule-based system (R). The system output is fine: it satisfies the required compression rate and it is grammatically sound. Yet, only one of the three word deletions proposed by the system occurs in the subtitle (precision: 33%) while it suggests only one of the five deletions in the subtitle (recall: 20%; F β=1 : 25). According to the evaluation measure the suggested simplification is not very good. An excellent solution for this problem would be a manual evaluation of the system output. This evaluation can address different issues: whether the compression rate has been met, whether simplified sentence still contains the same information content as the original one and whether the simplification process has resulted in a grammatical sentence. However, in the process of system development we generate many different analyses of the development data and a repeated full manual evaluation of the results would be too costly. At this moment we are working on an intermediate evaluation process which uses several simplifications targets for each source sentence (Papineni et al., 2002). This requires extra manual work when building the test corpus but it will increase the probability of correctly assessing the performance of the learner. For this particular report, we have performed a manual evaluation of the output of two systems: the general memory-based learner TiMBL and the handcrafted deletion rules. We examined the first 200 reduced sentences they produced for the test data. Results were scored on a three-level scale (1.0: good, 0.5: medium, 0.0: bad) for syntactic and semantic correctness. Averages for both scales have been computed. A problem with this type of evaluation is that a passive approach to sentence reduction will result in high performance 6
7 all CR not met CR met nbr syn sem nbr syn sem nbr syn sem TiMBL % 70% % 78% 70 54% 56% Rules % 64% 26 89% 81% % 61% Table 4: Results of a manual evaluation of the output for the first 200 sentences of the test data. The output of the systems was evaluated with respect to syntactic correctness (syn) and semantic correctness (sem). Separate overviews are given for all sentences, sentences for which the required compression rate (CR) was not met and sentences for which this rate was met. rates with respect to the linguistic correctness of the output: by copying all words of the sentence we would get the original sentence which is expected to be grammatically correct. Therefore we have registered average performance rates for all sentences as well as for sentences for which the required compression rate was met and for sentences for which this was not the case. An overview of the results can be found in Table 4. When we compare the quality of the output sentences that met the required compression rate (final three columns of Table 4), we see that the handcrafted deletion rules produced better quality output, both with respect to syntactic correctness (p<0.01) and semantic correctness (p 0.15). However, the obtained performance rates, around 65%, are not spectacular. The evaluation performed in this report presupposes applying the systems in a stand-alone set-up without human post-processing. A more reasonable evaluation would be to test the systems in combination with human post-processing and measure the time gains which they provide in comparison with working without them. 6 Concluding remarks and future work We have presented work on sentence simplification targeted at the automatic generation of Dutch TV subtitles. A large part of the work was based on automatic learning from examples. From our training corpora, we have used the most interesting pairs of sentences from the corpus: those that were different but shared enough words to enable the learner to successfully reduce them. We have applied a memory-based tagger (MBT), a general memory-based learner (TiMBL) and a set of hand-crafted deletion rule to this data. The general learner outperformed the tagger and the rules did even better. A combination of the general learner (word replacements) and the rules (word deletion) achieved the best score on the data. However, we have subsequently shown that our evaluation procedure is not always fair to the systems. A manual evaluation of the output of two systems revealed that the quality of the rule-based system output was better than that of the general memory-based system. The quality figures were not very high (around 65%) and an additional processing time gain evaluation would be interesting. The compression rates obtained by the two learners (Table 3, column CR met) show that it is not a good idea to define sentence simplification as a single-stage task which can be 7
8 learned from examples. Contrary to the natural language processing tasks which we have worked on before, sentence simplification is more dynamic because in different contexts different compression rates might be required for the same sentence. Taking a corpus of sentence simplification output from humans, only gives examples of word actions which can be performed to reduce a sentence and not what actions must be performed. In a corpus with examples of low compression, the number of copied examples of a phrase will of outnumber the number of reduced versions. In those cases the learner chooses for the most frequent alternative which explains why they do not reach the required compression rates. We believe that the two-stage method which we have used in rule-based approach is better suited for sentence simplification: first find all candidate phrases for deletion and then choose a subset of these in such a way that the required compression rate is met. Ideally, the first step would have been performed by a machine learner. However, this requires a different parallel corpus of examples than the one we have used for our work: all deletion candidates need to be labeled rather than some. Additionally the corpus needs to contain a specification of links between the candidate phrases: some phrases in a sentence cannot be deleted without removing the phrases that depend on them as well. Constructing such a corpus as well as designing the associated learner is an interesting topic for future work. Another promising task that will lead to better sentence reduction is improving the applied shallow parser. Currently it generates only one relation, the verb-subject relation. It would be very useful would the parser identify a large range of verb-related phrases: agents, patients and modifiers of different types, like those containing time and space specifications. Such a shallow parser would allow learning sentence simplification from non-parallel corpora: from the relations present in the corpus the learner would discover which types of phrases are compulsory for a verb and which are optional. References Carroll, J., Minnen, G., Canning, Y., Devlin, S., and Tait, J. (1998). Practical Simplification of English Newspaper Text to Assist Aphasic Readers. In Proceedings of the AAAI-98 Workshop on Integrating AI and Assistive Technology. Madison, WI, USA. Caruana, R. and Freitag, D. (1994). Greedy Attribute Selection. In Proceedings of the Eleventh International Conference on Machine Learning, pages New Brunswick, NJ, USA, Morgan Kaufman. Chandrasekar, R. and Srinivas, B. (1997). Automatic Induction of Rules for Text Simplification. Knowledge-Based Systems, 10: Daelemans, W., Zavrel, J., Van der Sloot, K., and Van den Bosch, A. (2003a). MBT: Memory Based Tagger, version 2.0, Reference Guide. ILK Technical Report ILK Daelemans, W., Zavrel, J., Van der Sloot, K., and Van den Bosch, A. (2003b). TiMBL: Tilburg Memory Based Learner, version 5.0, Reference Guide. ILK Technical Report ILK Grefenstette, G. (1998). Producing Intelligent Telegraphic Text Reduction to Provide an Audio Scanning Service for the Blind. In Intelligent Text Summarization, AAAI Spring Symposium Series, pages
9 Hori, C., Furui, S., Malkin, R., Yu, H., and Waibel, A. (2002). Automatic Speech Summarization Applied to English Broadcast News Speech. In Proceedings of ICASSP 2002, pages Orlando, FL, USA. Jing, H. (2000). Sentence Reduction for Automatic Text Summarization. In Proceedings of ANLP- 2000, pages Seattle WA. Kiyota, Y. and Kurohashi, S. (2001). Automatic Summarization of Japanese Sentences and its Application to a WWW KWIC Index. In Proceedings of SAINT 2001, pages San Diego, CA, USA. Knight, K. and Marcu, D. (2000). Statistics-based Summarization Step One: Sentence Compression. In Proceedings of AAAI Austin TX. Noreen, E. W. (1989). Computer-Intensive Methods for Testing Hypotheses. John Wiley & Sons. Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pages Riezler, S., King, T. H., Crouch, R., and Zaenen, A. (2003). Statistical Sentence Condensation Using Ambiguity Packing and Stochastic Disambiguation Methods for Lexical-functional Grammar. In Proceedings of HLT-NAACL 2003, pages Edmonton, Canada. Robert-Ribes, J., Pfeiffer, S., Ellison, R., and Burnham, D. (1999). Semi-automatic Captioning of TV Programs, an Australian Perspective. In Proceedings of TAO Workshop on TV Closed Captions for the Hearing-impaired People, pages Tokyo, Japan. Shinyama, Y., Sekine, S., Sudo, K., and Grishman, R. (2002). Automatic Paraphrase Acquisition from News Articles. In Proceedings of HLT San Diego, CA, USA. Tjong Kim Sang, E. F., Daelemans, W., and Höthker, A. (2004). Reduction of Dutch Sentences for Automatic Subtitling. Van den Bosch, A., Canisius, S., Daelemans, W., Hendrickx, I., and Tjong Kim Sang, E. (2004). Memory-based Semantic Role Labeling: Optimizing Features, Algorithm, and Output. In Proceedings of CoNLL Boston, MA, USA. 9
10 A Examples This section contains the first 15 sentences of the test data. For every sentence we show the original version (O), the gold standard simplified version (G), the simplified version generated by the memory-based learner TiMBL (T) and the simplified version produced by the hand-crafted deletion rules (R). Both the input and the output of the systems are lower-case sentences without punctuation signs. O: en er komt ook overleg met alle betrokken partijen G: er komt overleg met alle partijen T: en er komt ook overleg met alle betrokken partijen R: en er komt ook overleg O: dat lijkt goed nieuws voor de bezorgde boeren G: dat lijkt goed nieuws voor de boeren T: dat is goed nieuws voor de bezorgde boeren R: dat lijkt goed nieuws voor de boeren O: als dat meer dan 54 % is dan is dat zo als dat objectief kan vastgesteld worden G: als dat meer dan 54 % is dan is dat zo T: als dat meer dan 54 % is is dat zo als dat objectief kan vastgesteld worden R: als dat meer dan 54 % is dan is O: ik heb het gevoel dat er een beetje een symbooldiscussie is G: ik heb het gevoel dat er een symbooldiscussie is T: in heb het gevoel dat er een symbooldiscussie R: ik heb het gevoel O: en dat misschien bepaalde organisaties zeggen het moet zeker meer zijn dan 50 % want anders is het niet goed G: bepaalde organisaties zeggen het moet zeker meer dan 50 % zijn anders is het niet goed T: dat misschien bepaalde organisaties zeggen je moet zeker meer zijn dan 50 procent anders niet goed R: en dat organisaties zeggen het moet zeker meer zijn dan 50 % want is het niet goed O: voor mij kan dat zelfs 60 % zijn als dat tenminste wetenschappelijk onderbouwd is G: het mag 60 % zijn als het maar wetenschappelijk onderbouwd is T: voor mij kan dat zelfs 60 % zijn als dat tenminste onderbouwd is R: voor mij kan dat 60 % zijn als dat onderbouwd is O: in afwachting komt er nieuw overleg en met de landbouw en met de milieubeschermers en in de regering G: in afwachting komt er overleg met landbouw en milieubeschermers en in de regering T: in afwachting komt er nieuw overleg en met de landbouw en met de arbeid en in de regering R: er komt overleg en met de landbouw en met de milieubeschermers en in de regering 10
11 O: vanaf maandag wil ik eigenlijk met alle betrokken ministers met de ganse regering opnieuw rond de tafel om opnieuw die synthese te kunnen maken G: vanaf maandag wil ik met de ganse regering opnieuw rond de tafel om daar die synthese te kunnen maken T: vanaf maandag wil ik eigenlijk met alle betrokken ministers met de ganse regering opnieuw rond de tafel om opnieuw die synthese te kunnen komen R: ik wil eigenlijk met alle betrokken ministers met de ganse regering om die synthese te kunnen maken O: de hele discussie binnen de vlaamse regering heeft dus te maken met het europese milieubeleid G: de hele discussie heeft dus te maken met het europese milieubeleid T: de hele discussie binnen de vlaamse regering heeft te maken met het europese milieubeleid R: de hele discussie binnen de vlaamse regering heeft dus te maken O: lieven verstraete zocht uit wat europa nu eigenlijk wil G: lieven verstraete zocht uit wat europa wil T: lieven verstraete zocht uit wat zij wil R: lieven verstraete zocht uit wat europa wil O: tegen begin volgend jaar mag het grondwater nog maar 50 miligram nitraat per liter bevatten G: tegen 2003 mag het grondwater maar 50 milligram nitraat per liter bevatten T: tegen begin 2003 mag het grondwater nog maar 50 miligram nitraat per liter bevatten R: tegen begin volgend jaar mag het grondwater nog 50 miligram nitraat bevatten O: en de vlaamse landbouw met zijn uitgebreide veestapel is dus een belangrijke vervuiler G: de vlaamse landbouw met zijn grote veestapel is dus een belangrijke vervuiler T: de vlaamse landbouw met zijn uitgebreide veestapel is een belangrijke vervuiler R: en de vlaamse landbouw met zijn uitgebreide veestapel is dus een vervuiler O: om de nitraatnorm te halen kan een land kwetsbare gebieden afbakenen G: om de nitraat-norm te halen kan een land kwetsbare gebieden afbakenen T: om de nitraatnorm te halen kan kwetsbare gebieden afbakenen R: om de nitraatnorm te halen kan een land gebieden afbakenen O: daar mag dan 35 procent minder mest uitgereden worden dan elders G: daar mag dan 35 % minder mest uitgereden worden dan elders T: daar mag dan 35 % minder mest uitgereden worden dan elders R: daar mag 35 procent minder mest uitgereden worden dan elders O: tijdwinst dus G: tijdswinst dus T: tijdwinst R: tijdwinst dus 11
Reduction of Dutch Sentences for Automatic Subtitling
Reduction of Dutch Sentences for Automatic Subtitling Erik F. Tjong Kim Sang, Walter Daelemans and Anja Höthker CNTS Language Technology Group, University of Antwerp, Belgium Abstract We compare machine
More informationTesting Data-Driven Learning Algorithms for PoS Tagging of Icelandic
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic by Sigrún Helgadóttir Abstract This paper gives the results of an experiment concerned with training three different taggers on tagged
More informationDetecting semantic overlap: Announcing a Parallel Monolingual Treebank for Dutch
Detecting semantic overlap: Announcing a Parallel Monolingual Treebank for Dutch Erwin Marsi & Emiel Krahmer CLIN 2007, Nijmegen What is semantic overlap? Zangeres Christina Aguilera heeft eindelijk verteld
More informationThe University of Amsterdam s Question Answering System at QA@CLEF 2007
The University of Amsterdam s Question Answering System at QA@CLEF 2007 Valentin Jijkoun, Katja Hofmann, David Ahn, Mahboob Alam Khalid, Joris van Rantwijk, Maarten de Rijke, and Erik Tjong Kim Sang ISLA,
More informationChapter 8. Final Results on Dutch Senseval-2 Test Data
Chapter 8 Final Results on Dutch Senseval-2 Test Data The general idea of testing is to assess how well a given model works and that can only be done properly on data that has not been seen before. Supervised
More informationAnnotation Guidelines for Dutch-English Word Alignment
Annotation Guidelines for Dutch-English Word Alignment version 1.0 LT3 Technical Report LT3 10-01 Lieve Macken LT3 Language and Translation Technology Team Faculty of Translation Studies University College
More informationTimeline (1) Text Mining 2004-2005 Master TKI. Timeline (2) Timeline (3) Overview. What is Text Mining?
Text Mining 2004-2005 Master TKI Antal van den Bosch en Walter Daelemans http://ilk.uvt.nl/~antalb/textmining/ Dinsdag, 10.45-12.30, SZ33 Timeline (1) [1 februari 2005] Introductie (WD) [15 februari 2005]
More informationHow To Identify And Represent Multiword Expressions (Mwe) In A Multiword Expression (Irme)
The STEVIN IRME Project Jan Odijk STEVIN Midterm Workshop Rotterdam, June 27, 2008 IRME Identification and lexical Representation of Multiword Expressions (MWEs) Participants: Uil-OTS, Utrecht Nicole Grégoire,
More informationComma checking in Danish Daniel Hardt Copenhagen Business School & Villanova University
Comma checking in Danish Daniel Hardt Copenhagen Business School & Villanova University 1. Introduction This paper describes research in using the Brill tagger (Brill 94,95) to learn to identify incorrect
More informationBrill s rule-based PoS tagger
Beáta Megyesi Department of Linguistics University of Stockholm Extract from D-level thesis (section 3) Brill s rule-based PoS tagger Beáta Megyesi Eric Brill introduced a PoS tagger in 1992 that was based
More informationExtraction of Hypernymy Information from Text
Extraction of Hypernymy Information from Text Erik Tjong Kim Sang, Katja Hofmann and Maarten de Rijke Abstract We present the results of three different studies in extracting hypernymy information from
More informationAutomatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast
Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Hassan Sawaf Science Applications International Corporation (SAIC) 7990
More informationDaan & Rembrandt Research Wendelien Daan By Willemijn Jongbloed Group D October 2009
Daan & Rembrandt Research Wendelien Daan By Willemijn Jongbloed Group D October 2009 Doing Dutch Often, the work of many Dutch artist; photographers, designers, and painters is easy to recognize among
More informationStatistical Machine Translation
Statistical Machine Translation Some of the content of this lecture is taken from previous lectures and presentations given by Philipp Koehn and Andy Way. Dr. Jennifer Foster National Centre for Language
More informationCoreference Resolution on Blogs and Commented News
Coreference Resolution on Blogs and Commented News Iris Hendrickx 1 and Veronique Hoste 1,2 1 LT3 - Language and Translation Technology Team, University College Ghent, Groot-Brittanniëlaan 45, Ghent, Belgium
More informationDomain Classification of Technical Terms Using the Web
Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using
More informationPoS-tagging Italian texts with CORISTagger
PoS-tagging Italian texts with CORISTagger Fabio Tamburini DSLO, University of Bologna, Italy fabio.tamburini@unibo.it Abstract. This paper presents an evolution of CORISTagger [1], an high-performance
More information3 Paraphrase Acquisition. 3.1 Overview. 2 Prior Work
Unsupervised Paraphrase Acquisition via Relation Discovery Takaaki Hasegawa Cyberspace Laboratories Nippon Telegraph and Telephone Corporation 1-1 Hikarinooka, Yokosuka, Kanagawa 239-0847, Japan hasegawa.takaaki@lab.ntt.co.jp
More informationSAND: Relation between the Database and Printed Maps
SAND: Relation between the Database and Printed Maps Erik Tjong Kim Sang Meertens Institute erik.tjong.kim.sang@meertens.knaw.nl May 16, 2014 1 Introduction SAND, the Syntactic Atlas of the Dutch Dialects,
More informationTurker-Assisted Paraphrasing for English-Arabic Machine Translation
Turker-Assisted Paraphrasing for English-Arabic Machine Translation Michael Denkowski and Hassan Al-Haj and Alon Lavie Language Technologies Institute School of Computer Science Carnegie Mellon University
More informationLinguistic Research with CLARIN. Jan Odijk MA Rotation Utrecht, 2015-11-10
Linguistic Research with CLARIN Jan Odijk MA Rotation Utrecht, 2015-11-10 1 Overview Introduction Search in Corpora and Lexicons Search in PoS-tagged Corpus Search for grammatical relations Search for
More informationCLARIN project DiscAn :
CLARIN project DiscAn : Towards a Discourse Annotation system for Dutch language corpora Ted Sanders Kirsten Vis Utrecht Institute of Linguistics Utrecht University Daan Broeder TLA Max-Planck Institute
More informationFinding Syntactic Characteristics of Surinamese Dutch
Finding Syntactic Characteristics of Surinamese Dutch Erik Tjong Kim Sang Meertens Institute erikt(at)xs4all.nl June 13, 2014 1 Introduction Surinamese Dutch is a variant of Dutch spoken in Suriname, a
More informationEfficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words
, pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan
More information~ We are all goddesses, the only problem is that we forget that when we grow up ~
~ We are all goddesses, the only problem is that we forget that when we grow up ~ This brochure is Deze brochure is in in English and Dutch het Engels en Nederlands Come and re-discover your divine self
More informationA chart generator for the Dutch Alpino grammar
June 10, 2009 Introduction Parsing: determining the grammatical structure of a sentence. Semantics: a parser can build a representation of meaning (semantics) as a side-effect of parsing a sentence. Generation:
More informationMining Opinion Features in Customer Reviews
Mining Opinion Features in Customer Reviews Minqing Hu and Bing Liu Department of Computer Science University of Illinois at Chicago 851 South Morgan Street Chicago, IL 60607-7053 {mhu1, liub}@cs.uic.edu
More informationSearch and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or
More informationNatural Language Database Interface for the Community Based Monitoring System *
Natural Language Database Interface for the Community Based Monitoring System * Krissanne Kaye Garcia, Ma. Angelica Lumain, Jose Antonio Wong, Jhovee Gerard Yap, Charibeth Cheng De La Salle University
More informationLeveraging ASEAN Economic Community through Language Translation Services
Leveraging ASEAN Economic Community through Language Translation Services Hammam Riza Center for Information and Communication Technology Agency for the Assessment and Application of Technology (BPPT)
More informationRelationele Databases 2002/2003
1 Relationele Databases 2002/2003 Hoorcollege 5 22 mei 2003 Jaap Kamps & Maarten de Rijke April Juli 2003 Plan voor Vandaag Praktische dingen 3.8, 3.9, 3.10, 4.1, 4.4 en 4.5 SQL Aantekeningen 3 Meer Queries.
More informationLive subtitling with speech recognition: Causes and consequences of revisions in the production process
Live subtitling with speech recognition: Causes and consequences of revisions in the production process Luuk Van Waes, Mariëlle Leijten & Aline Remael Master in de Meertalige Professionele Communicatie
More informationParsing Software Requirements with an Ontology-based Semantic Role Labeler
Parsing Software Requirements with an Ontology-based Semantic Role Labeler Michael Roth University of Edinburgh mroth@inf.ed.ac.uk Ewan Klein University of Edinburgh ewan@inf.ed.ac.uk Abstract Software
More informationCINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test
CINTIL-PropBank I. Basic Information 1.1. Corpus information The CINTIL-PropBank (Branco et al., 2012) is a set of sentences annotated with their constituency structure and semantic role tags, composed
More informationTitle page. Title: A conversation-analytic study of time-outs in the Dutch national volleyball. competition. Author: Maarten Breyten van der Meulen
Title page Title: A conversation-analytic study of time-outs in the Dutch national volleyball competition Author: Maarten Breyten van der Meulen A conversation-analytic study of time-outs in the Dutch
More informationIP-NBM. Copyright Capgemini 2012. All Rights Reserved
IP-NBM 1 De bescheidenheid van een schaker 2 Maar wat betekent dat nu 3 De drie elementen richting onsterfelijkheid Genomics Artifical Intelligence (nano)robotics 4 De impact van automatisering en robotisering
More informationContext Grammar and POS Tagging
Context Grammar and POS Tagging Shian-jung Dick Chen Don Loritz New Technology and Research New Technology and Research LexisNexis LexisNexis Ohio, 45342 Ohio, 45342 dick.chen@lexisnexis.com don.loritz@lexisnexis.com
More informationTibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features
, pp.273-280 http://dx.doi.org/10.14257/ijdta.2015.8.4.27 Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features Lirong Qiu School of Information Engineering, MinzuUniversity of
More informationLASSY: LARGE SCALE SYNTACTIC ANNOTATION OF WRITTEN DUTCH
LASSY: LARGE SCALE SYNTACTIC ANNOTATION OF WRITTEN DUTCH Gertjan van Noord Deliverable 3-4: Report Annotation of Lassy Small 1 1 Background Lassy Small is the Lassy corpus in which the syntactic annotations
More informationWord Completion and Prediction in Hebrew
Experiments with Language Models for בס"ד Word Completion and Prediction in Hebrew 1 Yaakov HaCohen-Kerner, Asaf Applebaum, Jacob Bitterman Department of Computer Science Jerusalem College of Technology
More informationBuilding a Question Classifier for a TREC-Style Question Answering System
Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given
More informationLearning Morphological Disambiguation Rules for Turkish
Learning Morphological Disambiguation Rules for Turkish Deniz Yuret Dept. of Computer Engineering Koç University İstanbul, Turkey dyuret@ku.edu.tr Ferhan Türe Dept. of Computer Engineering Koç University
More informationNamed Entity Recognition in Broadcast News Using Similar Written Texts
Named Entity Recognition in Broadcast News Using Similar Written Texts Niraj Shrestha Ivan Vulić KU Leuven, Belgium KU Leuven, Belgium niraj.shrestha@cs.kuleuven.be ivan.vulic@@cs.kuleuven.be Abstract
More informationExample-based Translation without Parallel Corpora: First experiments on a prototype
-based Translation without Parallel Corpora: First experiments on a prototype Vincent Vandeghinste, Peter Dirix and Ineke Schuurman Centre for Computational Linguistics Katholieke Universiteit Leuven Maria
More informationCollecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
More informationNatural Language to Relational Query by Using Parsing Compiler
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
More informationIdentifying Focus, Techniques and Domain of Scientific Papers
Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 sonal@cs.stanford.edu Christopher D. Manning Department of
More informationHybrid Machine Translation Guided by a Rule Based System
Hybrid Machine Translation Guided by a Rule Based System Cristina España-Bonet, Gorka Labaka, Arantza Díaz de Ilarraza, Lluís Màrquez Kepa Sarasola Universitat Politècnica de Catalunya University of the
More informationCustomizing an English-Korean Machine Translation System for Patent Translation *
Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,
More informationTransition-Based Dependency Parsing with Long Distance Collocations
Transition-Based Dependency Parsing with Long Distance Collocations Chenxi Zhu, Xipeng Qiu (B), and Xuanjing Huang Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science,
More informationVCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter
VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,
More informationAppraise: an Open-Source Toolkit for Manual Evaluation of MT Output
Appraise: an Open-Source Toolkit for Manual Evaluation of MT Output Christian Federmann Language Technology Lab, German Research Center for Artificial Intelligence, Stuhlsatzenhausweg 3, D-66123 Saarbrücken,
More informationTopics in Computational Linguistics. Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment
Topics in Computational Linguistics Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment Regina Barzilay and Lillian Lee Presented By: Mohammad Saif Department of Computer
More informationManagement control in creative firms
Management control in creative firms Nathalie Beckers Lessius KU Leuven Martine Cools Lessius KU Leuven- Rotterdam School of Management Alexandra Van den Abbeele KU Leuven Leerstoel Tindemans - February
More informationHybrid Strategies. for better products and shorter time-to-market
Hybrid Strategies for better products and shorter time-to-market Background Manufacturer of language technology software & services Spin-off of the research center of Germany/Heidelberg Founded in 1999,
More informationExperiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
More informationTS3: an Improved Version of the Bilingual Concordancer TransSearch
TS3: an Improved Version of the Bilingual Concordancer TransSearch Stéphane HUET, Julien BOURDAILLET and Philippe LANGLAIS EAMT 2009 - Barcelona June 14, 2009 Computer assisted translation Preferred by
More informationInteractive Dynamic Information Extraction
Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken
More informationPoliticalMashup. Make implicit structure and information explicit. Content
1 2 Content Connecting promises and actions of politicians and how the society reacts on them Maarten Marx Universiteit van Amsterdam Overview project Zooming in on one cultural heritage dataset A few
More informationThe information in this report is confidential. So keep this report in a safe place!
Bram Voorbeeld About this Bridge 360 report 2 CONTENT About this Bridge 360 report... 2 Introduction to the Bridge 360... 3 About the Bridge 360 Profile...4 Bridge Behaviour Profile-Directing...6 Bridge
More informationOGH: : 11g in de praktijk
OGH: : 11g in de praktijk Real Application Testing SPREKER : E-MAIL : PATRICK MUNNE PMUNNE@TRANSFER-SOLUTIONS.COM DATUM : 14-09-2010 WWW.TRANSFER-SOLUTIONS.COM Real Application Testing Uitleg Real Application
More informationSYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 统
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems Jin Yang, Satoshi Enoue Jean Senellart, Tristan Croiset SYSTRAN Software, Inc. SYSTRAN SA 9333 Genesee Ave. Suite PL1 La Grande
More informationAnnotation and Evaluation of Swedish Multiword Named Entities
Annotation and Evaluation of Swedish Multiword Named Entities DIMITRIOS KOKKINAKIS Department of Swedish, the Swedish Language Bank University of Gothenburg Sweden dimitrios.kokkinakis@svenska.gu.se Introduction
More informationThe TCH Machine Translation System for IWSLT 2008
The TCH Machine Translation System for IWSLT 2008 Haifeng Wang, Hua Wu, Xiaoguang Hu, Zhanyi Liu, Jianfeng Li, Dengjun Ren, Zhengyu Niu Toshiba (China) Research and Development Center 5/F., Tower W2, Oriental
More informationSearch and Information Retrieval
Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search
More informationThe XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006
The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006 Yidong Chen, Xiaodong Shi Institute of Artificial Intelligence Xiamen University P. R. China November 28, 2006 - Kyoto 13:46 1
More informationA Method for Automatic De-identification of Medical Records
A Method for Automatic De-identification of Medical Records Arya Tafvizi MIT CSAIL Cambridge, MA 0239, USA tafvizi@csail.mit.edu Maciej Pacula MIT CSAIL Cambridge, MA 0239, USA mpacula@csail.mit.edu Abstract
More informationFactored Translation Models
Factored Translation s Philipp Koehn and Hieu Hoang pkoehn@inf.ed.ac.uk, H.Hoang@sms.ed.ac.uk School of Informatics University of Edinburgh 2 Buccleuch Place, Edinburgh EH8 9LW Scotland, United Kingdom
More informationLEXICOGRAPHIC ISSUES IN COMBINATORICS
LEXICOGRAPHIC ISSUES IN COMBINATORICS Editing multiple idiomatic phraseological units for book and CD-ROM Lineke OPPENTOCHT, Utrecht, The Netherlands Abstract In dictionaries dating from pre-computerized
More informationAnalysis and Synthesis of Help-desk Responses
Analysis and Synthesis of Help-desk s Yuval Marom and Ingrid Zukerman School of Computer Science and Software Engineering Monash University Clayton, VICTORIA 3800, AUSTRALIA {yuvalm,ingrid}@csse.monash.edu.au
More informationSYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 CWMT2011 技 术 报 告
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 Jin Yang and Satoshi Enoue SYSTRAN Software, Inc. 4444 Eastgate Mall, Suite 310 San Diego, CA 92121, USA E-mail:
More informationPOS Tagging 1. POS Tagging. Rule-based taggers Statistical taggers Hybrid approaches
POS Tagging 1 POS Tagging Rule-based taggers Statistical taggers Hybrid approaches POS Tagging 1 POS Tagging 2 Words taken isolatedly are ambiguous regarding its POS Yo bajo con el hombre bajo a PP AQ
More information101 Inspirerende Quotes - Eelco de boer - winst.nl/ebooks/ Inleiding
Inleiding Elke keer dat ik een uitspraak of een inspirerende quote hoor of zie dan noteer ik deze (iets wat ik iedereen aanraad om te doen). Ik ben hier even doorheen gegaan en het leek me een leuk idee
More informationETL Ensembles for Chunking, NER and SRL
ETL Ensembles for Chunking, NER and SRL Cícero N. dos Santos 1, Ruy L. Milidiú 2, Carlos E. M. Crestana 2, and Eraldo R. Fernandes 2,3 1 Mestrado em Informática Aplicada MIA Universidade de Fortaleza UNIFOR
More informationSystematic Comparison of Professional and Crowdsourced Reference Translations for Machine Translation
Systematic Comparison of Professional and Crowdsourced Reference Translations for Machine Translation Rabih Zbib, Gretchen Markiewicz, Spyros Matsoukas, Richard Schwartz, John Makhoul Raytheon BBN Technologies
More informationDownload Check My Words from: http://mywords.ust.hk/cmw/
Grammar Checking Press the button on the Check My Words toolbar to see what common errors learners make with a word and to see all members of the word family. Press the Check button to check for common
More informationBiology Based Alignments of Paraphrases for Sentence Compression
Biology Based Alignments of Paraphrases for Sentence Compression João Cordeiro CLT and Bioinformatics University of Beira Interior Covilhã, Portugal jpaulo@di.ubi.pt Gäel Dias CLT and Bioinformatics University
More informationModern Natural Language Interfaces to Databases: Composing Statistical Parsing with Semantic Tractability
Modern Natural Language Interfaces to Databases: Composing Statistical Parsing with Semantic Tractability Ana-Maria Popescu Alex Armanasu Oren Etzioni University of Washington David Ko {amp, alexarm, etzioni,
More informationText Generation for Abstractive Summarization
Text Generation for Abstractive Summarization Pierre-Etienne Genest, Guy Lapalme RALI-DIRO Université de Montréal P.O. Box 6128, Succ. Centre-Ville Montréal, Québec Canada, H3C 3J7 {genestpe,lapalme}@iro.umontreal.ca
More informationemployager 1.0 design challenge
employager 1.0 design challenge a voyage to employ(ment) EMPLOYAGER 1.0 On the initiative of the City of Eindhoven, the Red Bluejay Foundation organizes a new design challenge around people with a distance
More informationA Flexible Online Server for Machine Translation Evaluation
A Flexible Online Server for Machine Translation Evaluation Matthias Eck, Stephan Vogel, and Alex Waibel InterACT Research Carnegie Mellon University Pittsburgh, PA, 15213, USA {matteck, vogel, waibel}@cs.cmu.edu
More informationAccelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems
Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems cation systems. For example, NLP could be used in Question Answering (QA) systems to understand users natural
More informationHow To Design A 3D Model In A Computer Program
Concept Design Gert Landheer Mark van den Brink Koen van Boerdonk Content Richness of Data Concept Design Fast creation of rich data which eventually can be used to create a final model Creo Product Family
More informationEnglish Grammar Checker
International l Journal of Computer Sciences and Engineering Open Access Review Paper Volume-4, Issue-3 E-ISSN: 2347-2693 English Grammar Checker Pratik Ghosalkar 1*, Sarvesh Malagi 2, Vatsal Nagda 3,
More informationUsing the BNC to create and develop educational materials and a website for learners of English
Using the BNC to create and develop educational materials and a website for learners of English Danny Minn a, Hiroshi Sano b, Marie Ino b and Takahiro Nakamura c a Kitakyushu University b Tokyo University
More informationShallow Parsing with PoS Taggers and Linguistic Features
Journal of Machine Learning Research 2 (2002) 639 668 Submitted 9/01; Published 3/02 Shallow Parsing with PoS Taggers and Linguistic Features Beáta Megyesi Centre for Speech Technology (CTT) Department
More informationStructural and Semantic Indexing for Supporting Creation of Multilingual Web Pages
Structural and Semantic Indexing for Supporting Creation of Multilingual Web Pages Hiroshi URAE, Taro TEZUKA, Fuminori KIMURA, and Akira MAEDA Abstract Translating webpages by machine translation is the
More informationSentiment analysis on news articles using Natural Language Processing and Machine Learning Approach.
Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach. Pranali Chilekar 1, Swati Ubale 2, Pragati Sonkambale 3, Reema Panarkar 4, Gopal Upadhye 5 1 2 3 4 5
More informationText Opinion Mining to Analyze News for Stock Market Prediction
Int. J. Advance. Soft Comput. Appl., Vol. 6, No. 1, March 2014 ISSN 2074-8523; Copyright SCRG Publication, 2014 Text Opinion Mining to Analyze News for Stock Market Prediction Yoosin Kim 1, Seung Ryul
More informationEEN HUIS BESTUREN ALS EEN FABRIEK,
EEN HUIS BESTUREN ALS EEN FABRIEK, HOE DOE JE DAT? Henk Akkermans World Class Maintenance & Tilburg University Lezing HomeLab 2050, KIVI, 6 oktober, 2015 The opportunity: an industrial revolution is happening
More informationA Mixed Trigrams Approach for Context Sensitive Spell Checking
A Mixed Trigrams Approach for Context Sensitive Spell Checking Davide Fossati and Barbara Di Eugenio Department of Computer Science University of Illinois at Chicago Chicago, IL, USA dfossa1@uic.edu, bdieugen@cs.uic.edu
More informationVerticale tuin maken van een pallet
Daar krijg je dan je groendakmaterialen geleverd op een pallet. Waar moet je naar toe met zo'n pallet. Retour brengen? Gebruiken in de open haard? Maar je kunt creatiever zijn met pallets. Zo kun je er
More informationCross-Lingual Concern Analysis from Multilingual Weblog Articles
Cross-Lingual Concern Analysis from Multilingual Weblog Articles Tomohiro Fukuhara RACE (Research into Artifacts), The University of Tokyo 5-1-5 Kashiwanoha, Kashiwa, Chiba JAPAN http://www.race.u-tokyo.ac.jp/~fukuhara/
More informationThe Importance of Collaboration
Welkom in de wereld van EDI en de zakelijke kansen op langer termijn Sectorsessie mode 23 maart 2016 ISRID VAN GEUNS IS WORKS IS BOUTIQUES Let s get connected! Triumph Without EDI Triumph Let s get connected
More informationA Multi-document Summarization System for Sociology Dissertation Abstracts: Design, Implementation and Evaluation
A Multi-document Summarization System for Sociology Dissertation Abstracts: Design, Implementation and Evaluation Shiyan Ou, Christopher S.G. Khoo, and Dion H. Goh Division of Information Studies School
More informationSpecification by Example (methoden, technieken en tools) Remco Snelders Product owner & Business analyst
Specification by Example (methoden, technieken en tools) Remco Snelders Product owner & Business analyst Terminologie Specification by Example (SBE) Acceptance Test Driven Development (ATDD) Behaviour
More informationTowards a RB-SMT Hybrid System for Translating Patent Claims Results and Perspectives
Towards a RB-SMT Hybrid System for Translating Patent Claims Results and Perspectives Ramona Enache and Adam Slaski Department of Computer Science and Engineering Chalmers University of Technology and
More informationPredicting the Stock Market with News Articles
Predicting the Stock Market with News Articles Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is
More informationDublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection
Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection Gareth J. F. Jones, Declan Groves, Anna Khasin, Adenike Lam-Adesina, Bart Mellebeek. Andy Way School of Computing,
More information7-2 Speech-to-Speech Translation System Field Experiments in All Over Japan
7-2 Speech-to-Speech Translation System Field Experiments in All Over Japan We explain field experiments conducted during the 2009 fiscal year in five areas of Japan. We also show the experiments of evaluation
More information