Text Analysis beyond Keyword Spotting

Bastian Haarmann, Lukas Sikorski, Ulrich Schade
{bastian.haarmann, lukas.sikorski, ulrich.schade}@fkie.fraunhofer.de
Fraunhofer Institute for Communication, Information Processing, and Ergonomics (FKIE)
Neuenahrer Straße, Wachtberg, GERMANY

ABSTRACT

Texts, e.g. written reports, can be pre-analyzed automatically in order to find those texts which might be of interest with respect to a specific problem. Often this is done by a more or less sophisticated version of keyword spotting. However, the field of computational linguistics offers more powerful and more precise tools for automatic text analysis. In this paper we present such a system, tailored to the pre-analysis of military texts such as reports. We discuss the modules of the system and the improvements we made to meet the needs of the military, such as the need for speed. In conclusion, we show how the system can be used in the process of automatic threat recognition.

MOTIVATION

The success of military operations of all kinds (battlefield, anti-terrorism, peacekeeping, disaster relief) relies on information. Commanders must be aware of the current situation, they have to understand it, and they have to grasp how the situation might develop [20]. In the past, critical pieces of information often were not available, while today they are often hidden in the haystack of gathered sensor data, SIGINT data, HUMINT reports, and open sources. The huge amount of these data can no longer be processed in toto by human reconnaissance specialists. It has to be reduced by automatic means. Systems have to check for pieces of information that may be relevant to a specific problem. Then, the human experts can analyse the promising pieces of information to find the pearls sought after.

With respect to texts, from reports written by members of one's own forces to web pages of possible opponents, the automatic pre-processing is often limited to keyword spotting. However, the field of computational linguistics offers more powerful and more precise tools for automatic text analysis. This paper is about a system that applies these tools and methods.

INFORMATION EXTRACTION

The process of extracting limited kinds of semantic content from texts is called information extraction (IE) [12, p. 759]. Our work in that field is conceptually based on Hecking (e.g., [8, 9]), who applied IE techniques to the analysis of battlefield and HUMINT reports, the domains we also aim at. The technical base for our work is the freely available open-source tool GATE [2, 7]. GATE provides a toolbox for building the required IE processing pipeline, which can then be adjusted and enlarged according to one's needs.

The standard IE processing pipeline consists of at least the following processing modules: tokenizer, gazetteer, sentence splitter, part-of-speech tagger, recognizer for named entities, parser, and a module for semantic role labeling.

The tokenizer determines the individual tokens of the text, i.e. single words, numbers, abbreviations, and punctuation marks. The gazetteer then compares the tokens to the elements of several lists which contain names of various types. There usually are at least a list of person and organization names and a list of names for relevant geographic entities, e.g. countries, provinces, towns, villages, rivers and the like. Tokens matching an element in one of the lists are annotated with the respective type, e.g., the token "Kabul" might be tagged as type = location, subtype = city.

After dealing with tokens, the IE process looks for pieces in the text which consist of one or more tokens that belong together according to linguistic theory. First, sentences are determined by the sentence splitter. This task is less trivial than it may seem at first glance because one has to prevent the splitter from assuming the end of a sentence after every period. Otherwise, a sentence would never make it past "Mr." or "Dr." or any other abbreviation of that kind.
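The first three modules can be illustrated with a minimal sketch. It is not GATE code: the function names, the tiny gazetteer, and the abbreviation list are assumptions made for illustration only, and a real pipeline would use far larger, area-specific resources.

```python
import re

# Illustrative gazetteer entries (assumed); a deployed list would be tailored
# to the area of operations.
GAZETTEER = {
    "Kabul": {"type": "location", "subtype": "city"},
    "Helmand": {"type": "location", "subtype": "province"},
}

# Abbreviations whose trailing period must not be read as a sentence end
# (small assumed sample).
ABBREVIATIONS = {"Mr.", "Dr.", "Mrs.", "Col."}

# Numbers (optionally with decimals), words (optionally hyphenated),
# or single punctuation marks.
TOKEN_PATTERN = re.compile(r"\d+(?:\.\d+)?|\w+(?:-\w+)*|[^\w\s]")


def tokenize(text):
    """Split text into tokens and re-attach the period to known abbreviations."""
    raw = TOKEN_PATTERN.findall(text)
    tokens = []
    i = 0
    while i < len(raw):
        if i + 1 < len(raw) and raw[i + 1] == "." and raw[i] + "." in ABBREVIATIONS:
            tokens.append(raw[i] + ".")
            i += 2
        else:
            tokens.append(raw[i])
            i += 1
    return tokens


def gazetteer_lookup(tokens):
    """Pair every token with its gazetteer annotation, or None if it is not listed."""
    return [(token, GAZETTEER.get(token)) for token in tokens]


def split_sentences(tokens):
    """Close a sentence at '.', '!' or '?'; abbreviation tokens never match these."""
    sentences, current = [], []
    for token in tokens:
        current.append(token)
        if token in {".", "!", "?"}:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences
```

Because the tokenizer has already merged "Dr" and "." into the single token "Dr.", the splitter never sees a bare period after an abbreviation, which is one simple way to avoid the problem described above.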
Next, the word tokens need to be annotated with their syntactic category (e.g., for "the tanks move", "the" has to be tagged as determiner, "tanks" as noun, and "move" as verb). This is the task of the part-of-speech tagger (POS tagger). GATE ships with a tagger that annotates the word tokens according to the categories of the Penn Treebank tag set [15, 17].

On the basis of the annotations provided by the gazetteer and by the POS tagger, the recognizer for named entities and the parser identify the larger pieces, the constituents and the subordinate clauses, within the sentences. The recognizer for named entities combines elements annotated by the gazetteer. For example, for the sequence "Dr. Mohammed el-baradei", the gazetteer will provide the annotations title for "Dr.", male forename for "Mohammed", and surname for "el-baradei", so that the recognizer can annotate the whole sequence with the tags person and noun phrase according to its rules.

The parser operates on the tags provided by the POS tagger and by the recognizer for named entities. For example, in "The tanks move towards Kabul", "the" has been labeled determiner and "tanks" has been labeled noun. The parser recognizes a sequence of a determiner and a noun as a noun phrase, so "the tanks" is annotated as noun phrase. In addition, "towards" is labeled preposition and "Kabul" is labeled location. Together they are labeled prepositional phrase by the parser.

The last module in the pipeline is the module for semantic role labeling. In short, this module assigns semantic roles to the recognized constituents. In our example, the noun phrase "the tanks" should be labeled with the role theme, and the prepositional phrase with the role direction. We will come back to semantic role labeling in the next section.

The information extraction pipeline as described above can be enlarged by additional modules. For example, a spell checker can be integrated after the tokenizer in order to prevent wrong annotations resulting from spelling errors. Another module that can be added is a module for coreference resolution. Such a module should follow the parser so that the syntactic information available through the parser's annotations can be exploited for coreference resolution. At the moment, our system does not include these modules. However, the module for coreference resolution is under development.

The process of information extraction has to be adapted to the domain the texts to be analyzed come from. When the texts are military reports, they are about events that have happened or are ongoing in a specific area, or they are about persons or organizations operating in that area. Even if the reports convey status information, the information is about persons, organizations, facilities and more, all located in that area. Thus, the gazetteer has to be tailored to that area. Its geographical lists have to include the names of the villages and towns of that area, the names of its rivers, and so on.
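How a named-entity rule combines gazetteer annotations can be sketched as follows. This is a hedged illustration, not GATE's rule formalism: the function `recognize_persons`, the tag "female forename", and the single rule (optional title, one or more forenames, surname) are assumptions chosen to mirror the "Dr. Mohammed el-baradei" example.

```python
# Gazetteer tags treated as forenames; "female forename" is an assumed
# counterpart to the "male forename" tag mentioned in the text.
FORENAME_TAGS = {"male forename", "female forename"}


def recognize_persons(annotated):
    """Find person names in a list of (token, gazetteer_tag) pairs.

    Rule (assumed for illustration): optional title, one or more forenames,
    then a surname. Each match is annotated as both person and noun phrase.
    """
    persons = []
    i = 0
    n = len(annotated)
    while i < n:
        j = i
        if annotated[j][1] == "title":
            j += 1
        k = j
        while k < n and annotated[k][1] in FORENAME_TAGS:
            k += 1
        if k > j and k < n and annotated[k][1] == "surname":
            text = " ".join(token for token, _ in annotated[i:k + 1])
            persons.append({"start": i, "end": k + 1, "text": text,
                            "tags": ["person", "noun phrase"]})
            i = k + 1  # continue after the recognized name
        else:
            i += 1     # no match starting here; move on by one token
    return persons


example = [("Dr.", "title"), ("Mohammed", "male forename"),
           ("el-baradei", "surname"), ("arrived", None)]
print(recognize_persons(example))
```

The recognizer only looks at gazetteer tags, so its quality depends directly on how well the lists are tailored to the area of interest.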

SEMANTIC ROLE LABELING

After the aforementioned steps in the data processing pipeline have been completed, we next need to determine the actions, events and situations reported in the text and assign semantic roles to the constituents of its sentences. The process of assigning semantic roles to constituents is called Semantic Role Labeling. Sometimes the term thematic role is used for semantic role (e.g., by Sowa [18]), so that Thematic Role Labeling also denotes that process.

The process of semantic role labeling links word meanings to sentence meaning. For example, the sentence "Legio IX Hispania Eburaci castra posuit" consists of the verb "posuit" and the constituents "Legio IX Hispania", "Eburaci", and "castra". Semantic role labeling assigns agent to "Legio IX Hispania", location to "Eburaci", and theme to "castra". Taking these assignments together with the words' semantics ("Legio IX Hispania" refers to the 9th Legion of the Roman Army, which had earned the surname Hispania [5]; "Eburacum" refers to the ancient city which is now York; "castra" denotes a military camp; and "posuit" is the Latin verb (past tense, third person singular, active, indicative) for "to set up"), it becomes clear that the sentence means "the 9th Spanish Legion set up a camp in York".

In order to build a module for semantic role labeling, one first has to choose a set of semantic roles. Different such sets have been discussed in the linguistic literature. Our system mainly relies on the work of Sowa [18], although we have added a few roles. For example, Sowa proposes the roles location, origin, destination, and path as spatial roles, to which we have added direction.

In general, roles can be assigned to constituents in a process that exploits syntactic, lexical, and semantic information. With respect to English, syntactic information here means word order information. For example, in a sentence in the active voice, the subject constituent, i.e. the constituent which precedes the verb, receives either the role agent or the role effector. Lexical information refers to information provided mainly by verbs and prepositions. For example, a prepositional phrase that starts with the preposition "at" normally opens a constituent that will be labeled location, as in "at the marketplace", or point in time, as in "at noon". Semantic information normally provides constraints that can be used to decide among alternatives. For example, the role agent means "an active animate entity that voluntarily initiates an action" [18, p. 508], whereas effector means "an active determinant source, either animate or inanimate, that initiates an action, but without voluntary intention" [ibid., p. 509]. Thus, if semantic information tells us that a constituent in subject position denotes an inanimate object, the role to be assigned to that constituent has to be effector and cannot be agent.

In our system, the process of semantic role labeling starts by identifying the verb group within a sentence. The reason for this is straightforward: our reports are about actions and events, and the expression of actions and events is the domain of the verbal vocabulary, i.e. of verbs and to some degree also of deverbal nouns (nouns derived from verbs; e.g., "detonation" in "the IED's detonation" refers to the event of an IED detonating). By identifying the main verb of the verb group (or the verb that is the base of a deverbal noun), we obtain what linguists call the head of the sentence.

As a next step, our system uses the lexical information of that head to identify the set of roles that are compatible with or even demanded by it. For example, a verb that denotes an action involving movement, like "advance", is compatible with the spatial semantic roles origin, path, and destination or direction (e.g., "The company advanced from Wilderness Church via Dowdall's Tavern towards Chancellorsville"). In contrast, a verb that denotes an action that does not involve movement is compatible with the spatial role location only (e.g., "The army stayed at Stafford Heights"). In the next section, in the subsections "Preparing Semantic Role Labeling" and "Ontology", we will take a more detailed look at this step.

After the set of compatible and mandatory roles has been identified, the system calculates for each constituent which kinds of role it might take. For this step, syntactic information as well as lexical information from prepositions is used. Finally, the calculated possible roles for the constituents are matched against the set of compatible and mandatory roles so that each of the constituents receives an appropriate role.
For this step, the constraints provided by semantic information are exploited.

IMPROVEMENTS

In this section, we discuss the changes we made to the standard information extraction process and the subsequent process of semantic role labeling. Our main motivation for those changes was the need for speed. The resulting system for automatic text analysis is supposed to be integrated into a C2 system. Therefore, incoming reports have to be processed in real time, and the changes we made improve the speed of the process. In the following, we discuss four topics related to these changes: the revision of GATE's ANNIE, the use of a chunker instead of a parser, the addition of a module to prepare semantic role labeling, and, with respect to semantic role labeling itself, the integration of a specific ontology that provides lexical and semantic knowledge.

MIETER

In order to extract information from military reports, a high rate of correctly identified constituents and structures is crucial, as is a reasonable processing speed. The open-source tool GATE already comes with an IE toolbox called ANNIE (A Nearly-New Information Extraction system), which consists of comprehensive rules and linguistic resources such as bulky gazetteer lists and an extensive full-form lexicon based on different corpora [3]. The richness of these rules and resources, however, comes with two problems with respect to our purpose. First, the amount of time needed for computing increases with the number of rules. Second, the resources and grammars contain insufficient military-specific definitions. These problems result in a lower rate of detected and extracted information and in a processing speed that does not match our needs.

In order to avoid these problems, we developed our own version of an information extraction system, called MIETER, which is based on ANNIE. MIETER stands for Military Information Extraction from Texts and its Electronic Representation. It is constructed such that it uses smaller and more specific linguistic resources (e.g. the lexicon), it does not look for information that is irrelevant to the domain (such as names of business companies), and it uses grammars adapted to the specific structures found in military reports. The use of fewer, adapted rules and resources leads to a significantly higher rate of detected and extracted information from military reports and to a higher processing speed.

In order to keep the recognition rates high, MIETER includes additional features which are able to correct false markings made by previous components of the pipeline. The POS tagger often makes false assumptions regarding the category of a word. Its decision for a tag is based on a full-form lexicon and a set of template rules, both derived from corpus work. For example, the word "prevailing" would be recognized as the verb "prevail" in continuous form, but in the phrase "under the prevailing conditions" it takes the function of an adjective and hence needs to be recognized as such. The MIETER Re-Tagger component detects these tagging errors and corrects them.

CHUNKING

Parsers normally calculate complete parse trees which represent the syntactic structure of a sentence. Parse trees contain the syntactic information we need for semantic role labeling. However, the complete and sometimes rather complex tree is normally not needed. It is often sufficient to know the verb group and the other constituents of a sentence as well as their sequence. Thus, a deep syntactic analysis is not necessary; partial parsing does the job as well. The kind of partial parsing we applied is called chunking. It is the process of identifying and classifying the (consecutive, non-overlapping) constituents within sentences by statistical or rule-based heuristics. MIETER includes a chunker that operates with rules.

The major advantages of chunking are its robustness with respect to unseen words and possible ambiguities, its ability to provide at least partial results even if a full analysis is not feasible, and its speed. Deep syntactic analysis, on the other hand, requires that the entire syntactic structure be calculated for each sentence. It produces much more information than chunking (more than we need for our purpose), but it might fail because of unknown words and ambiguity. Additionally, deep syntactic analysis is both highly time-consuming and computationally very resource-intensive. Nevertheless, within the context of the work on Hecking's ZENON system [9], an approach is being developed and implemented that uses a deep parser to calculate the syntactic structures of report sentences and uses these structures to assign semantic annotations. See [16] for details on the deep approach and [11] for a more detailed discussion of the pros and cons of both approaches.
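A rule-based chunker of the kind described above can be sketched in a few lines. This is an illustrative stand-in, not MIETER's actual rule set: the three rules (noun phrase, verb group, prepositional phrase), the chunk labels, and the function names are assumptions; the POS tags follow the Penn Treebank conventions.

```python
def match_noun_phrase(tagged, start):
    """Return the end index of a noun phrase (DT? JJ* NN+) beginning at start,
    or start itself if no noun phrase begins there."""
    j = start
    if j < len(tagged) and tagged[j][1] == "DT":
        j += 1
    while j < len(tagged) and tagged[j][1].startswith("JJ"):
        j += 1
    k = j
    while k < len(tagged) and tagged[k][1].startswith("NN"):
        k += 1
    return k if k > j else start


def chunk(tagged):
    """Group (word, POS) pairs into consecutive, non-overlapping chunks.

    Tokens that fit no rule are skipped, so the chunker still returns
    partial results for sentences it cannot fully analyze.
    """
    def text(a, b):
        return " ".join(word for word, _ in tagged[a:b])

    chunks = []
    i = 0
    while i < len(tagged):
        tag = tagged[i][1]
        if tag in ("IN", "TO"):                     # preposition + noun phrase
            end = match_noun_phrase(tagged, i + 1)
            if end > i + 1:
                chunks.append(("PP", text(i, end)))
                i = end
                continue
        if tag == "MD" or tag.startswith("VB"):     # verb group
            end = i
            while end < len(tagged) and (tagged[end][1] == "MD"
                                         or tagged[end][1].startswith("VB")):
                end += 1
            chunks.append(("VG", text(i, end)))
            i = end
            continue
        end = match_noun_phrase(tagged, i)          # plain noun phrase
        if end > i:
            chunks.append(("NP", text(i, end)))
            i = end
            continue
        i += 1                                      # no rule applies: skip token
    return chunks


sentence = [("The", "DT"), ("tanks", "NNS"), ("move", "VBP"),
            ("towards", "IN"), ("Kabul", "NNP")]
print(chunk(sentence))
```

Each token is visited at most a small constant number of times, so the cost grows linearly with sentence length; no tree is built and an unknown word only costs the chunk it would have belonged to.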

PREPARING SEMANTIC ROLE LABELING

In order to prepare semantic role labeling, MIETER incorporates a module called General Identifier. This module annotates the constituents identified by MIETER's chunker with preliminary semantic roles such as agent, affected, completion, time and location, which are then refined by MIETER itself (cf. the following paragraph) and by the process of semantic role labeling (cf. the next section). Agent and affected annotations can in most cases easily be calculated from the position of a noun phrase, preceding or following the verb, and the voice of the verb group (active or passive). In contrast, syntactic and lexical information together are often needed, but are also sufficient, to decide whether there is temporal or spatial information in a sentence. The preposition "under" might at first glance be judged to be a location marker, as in "under the bridge". But prepositional phrases with the preposition "under" can also bear a completion, as in "under heavy bombardment", or temporal information, as in "under 9.58 seconds". The General Identifier uses gazetteer lists to annotate whether a prepositional phrase most probably contains temporal or spatial information. These lists include trigger words, e.g. "morning" or "Thursday" for temporal information, or "Oslo" or "Helmand" for spatial information.

After classifying constituents as temporal, spatial, or completion, MIETER refines that classification by dividing the temporal constituents into start time constituents (a trigger word would be, e.g., "from"), end time constituents ("until"), and point in time constituents ("on"). In the same way, MIETER sub-classifies spatial constituents as location ("at"), direction ("towards"), origin ("from"), destination ("to"), or path ("via"). Figure 1 shows an example of the labeling that is already achieved by the information extraction process.

Figure 1: The figure shows the preliminary labeling as calculated by MIETER.
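The preliminary labeling can be sketched as follows. The sketch is an assumption-laden illustration rather than the General Identifier itself: the function names are invented, the trigger lists contain only the examples from the text, and treating every prepositional phrase without a trigger word as completion is a simplification made here.

```python
# Trigger words from the text; real gazetteer lists would be far larger.
TEMPORAL_TRIGGERS = {"morning", "thursday"}
SPATIAL_TRIGGERS = {"oslo", "helmand"}

# Refinement of temporal and spatial constituents by their preposition.
TEMPORAL_REFINEMENT = {"from": "start time", "until": "end time", "on": "point in time"}
SPATIAL_REFINEMENT = {"at": "location", "towards": "direction", "from": "origin",
                      "to": "destination", "via": "path"}


def label_prepositional_phrase(phrase):
    """Classify a prepositional phrase via trigger words, then refine by preposition."""
    words = phrase.lower().split()
    preposition, rest = words[0], set(words[1:])
    if rest & TEMPORAL_TRIGGERS:
        return TEMPORAL_REFINEMENT.get(preposition, "time")
    if rest & SPATIAL_TRIGGERS:
        return SPATIAL_REFINEMENT.get(preposition, "location")
    return "completion"  # assumed fallback when no trigger word is present


def label_chunks(chunks, voice="active"):
    """Attach a preliminary role to each (kind, text) chunk of one sentence.

    Noun phrases receive agent or affected depending on their position
    relative to the verb group and on the voice of the verb group.
    """
    verb_index = next((i for i, (kind, _) in enumerate(chunks) if kind == "VG"), None)
    labeled = []
    for i, (kind, text) in enumerate(chunks):
        if kind == "PP":
            role = label_prepositional_phrase(text)
        elif kind == "NP" and verb_index is not None:
            precedes_verb = i < verb_index
            role = "agent" if precedes_verb == (voice == "active") else "affected"
        else:
            role = None  # verb groups and unclassifiable chunks get no role
        labeled.append((kind, text, role))
    return labeled


chunks = [("NP", "The company"), ("VG", "advanced"), ("PP", "from Oslo"),
          ("PP", "towards Helmand"), ("PP", "on Thursday"),
          ("PP", "under heavy bombardment")]
for labeled_chunk in label_chunks(chunks):
    print(labeled_chunk)
```

Note that the same preposition "from" yields origin or start time depending on whether the trigger word is spatial or temporal, which is why the coarse classification has to precede the refinement.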

It can be said that the General Identifier already does the main share of semantic role labeling, namely those parts of the work that can be done by exploiting syntactic information and simple lexical information only. However, some more processing has to be done that takes more complex lexical information and semantic information into account. This means that the labels the General Identifier has assigned to the constituents have to be refined. For example, constituents that have been preliminarily labeled agent will be examined once more to check whether they are indeed to be labeled agent or whether they have to be relabeled, e.g. as effector or as experiencer. Similarly, the constituents that have been labeled affected will be relabeled, e.g. as patient, beneficiary, or theme.

ONTOLOGY

The final process of semantic role labeling exploits lexical information and semantic information. In our system, the respective knowledge is stored in an ontology (for general information about ontologies cf. [19]). The specific process is as follows: at the end of the information extraction process, the verb group of each sentence of the report is identified and its constituents are marked. For semantic role labeling, the main verb is identified in the verb group and looked up in our ontology. This ontology is focused on verbs. It provides information about the verbs in general and about their semantic frames in particular. Semantic frames, a further development of Fillmore's case grammar [7], tell us which semantic roles come with the verb and which of these roles are mandatory, optional or forbidden. For each verb looked up, its frame is taken, and then the slots of that frame are filled with the constituents of the sentence the verb in question came from. In order to map the constituents to the correct slots, the semantic information is exploited. This semantic information consists, for example, of restrictions that refer to the slots and that are also represented in the ontology.

We created the verb ontology based on FrameNet [1, 14] and VerbNet [4]. Its construction was also influenced by the work of Helbig [10], Levin [13], and Sowa [18]. Although most existing ontologies concentrate on the objects in the domain of interest, a verb ontology has to focus on situations in general and on actions in particular. As can be seen in Figure 2, left panel, the verbs in the ontology are classified into those that refer to static situations and those that refer to dynamic situations. The latter are further divided into those verbs that refer to actions and those that refer to events. Action verbs demand an agent ("an active animate entity that voluntarily initiates an action" [18, p. 508]). The action verbs are divided into many classes, among them cognition verbs (e.g., "consider"), exchange verbs (e.g., "receive") and motion verbs (e.g., "advance"). Actions belonging to the same class share the semantic roles they demand and allow. For example, motion verbs like "advance" inherit their frame of semantic roles from the Motion class, cf. Figure 2, bottom right panel.

Figure 2: This snippet from a Protégé screen shows the semantic properties of the verb "advance".

The task of making the ontology available as a resource for the process of semantic role labeling is assigned to the so-called Frame Slot Creator. This module takes the main verb out of the verb group of each sentence and sends it to an Ontology Web-Service. That service acts as an interface to the ontology. It returns the semantic frame for each verb requested. The Frame Slot Creator then stores the frame in the verb group annotation as a matrix of attribute-value pairs in which the semantic roles of the frame serve as attributes. The values of these attributes have to be the constituents of the respective sentence. Filling the constituents into the appropriate slots is done by a module called Frame Slot Filler, which calculates this mapping from the constituents' tags, especially their preliminary semantic role tags, and the semantic constraints the ontology provides for appropriate fillers.
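The interplay of the two modules can be sketched as follows. Everything concrete here is an assumption for illustration: an in-memory dictionary stands in for the Ontology Web-Service, the frame for "advance" is reconstructed from the roles named in the text, and the animacy list is a toy substitute for the ontology's semantic constraints.

```python
# Stand-in for the ontology: verb -> class and role frame (assumed content).
FRAMES = {
    "advance": {
        "class": "Motion",
        "roles": {"agent": "mandatory", "origin": "optional", "path": "optional",
                  "destination": "optional", "direction": "optional"},
    },
}

# Toy semantic knowledge: head nouns that denote animate entities (assumed).
ANIMATE_HEADS = {"company", "army", "legion"}


def create_frame_slots(verb):
    """Return an attribute-value matrix with one empty slot per role of the
    verb's frame, or None if the verb is not in the ontology."""
    frame = FRAMES.get(verb)
    if frame is None:
        return None
    return {role: None for role in frame["roles"]}


def fill_frame_slots(verb, constituents):
    """Map (text, preliminary_role) constituents to the slots of the verb's frame.

    A preliminary agent whose head noun is not animate is relabeled effector.
    Returns (slots, missing mandatory roles), or (None, []) for unknown verbs.
    """
    slots = create_frame_slots(verb)
    if slots is None:
        return None, []
    for text, role in constituents:
        head = text.lower().split()[-1]
        if role == "agent" and head not in ANIMATE_HEADS:
            role = "effector"  # semantic constraint: an agent must be animate
        if role in slots and slots[role] is None:
            slots[role] = text
    missing = [role for role, status in FRAMES[verb]["roles"].items()
               if status == "mandatory" and slots[role] is None]
    return slots, missing


slots, missing = fill_frame_slots("advance", [
    ("The company", "agent"),
    ("from Wilderness Church", "origin"),
    ("via Dowdall's Tavern", "path"),
    ("towards Chancellorsville", "direction"),
])
print(slots)
print(missing)
```

Reporting unfilled mandatory slots separately is a design choice of this sketch; it makes visible when a sentence could not be fully mapped onto the frame, for example because the subject failed the animacy constraint.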

CONCLUSION

In the sections above, we presented a process by which texts in general and military reports in particular can be analyzed. As a result of that process, the text in question is annotated. That annotation assigns a semantic role to most of the text's constituents. Since the text in question is written in natural language, the anomalies which are characteristic of natural language, like ambiguity, vagueness, and creative usage, provoke some gaps as well as some errors in the assignment. Nevertheless, as long as the texts refer to a restricted domain, as military reports normally do ("attack" in these reports means a specific military action and not a situation on the soccer field or on the racetrack and so on), the assignment is nearly complete and correct.

The question remains how an assignment of semantic roles to text constituents helps text analysis and why it means an advancement beyond keyword spotting. In order to illustrate that advancement, let us take a look at an application of written text analysis, namely the automatic pre-processing for threat recognition. Through this pre-processing, those written reports in a stream of reports that may indicate threats can be identified. Following the automatic pre-processing, a human expert receives the selected reports to check them for threat indicators.

Keyword spotting can help in the automatic pre-processing. For example, if person X is known as a member of an organization Y that carries out terrorist attacks, spotting X or Y in text A should result in adding A to the list of texts to be given to the human expert. However, keyword spotting results in many false hits. For example, if a report says that X died in a car accident the day before, the report might be of importance for updating Y's member list but does not indicate a threat. In contrast, if a report tells us that X bought lots of items that are used in IED construction, that report indicates a threat.
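The contrast between the two reports can be sketched in code, assuming each report already carries the verb and semantic role annotations produced by the process described in the sections above. The report structure, the word lists, and the function names are hypothetical; in the system described here, the corresponding knowledge would come from the ontology rather than from hard-coded sets.

```python
# Hypothetical stand-ins for ontology knowledge.
PROCURE_VERBS = {"buy", "steal"}
IED_ITEMS = {"detonator", "fertilizer"}

# Two annotated reports (invented examples mirroring the text).
reports = [
    {"text": "X died in a car accident yesterday.",
     "annotation": {"verb": "die", "roles": {"experiencer": "X"}}},
    {"text": "X bought detonators at the market.",
     "annotation": {"verb": "buy",
                    "roles": {"agent": "X", "theme": "detonator", "location": "market"}}},
]


def keyword_hit(report, keywords):
    """Keyword spotting: flag the report if any keyword occurs in its text."""
    return any(keyword in report["text"] for keyword in keywords)


def procure_indicator(report, person):
    """Role-based check: the person is the agent of a procure action whose
    theme is an item needed for IED construction."""
    annotation = report["annotation"]
    roles = annotation["roles"]
    return (annotation["verb"] in PROCURE_VERBS
            and roles.get("agent") == person
            and roles.get("theme") in IED_ITEMS)


for report in reports:
    print(report["text"], keyword_hit(report, ["X"]), procure_indicator(report, "X"))
```

The keyword spotter flags both reports, while the role-based check selects only the purchase, because it tests who did what to what rather than which strings occur.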
The example tells us that keywords can be used as indicators for threats but that they are not very precise indicators. However, under the assumption that the reports in question are represented with the semantic role annotations we discussed above, indicators can be constructed that are significantly more precise. This is even more the case if we additionally use our ontology to do some simple reasoning. For example, instead of searching for X or Y, we can construct an indicator which says: check for X as agent of a procure action in which the theme is an item needed for IED construction. The annotation provides information as to which constituents are agent or theme, respectively, and the ontology provides the knowledge regarding which verbs can describe a procure action (like "buy" or "steal") and the knowledge regarding which objects are part of an IED. The former knowledge is represented in the ontology's action branch and the latter in its object branch. In sum, annotating text with semantic roles allows for focused checks and thus provides a sound basis for applications like automatic threat pre-processing.

The technologies described above have been developed for and are integrated in the AuGE system, a demonstrator for automatic threat recognition built by IABG in a project sponsored by the German Air Force's Transformation Center. In that project, Fraunhofer FKIE served as subcontractor to IABG.

REFERENCES

[1] FrameNet.
[2] Hamish Cunningham, Diana Maynard, Kalina Bontcheva, Valentin Tablan. GATE: A framework and graphical development environment for robust NLP tools and applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics,
[3] GATE's ANNIE:
[4] VerbNet: A Class-Based Verb Lexicon.
[5]
[6] Lukas Sikorski, Bastian Haarmann, Ulrich Schade. Computational Linguistics Tools Exploited for Automatic Threat Recognition. To be published. Proceedings of the NATO RTE IST-099. Madrid,
[7] Charles J. Fillmore. The case for case. In Emmon Bach and Robert T. Harms, editors, Universals in Linguistic Theory. Holt, Rinehart and Winston, New York,
[8] Matthias Hecking. Information Extraction from Battlefield Reports. In Proceedings of the 8th International Command and Control Research and Technology Symposium (ICCRTS), Washington, DC,
[9] Matthias Hecking. System ZENON. Semantic Analysis of Intelligence Reports. In Proceedings of the LangTech 2008, Rome, Italy,
[10] Hermann Helbig. Knowledge Representation and the Semantics of Natural Language. Springer, Berlin,
[11] Constantin Jenge, Silverius Kawaletz and Ulrich Schade. Combining Different NLP Methods for HUMINT Report Analysis. NATO RTO IST Panel Symposium. Stockholm, Sweden, October
[12] Bastian Haarmann, Lukas Sikorski. Applied Text Mining for Military Intelligence Necessities. To be published. Proceedings of the Future Security Conference. Berlin,
[13] Beth Levin. English Verb Classes And Alternations: A Preliminary Investigation. University of Chicago Press, Chicago, IL,
[14] Birte Lönneker-Rodman and Collin F. Baker. The FrameNet Model and its Applications. Natural Language Engineering, 15,
[15] Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 19,
[16] Bastian Haarmann. Semantic Role Labeling im modernen Text-Analyse-Prozess. To be published. Know Tech Conference Bad Homburg v.d.h.,
[17] Martha Palmer, Dan Gildea, and Paul Kingsbury. The Proposition Bank: An Annotated Corpus of Semantic Roles. Computational Linguistics, 31,
[18] John F. Sowa. Knowledge Representation: Logical, Philosophical, and Computational Foundations. Brooks/Cole,
[19] Steffen Staab and Rudi Studer, editors. Handbook on Ontologies. International Handbooks on Information Systems. Springer,
[20] Jake Thackray. The holy grail. In David Potts, editor. The Big Issue: Command and Combat in the Information Age. CCRP, Washington, DC,


More information

Brill s rule-based PoS tagger

Brill s rule-based PoS tagger Beáta Megyesi Department of Linguistics University of Stockholm Extract from D-level thesis (section 3) Brill s rule-based PoS tagger Beáta Megyesi Eric Brill introduced a PoS tagger in 1992 that was based

More information

Building a Question Classifier for a TREC-Style Question Answering System

Building a Question Classifier for a TREC-Style Question Answering System Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given

More information

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518 International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 INTELLIGENT MULTIDIMENSIONAL DATABASE INTERFACE Mona Gharib Mohamed Reda Zahraa E. Mohamed Faculty of Science,

More information

Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis

Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis Yue Dai, Ernest Arendarenko, Tuomo Kakkonen, Ding Liao School of Computing University of Eastern Finland {yvedai,

More information

Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features

Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features , pp.273-280 http://dx.doi.org/10.14257/ijdta.2015.8.4.27 Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features Lirong Qiu School of Information Engineering, MinzuUniversity of

More information

PoS-tagging Italian texts with CORISTagger

PoS-tagging Italian texts with CORISTagger PoS-tagging Italian texts with CORISTagger Fabio Tamburini DSLO, University of Bologna, Italy [email protected] Abstract. This paper presents an evolution of CORISTagger [1], an high-performance

More information

Bilingual Dialogs with a Network Operating System

Bilingual Dialogs with a Network Operating System From:MAICS-99 Proceedings. Copyright 1999, AAAI (www.aaai.org). All rights reserved. Bilingual Dialogs with a Network Operating System Emad Al-Shawakfa, Computer Science Department, Illinois Institute

More information

10th Grade Language. Goal ISAT% Objective Description (with content limits) Vocabulary Words

10th Grade Language. Goal ISAT% Objective Description (with content limits) Vocabulary Words Standard 3: Writing Process 3.1: Prewrite 58-69% 10.LA.3.1.2 Generate a main idea or thesis appropriate to a type of writing. (753.02.b) Items may include a specified purpose, audience, and writing outline.

More information

Bridging CAQDAS with text mining: Text analyst s toolbox for Big Data: Science in the Media Project

Bridging CAQDAS with text mining: Text analyst s toolbox for Big Data: Science in the Media Project Bridging CAQDAS with text mining: Text analyst s toolbox for Big Data: Science in the Media Project Ahmet Suerdem Istanbul Bilgi University; LSE Methodology Dept. Science in the media project is funded

More information
