Syntactic Parsing for Bio-molecular Event Detection from Scientific Literature


Sérgio Matos¹, Anabela Barreiro², and José Luis Oliveira¹

¹ IEETA, Universidade de Aveiro, Campus Universitário de Santiago, Aveiro, Portugal
² Faculdade de Letras, Universidade do Porto, Via Panorâmica, Porto, Portugal
{aleixomatos,jlo}@ua.pt, barreiro_anabela@hotmail.com

Abstract. Rapid advances in science and in laboratory and computing methods are generating vast amounts of data and scientific literature. To keep up to date with the expanding knowledge in their field of study, researchers increasingly need tools that help them manage this information. In the genomics field, various databases have been created to store information in a formalized and easily accessible form. However, human curators cannot update these databases at the rate at which new studies are published. Advanced and robust text mining tools that automatically extract newly published information from scientific articles are therefore required. This paper presents a methodology, based on syntactic parsing, for identifying gene events in the scientific literature. Evaluation of the proposed approach on the BioNLP shared task on event extraction produced an average F-score of 47.1 for six event types.

Keywords: Biomedical literature, information extraction, bio-molecular events, syntactic parsing, semantic properties.

1 Introduction

Recent advances in biotechnology, namely the widespread use of high-throughput methods for gene analysis, have produced a vast body of published scientific literature. While much of the data and results described in these studies are being annotated in the various existing biomedical databases, these databases are not easily kept up to date. As a result, many relevant research outcomes remain enclosed as free text in the scientific literature, which is still the major source of information for researchers [1].
It is therefore increasingly difficult for researchers to keep track of the rapidly expanding biomedical knowledge that supports their experiment planning and the analysis of their results [2][3]. Researchers currently face issues such as (i) how to identify the articles most relevant to their specific study, (ii) how to identify the concepts mentioned (genes, proteins, diseases and so on) and the relations between them, and (iii) how to integrate the extracted information with existing knowledge in a simple, efficient, and user-friendly manner [2][4]. This integrated view of information extracted from the literature, within the framework of the more systematized and formalized knowledge annotated in databases and ontologies, is an important prerequisite for biological data analysis [3].

L. Seabra Lopes et al. (Eds.): EPIA 2009, LNAI 5816, Springer-Verlag Berlin Heidelberg 2009

To address these issues, several tools have been developed in recent years that combine Information Extraction (IE), Text Mining (TM) and Natural Language Processing (NLP) techniques with the domain knowledge available in resources such as Entrez Gene, UniProt, GO or UMLS [1][2][4][5]. Such tools process titles and abstracts from the MEDLINE/PubMed [6] literature database and present the extracted information in different forms. The iHOP tool [7] identifies genes and proteins in PubMed abstracts and uses them as links, allowing navigation through sentences and abstracts. The AliBaba system [8] uses pattern matching and co-occurrence statistics to find associations between biological entities such as genes, proteins or diseases, and presents the search results as a graph. EBIMed [9] also finds associations between protein/gene names, GO annotations, drugs and species in the PubMed abstracts returned by a user query. The results are displayed in a table with links to the sentences and abstracts that support the corresponding associations. A similar tool, FACTA [10], retrieves abstracts from PubMed and identifies biomedical concepts (e.g. genes/proteins, diseases, enzymes and chemical compounds) that co-occur with the user's query term. The concepts are presented in tabular form and ranked by co-occurrence statistics or by pointwise mutual information.

More recently, there has been some focus on applying more detailed linguistic processing to improve information retrieval and extraction. Chilibot [11] retrieves sentences from PubMed abstracts related to a pair or list of proteins, genes, or keywords, and applies shallow parsing to classify these sentences as interactive, non-interactive or simple abstract co-occurrence. The identified relationships between entities or keywords are then displayed as a graph.
MEDIE [12] uses a deep parser and a term recognizer to index abstracts based on pre-computed semantic annotations, allowing real-time retrieval of sentences that contain biological concepts associated with the terms specified in the user query. Interest in more advanced methods of linguistic processing is also evident in recent information extraction evaluation challenges, namely the BioNLP shared task on event extraction [13] and the BioCreAtIvE II.5 challenge [14], which investigate the extraction of gene events from the literature.

In this paper, we describe a methodology based on syntactic parsing to detect and annotate bio-molecular events, such as protein production and breakdown, localization or binding events. We present results from our participation in the BioNLP shared task and discuss the main difficulties and the further developments required in this area.

2 Methods

The method described in this paper identifies bio-molecular events using syntactic grammars that process texts and detect the occurrence of linguistic patterns describing such events. Syntactic parsing was implemented using NooJ [15], a freely available development environment and linguistic processing engine that includes tools for inflectional and derivational morphology, syntactic grammars and semantics. NooJ uses dictionaries and grammars to produce formalized descriptions of natural language, and contains a system of inflectional and derivational paradigms that interacts with the dictionary. Inflectional rules apply to a dictionary entry (lemma) to recognize and generate inflected forms, including gender, number and tense. Derivational rules apply to a dictionary entry to recognize and generate derived forms, such as nominalizations (predicate nouns morphosyntactically related to a verb), as adopted in [16]. Lemmas can also carry semantic information. Semantic properties allow, for example, adding the category of a particular named entity, such as ORGANISM, PROTEIN or DISEASE. These properties are illustrated in Table 1.

Table 1. Dictionary entries in NooJ

Lemma                                        PoS  FLX    Semantic properties  ID  TAXID
human                                        N    TABLE  ORGANISM                 9606
Homo sapiens                                 N           ORGANISM                 9606
Breast cancer type 1 susceptibility protein  N           PROTEIN              P
BRCA1                                        N           PROTEIN              P
BRCA1                                        N           GENE
RNF53                                        N           GENE

To create the dictionaries used in this method, we adapted the verb dictionary from the biomedical resource BioLexicon [17][18]. BioLexicon includes verbs that occur frequently in the biomedical literature and that usually describe a specific event, such as express, bind and transcribe. We enhanced the BioLexicon dictionary with inflectional (FLX) and derivational (DRV) attributes and with semantic properties, as shown in Table 2. For example, ION:TABLE represents the derivational and inflectional paradigms for the nominalization expression (which inflects like the word TABLE), and ABOLISH represents the inflectional paradigm for the verb express. The semantic properties in NooJ dictionaries were used to assign specific event types to the verbs that describe those events in the literature. In Table 2, for example, the verb stimulate is assigned the semantic property EventType with the value Positive_regulation. This semantic property is then used in the syntactic grammars, which add an annotation for that type of event whenever it is detected in texts.

Table 2. Definition of verbs in the dictionary

Lemma      PoS  DRV         FLX      EventType
express    V    ION:TABLE   ABOLISH  Gene_expression
ligate     V    TION:TABLE  SMILE    Binding
stimulate  V    TION:TABLE  SMILE    Positive_regulation

The inflectional and derivational paradigms are described in terms of rewrite rules. For example, the noun inflectional paradigm TABLE defines that the plural of the dictionary word associated with this rule is formed by adding an s to the lemma. Hence, the plural of any word associated with the attribute +FLX=TABLE (e.g. human) is obtained in the same way. In the case of verbs, inflectional rules describe the conjugation of the verb. For example, the inflectional paradigm SMILE defines rewrite rules in terms of person, number and tense for verbs that conjugate like the verb to smile. Similarly, the derivational system allows the derivation of a word, as defined by the derivational rule. This allows, for example, obtaining nouns and adjectives from verb entries. The derived word keeps the semantic properties of the word from which it is derived (the lemma). Thus, the predicate noun stimulation is produced and linked to a positive regulation event through the semantic properties it inherits from the verb stimulate.

To define the type of event linked to each verb, we used the training data of the BioNLP shared task. Based on the manual linguistic annotations, we extracted the sentences corresponding to each event and assigned the event type to the verbs found in those sentences. We then manually checked this list and selected only those verbs showing a specific link to a type of event. When a verb was linked to more than one event type, only the most frequent event type was kept and the remaining ones were removed.

In NooJ, syntactic grammars can be used to process sequences of tokens to recognize and annotate multiword expressions. In our approach, the aim was to detect linguistic patterns based on named entities (genes and proteins) and on biologically relevant verbs and verb nominalizations that reference some type of bio-molecular event. These entities, verbs and nouns are annotated automatically by NooJ when the dictionaries and grammars are applied to texts. To create the relevant grammars, we first used NooJ to extract general concordances from the texts that included an annotated gene or protein and a verb or nominalization. We then identified, in the examples provided by the concordances, specific grammatical constructions describing different types of events. For example, we identified a simple pattern composed of a nominalization, the particle of and a named gene or protein, as in expression of p53 or stimulation of CD4.
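The dictionary side of the method (paradigm attributes as in Table 2, semantic-property inheritance by derived nouns, and keeping only the most frequent event type per verb) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not NooJ: the paradigm functions are hypothetical suffix rules standing in for NooJ's rewrite rules, the names `Entry`, `derive` and `most_frequent_event_type` are illustrative, the observations are toy data rather than shared-task annotations, and the manual checking step is not modelled.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical inflectional paradigms (name -> list of forms).
# Simplified suffix rules, not NooJ's actual paradigm syntax.
INFLECTION = {
    "TABLE": lambda w: [w, w + "s"],                            # table, tables
    "SMILE": lambda w: [w, w + "s", w + "d", w[:-1] + "ing"],   # smile, smiles, smiled, smiling
    "ABOLISH": lambda w: [w, w + "es", w + "ed", w + "ing"],    # abolish, abolishes, ...
}

# Hypothetical derivational paradigms (verb lemma -> noun lemma).
DERIVATION = {
    "ION": lambda w: w + "ion",        # express -> expression
    "TION": lambda w: w[:-1] + "ion",  # stimulate -> stimulation, ligate -> ligation
}


@dataclass
class Entry:
    lemma: str
    pos: str
    flx: str
    props: dict = field(default_factory=dict)

    def forms(self) -> list:
        """All inflected forms generated by this entry's FLX paradigm."""
        return INFLECTION[self.flx](self.lemma)


def derive(verb: Entry, drv: str) -> Entry:
    """Apply a DRV attribute such as 'TION:TABLE' (derivation rule, then the
    inflectional paradigm of the derived noun). The noun inherits a copy of
    the verb's semantic properties, e.g. its EventType."""
    rule, flx = drv.split(":")
    return Entry(DERIVATION[rule](verb.lemma), "N", flx, dict(verb.props))


def most_frequent_event_type(pairs):
    """Keep only the most frequent event type for each verb.
    `pairs` is an iterable of (verb_lemma, event_type) observations."""
    counts = {}
    for verb, event_type in pairs:
        counts.setdefault(verb, Counter())[event_type] += 1
    return {verb: c.most_common(1)[0][0] for verb, c in counts.items()}


# Toy observations standing in for verbs found in annotated training sentences.
observations = [
    ("stimulate", "Positive_regulation"),
    ("stimulate", "Positive_regulation"),
    ("stimulate", "Gene_expression"),
    ("express", "Gene_expression"),
    ("ligate", "Binding"),
]
types = most_frequent_event_type(observations)

stimulate = Entry("stimulate", "V", "SMILE", {"EventType": types["stimulate"]})
noun = derive(stimulate, "TION:TABLE")
print(stimulate.forms())  # ['stimulate', 'stimulates', 'stimulated', 'stimulating']
print(noun.lemma, noun.forms(), noun.props)
# stimulation ['stimulation', 'stimulations'] {'EventType': 'Positive_regulation'}
```

Copying the properties into the derived entry, rather than looking them up again, mirrors the point made above: stimulation is linked to a positive regulation event only because it inherits that property from stimulate.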
Each identified pattern was then described as a syntactic grammar, as illustrated in Fig. 1. The output of the grammar (shown below the connecting lines) identifies the protein (CD4), the expression referencing the event (stimulation) and the type of event. Constructing and refining the syntactic grammars is an iterative process. After creating a baseline grammar to describe a particular construction, we try to incorporate syntactic-semantic variants (paraphrases) in order to achieve better recall without compromising precision. For example, the grammar used to identify the construction expression of p53 should also identify expression of gene p53 or expression of the human gene p53. The training and development data sets of the shared task were used during this iterative process. The semantic properties included in the dictionary are used in the syntactic grammars to specify the event type in the annotation. Example 1 shows the output of the grammar in Fig. 1: CD4 is the named entity and stimulation is the expression identifying the bio-molecular event. The event type, positive regulation, is obtained directly from the semantic properties of that expression.

Example 1. Grammar output used to annotate the expression in texts

Stimulation of human CD4 <EVENT+PROTEIN=CD4+EXP=Stimulation+TYPE=Positive_regulation>
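The grammar of Fig. 1 and its paraphrase variants (expression of p53, expression of gene p53, expression of the human gene p53) can be approximated outside NooJ as a token-level matcher. The sketch below is a hypothetical Python stand-in, not the authors' grammar: the lexicon, the optional-slot word lists and the function name `annotate` are assumptions, and the set of protein names is passed in as input, mirroring the shared-task setup in which gene and protein mentions were supplied to participants. It emits an annotation string in the format of Example 1.

```python
import re

# Hypothetical lexicon: nominalizations carry an EventType, as in Table 2.
NOMINALIZATIONS = {
    "stimulation": "Positive_regulation",
    "upregulation": "Positive_regulation",
    "expression": "Gene_expression",
    "phosphorylation": "Phosphorylation",
}
# Optional slots after "of", tried in order: [determiner] [organism] [entity_type].
OPTIONAL_SLOTS = [{"the", "a"}, {"human", "mouse"}, {"gene", "protein"}]


def annotate(sentence: str, proteins: set) -> list:
    """Match <nominalization> of [det] [organism] [entity_type] <entity> and
    return (matched span, NooJ-style annotation) pairs."""
    tokens = re.findall(r"[\w-]+", sentence)
    matches = []
    for i, tok in enumerate(tokens):
        event_type = NOMINALIZATIONS.get(tok.lower())
        if event_type is None or i + 1 >= len(tokens) or tokens[i + 1].lower() != "of":
            continue
        j = i + 2
        for slot in OPTIONAL_SLOTS:  # each optional slot consumes at most one token
            if j < len(tokens) and tokens[j].lower() in slot:
                j += 1
        if j < len(tokens) and tokens[j] in proteins:  # entity list is given as input
            span = " ".join(tokens[i : j + 1])
            label = f"<EVENT+PROTEIN={tokens[j]}+EXP={tok}+TYPE={event_type}>"
            matches.append((span, label))
    return matches


print(annotate("Stimulation of human CD4 was observed.", {"CD4"}))
# [('Stimulation of human CD4', '<EVENT+PROTEIN=CD4+EXP=Stimulation+TYPE=Positive_regulation>')]
print(annotate("We measured expression of the human gene p53.", {"p53"}))
```

The optional slots are filled greedily and never backtracked, which is enough for these short noun phrases; a full grammar engine such as NooJ explores alternative paths through the graph instead.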

Fig. 1. Grammar to detect phrases such as stimulation of CD4

3 Results

Applying the grammars described in the previous section allowed the extraction of phrases that reference gene-related events. Table 3 shows some examples of the patterns described by these grammars and the corresponding concordances found in texts. Although these are relatively simple patterns, they can model a large portion of the language used to present such events.

Table 3. Patterns detected by the grammars

Pattern                                                 Concordance in text
<entity> [<entity_type>] <nominalization>               HSP gene expression
<nominalization> of [<entity_type>] <entity>            upregulation of Fas
<entity> [<entity_type>] <be> [not] [<adverb>] <verb>   IL-2R stimulation was totally inhibited
<verb> <preposition> <entity>                           binding of TRAF2
<verb> <nominalization> of <entity>                     suppressing activation of STAT6

This section presents the evaluation results of the proposed method, obtained using the test data from the BioNLP shared task on event extraction. This data set was not used for defining the semantic properties included in the dictionary or for creating the syntactic grammars. The aim of the shared task was to detect gene events in PubMed abstracts and create the corresponding annotations, including the protein(s) involved, the referencing expression or trigger, and the type of event. The data for the BioNLP task was derived from the GENIA event corpus and comprised 800 abstracts in the training set, 150 in the development set, and 260 in the test set. Details on the annotation procedure and evaluation metrics are described in [13].

The BioNLP shared task divided events into nine types. The regulatory events were not included in this study because of time constraints and the more complex structure of those events. Results for the remaining six event types are displayed in Table 4.
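Table 4 reports recall, precision and F-score for each event type. As a reminder of how the three figures relate, the sketch below computes them from counts of true positives, false positives and false negatives, with the F-score taken as the harmonic mean 2PR/(P+R). The counts are made up, and what counts as a correct match is defined by the shared-task evaluation described in [13], which this sketch does not reproduce; the function names are illustrative.

```python
def f_score(recall: float, precision: float) -> float:
    """Balanced F-score: harmonic mean of recall and precision."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def prf(tp: int, fp: int, fn: int) -> tuple:
    """Recall, precision and F-score, as percentages, from match counts."""
    recall = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    precision = 100.0 * tp / (tp + fp) if tp + fp else 0.0
    return recall, precision, f_score(recall, precision)


# Made-up counts for one event type: 60 correct, 20 spurious, 40 missed.
print(prf(tp=60, fp=20, fn=40))  # recall 60.0, precision 75.0, F-score about 66.7
```

Because the harmonic mean is pulled towards the smaller of the two values, an event type with high precision but low recall (or the reverse) still ends up with a low F-score.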
The results in Table 4 were achieved using six grammars similar to the one shown in Fig. 1. An average F-score of 47.1 was obtained. Except for binding events, the results are promising and show that good performance can be obtained with this simple approach. In the case of binding events, the participation of two proteins makes such events harder to describe, and the results are still poor.

Table 4. Performance of the event detection method (test data)

Event type           Recall  Precision  F-score
Localization
Binding
Gene Expression
Transcription
Protein Catabolism
Phosphorylation
Average

4 Discussion

We have described an approach that uses syntactic grammars to detect and annotate gene events in the scientific literature. The proposed method takes advantage of inflectional and derivational morphology and of the semantic properties established in dictionaries and grammars developed with NooJ, which make it possible to associate terminological verbs and their derivations with specific event types. This approach provides a general and flexible solution for information extraction from biomedical texts. The results in Table 4 indicate that this approach can be used to process the literature and extract networks of events and interactions. These networks are valuable for literature search and navigation, as proposed in the MEDIE and Chilibot tools, but require much less processing.

However, some shortcomings need to be considered and addressed. The first limitation is related to named entity recognition. In the BioNLP shared task, participants were supplied with the names and positions in the text of the mentioned genes and proteins. In such a setup, recognizing linguistic patterns in which these entities occur is significantly simplified. In a more realistic task, the processing pipeline would not receive the list of mentioned entities as input, and a named entity recognizer with very good performance would need to be included in the processing steps.

Another limitation concerns the identification of patterns and the creation of grammars.
Although a manual procedure such as the one we followed can identify the most salient linguistic patterns, it would be interesting to investigate the possibility of generating and assessing new patterns automatically. In this study, we did not include the gene regulatory events because they are frequently referenced by more complex constructions that are not yet covered by our grammars. Describing and extracting these events is of great importance and will be a future direction of our work. Finally, it is important to assess the advantages and disadvantages of the proposed approach for identifying relations and events, compared with other methods based on shallow or deep parsing.

Methods such as the one proposed in this paper can help database curators identify the most relevant facts in the literature and speed up the annotation process. Tools based on these methods can also provide alternative ways of querying and browsing the facts cited in the literature, and be useful for researchers. However, before these methods can be truly useful, they must be included in user-oriented tools that offer robust and reliable performance while hiding the complexity of the linguistic processing. It is also of major importance that these tools keep links to the reference databases, so that users can navigate from the literature to these resources and back in a simple and fluid way.

References

1. Rebholz-Schuhmann, D., Kirsch, H., Couto, F.: Facts from text: is text mining ready to deliver? PLoS Biol. 3, e65 (2005)
2. Altman, R.B., Bergman, C.M., Blake, J., Blaschke, C., Cohen, A., Gannon, F., Grivell, L., Hahn, U., Hersh, W., Hirschman, L., Jensen, L.J., Krallinger, M., Mons, B., O'Donoghue, S.I., Peitsch, M.C., Rebholz-Schuhmann, D., Shatkay, H., Valencia, A.: Text mining for biology - the way forward: opinions from leading scientists. Genome Biology 9(suppl. 2), S7 (2008)
3. Jensen, L.J., Saric, J., Bork, P.: Literature mining for the biologist: from information retrieval to biological discovery. Nat. Rev. Genet. 7 (2006)
4. Shatkay, H.: Hairpins in bookstacks: Information retrieval from biomedical text. Briefings in Bioinformatics 6(3) (2005)
5. Weeber, M., Kors, J.A., Mons, B.: Online tools to support literature-based discovery in the life sciences. Briefings in Bioinformatics 6(3) (2005)
6. PubMed
7. Hoffmann, R., Valencia, A.: iHOP - A Gene Network for Navigating the Literature. Nature Genetics 36, 664 (2004)
8. Plake, C., Schiemann, T., Pankalla, M., Hakenberg, J., Leser, U.: Ali Baba: PubMed as a graph. Bioinformatics 22(19) (2006)
9. Rebholz-Schuhmann, D., Kirsch, H., Arregui, M., Gaudan, S., Riethoven, M., Stoehr, P.: EBIMed - text crunching to gather facts for proteins from Medline. Bioinformatics 23(2) (2007)
10. Tsuruoka, Y., Tsujii, J., Ananiadou, S.: FACTA: a text search engine for finding associated biomedical concepts. Bioinformatics 24(21) (2008)
11. Chen, H., Sharp, B.M.: Content-rich biological network constructed by mining PubMed abstracts. BMC Bioinformatics 5, 147 (2008)
12. Miyao, Y., Ohta, T., Masuda, K., Tsuruoka, Y., Yoshida, K., Ninomiya, T., Tsujii, J.: Semantic Retrieval for the Accurate Identification of Relational Concepts in Massive Textbases. In: Proceedings of COLING-ACL 2006, Sydney (2006)
13. Kim, J.-D., Ohta, T., Pyysalo, S., Kano, Y., Tsujii, J.: Overview of BioNLP 2009 Shared Task on Event Extraction. In: Proceedings of Natural Language Processing in Biomedicine (BioNLP) NAACL 2009 Workshop (2009)
14. BioCreAtIvE - Critical Assessment of Information Extraction Systems in Biology
15. NooJ
16. Barreiro, A.M.: Make it simple with paraphrases: Automated paraphrasing for authoring aids and machine translation. PhD dissertation. Faculdade de Letras da Universidade do Porto, Porto (2008)
17. Sasaki, Y., Montemagni, S., Pezik, P., Rebholz-Schuhmann, D., McNaught, J., Ananiadou, S.: BioLexicon: A Lexical Resource for the Biology Domain. In: Proceedings of the Third International Symposium on Semantic Mining in Biomedicine (2008)
18. BOOTStrep Bio-Lexicon


More information

BANNER: AN EXECUTABLE SURVEY OF ADVANCES IN BIOMEDICAL NAMED ENTITY RECOGNITION

BANNER: AN EXECUTABLE SURVEY OF ADVANCES IN BIOMEDICAL NAMED ENTITY RECOGNITION BANNER: AN EXECUTABLE SURVEY OF ADVANCES IN BIOMEDICAL NAMED ENTITY RECOGNITION ROBERT LEAMAN Department of Computer Science and Engineering, Arizona State University GRACIELA GONZALEZ * Department of

More information

COMPARING USABILITY OF MATCHING TECHNIQUES FOR NORMALISING BIOMEDICAL NAMED ENTITIES

COMPARING USABILITY OF MATCHING TECHNIQUES FOR NORMALISING BIOMEDICAL NAMED ENTITIES COMPARING USABILITY OF MATCHING TECHNIQUES FOR NORMALISING BIOMEDICAL NAMED ENTITIES XINGLONG WANG AND MICHAEL MATTHEWS School of Informatics, University of Edinburgh Edinburgh, EH8 9LW, UK {xwang,mmatsews}@inf.ed.ac.uk

More information

Study of Effect of Drug Lexicons on Medication Extraction from Electronic Medical Records. E. Sirohi and P. Peissig

Study of Effect of Drug Lexicons on Medication Extraction from Electronic Medical Records. E. Sirohi and P. Peissig Study of Effect of Drug Lexicons on Medication Extraction from Electronic Medical Records E. Sirohi and P. Peissig Pacific Symposium on Biocomputing 10:308-318(2005) STUDY OF EFFECT OF DRUG LEXICONS ON

More information

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources 1 of 8 11/7/2004 11:00 AM National Center for Biotechnology Information About NCBI NCBI at a Glance A Science Primer Human Genome Resources Model Organisms Guide Outreach and Education Databases and Tools

More information

An Ontology Based Method to Solve Query Identifier Heterogeneity in Post- Genomic Clinical Trials

An Ontology Based Method to Solve Query Identifier Heterogeneity in Post- Genomic Clinical Trials ehealth Beyond the Horizon Get IT There S.K. Andersen et al. (Eds.) IOS Press, 2008 2008 Organizing Committee of MIE 2008. All rights reserved. 3 An Ontology Based Method to Solve Query Identifier Heterogeneity

More information

A New Method to Retrieve, Cluster And Annotate Clinical Literature Related To Electronic Health Records

A New Method to Retrieve, Cluster And Annotate Clinical Literature Related To Electronic Health Records A New Method to Retrieve, Cluster And Annotate Clinical Literature Related To Electronic Health Records Izaskun Fernandez 1, Ana Jimenez-Castellanos 2, Xabier García de Kortazar 1, and David Perez-Rey

More information

Interactive Dynamic Information Extraction

Interactive Dynamic Information Extraction Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken

More information

How To Rank Term And Collocation In A Newspaper

How To Rank Term And Collocation In A Newspaper You Can t Beat Frequency (Unless You Use Linguistic Knowledge) A Qualitative Evaluation of Association Measures for Collocation and Term Extraction Joachim Wermter Udo Hahn Jena University Language & Information

More information

Intro to Bioinformatics

Intro to Bioinformatics Intro to Bioinformatics Marylyn D Ritchie, PhD Professor, Biochemistry and Molecular Biology Director, Center for Systems Genomics The Pennsylvania State University Sarah A Pendergrass, PhD Research Associate

More information

Paradigm Changes Affecting the Practice of Scientific Communication in the Life Sciences

Paradigm Changes Affecting the Practice of Scientific Communication in the Life Sciences Paradigm Changes Affecting the Practice of Scientific Communication in the Life Sciences Prof. Dr. Martin Hofmann-Apitius Head of the Department of Bioinformatics Fraunhofer Institute for Algorithms and

More information

Special Topics in Computer Science

Special Topics in Computer Science Special Topics in Computer Science NLP in a Nutshell CS492B Spring Semester 2009 Jong C. Park Computer Science Department Korea Advanced Institute of Science and Technology INTRODUCTION Jong C. Park, CS

More information

Discover more, discover faster. High performance, flexible NLP-based text mining for life sciences

Discover more, discover faster. High performance, flexible NLP-based text mining for life sciences Discover more, discover faster. High performance, flexible NLP-based text mining for life sciences It s not information overload, it s filter failure. Clay Shirky Life Sciences organizations face the challenge

More information

LABERINTO at ImageCLEF 2011 Medical Image Retrieval Task

LABERINTO at ImageCLEF 2011 Medical Image Retrieval Task LABERINTO at ImageCLEF 2011 Medical Image Retrieval Task Jacinto Mata, Mariano Crespo, Manuel J. Maña Dpto. de Tecnologías de la Información. Universidad de Huelva Ctra. Huelva - Palos de la Frontera s/n.

More information

Terminology Extraction from Log Files

Terminology Extraction from Log Files Terminology Extraction from Log Files Hassan Saneifar 1,2, Stéphane Bonniol 2, Anne Laurent 1, Pascal Poncelet 1, and Mathieu Roche 1 1 LIRMM - Université Montpellier 2 - CNRS 161 rue Ada, 34392 Montpellier

More information

Curation of NLP Pipeline - A Review

Curation of NLP Pipeline - A Review ASSISTED CURATION: DOES TEXT MINING REALLY HELP? BEATRICE ALEX, CLAIRE GROVER, BARRY HADDOW, MIJAIL KABADJOV, EWAN KLEIN, MICHAEL MATTHEWS, STUART ROEBUCK, RICHARD TOBIN, AND XINGLONG WANG School of Informatics

More information

Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval

Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval Yih-Chen Chang and Hsin-Hsi Chen Department of Computer Science and Information

More information

BIOMEDICAL LITERATURE MINING FOR PHARMACOKINETICS NUMERICAL PARAMETER COLLECTION. Zhiping Wang

BIOMEDICAL LITERATURE MINING FOR PHARMACOKINETICS NUMERICAL PARAMETER COLLECTION. Zhiping Wang BIOMEDICAL LITERATURE MINING FOR PHARMACOKINETICS NUMERICAL PARAMETER COLLECTION Zhiping Wang Submitted to the faculty of the University Graduate School in partial fulfillment of the requirements for the

More information

IMPLEMENTING BIG DATA IN TODAY S HEALTH CARE PRAXIS: A CONUNDRUM TO PATIENTS, CAREGIVERS AND OTHER STAKEHOLDERS - WHAT IS THE VALUE AND WHO PAYS

IMPLEMENTING BIG DATA IN TODAY S HEALTH CARE PRAXIS: A CONUNDRUM TO PATIENTS, CAREGIVERS AND OTHER STAKEHOLDERS - WHAT IS THE VALUE AND WHO PAYS IMPLEMENTING BIG DATA IN TODAY S HEALTH CARE PRAXIS: A CONUNDRUM TO PATIENTS, CAREGIVERS AND OTHER STAKEHOLDERS - WHAT IS THE VALUE AND WHO PAYS 29 OCTOBER 2015 DR. DIRK J. EVERS BACKGROUND TreatmentMAP

More information

Pontifícia Universidade Católica do Rio Grande do Sul Faculdade de Informática. Building Domain Specific Corpora in Portuguese Language

Pontifícia Universidade Católica do Rio Grande do Sul Faculdade de Informática. Building Domain Specific Corpora in Portuguese Language Pontifícia Universidade Católica do Rio Grande do Sul Faculdade de Informática Programa de Pós-Graduação em Ciência da Computação Building Domain Specific Corpora in Portuguese Language Lucelene Lopes,

More information

Overview of MT techniques. Malek Boualem (FT)

Overview of MT techniques. Malek Boualem (FT) Overview of MT techniques Malek Boualem (FT) This section presents an standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,

More information

How To Understand And Understand A Negative In Bbg

How To Understand And Understand A Negative In Bbg Some Aspects of Negation Processing in Electronic Health Records Svetla Boytcheva 1, Albena Strupchanska 2, Elena Paskaleva 2 and Dimitar Tcharaktchiev 3 1 Department of Information Technologies, Faculty

More information

Creating Metabolic Network Models using Text Mining and Expert Knowledge

Creating Metabolic Network Models using Text Mining and Expert Knowledge Creating Metabolic Network Models using Text Mining and Expert Knowledge J.A. Dickerson 1, D. Berleant 1, Z. Cox 1, W. Qi 1, and E. Wurtele 2 Iowa State University, Ames, IA, 50011 Abstract: This paper

More information

Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata

Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata Alessandra Giordani and Alessandro Moschitti Department of Computer Science and Engineering University of Trento Via Sommarive

More information

TMUNSW: Identification of disorders and normalization to SNOMED-CT terminology in unstructured clinical notes

TMUNSW: Identification of disorders and normalization to SNOMED-CT terminology in unstructured clinical notes TMUNSW: Identification of disorders and normalization to SNOMED-CT terminology in unstructured clinical notes Jitendra Jonnagaddala a,b,c Siaw-Teng Liaw *,a Pradeep Ray b Manish Kumar c School of Public

More information

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the

More information

The SYSTRAN Linguistics Platform: A Software Solution to Manage Multilingual Corporate Knowledge

The SYSTRAN Linguistics Platform: A Software Solution to Manage Multilingual Corporate Knowledge The SYSTRAN Linguistics Platform: A Software Solution to Manage Multilingual Corporate Knowledge White Paper October 2002 I. Translation and Localization New Challenges Businesses are beginning to encounter

More information

Natural Language Processing and Systems Biology

Natural Language Processing and Systems Biology Natural Language Processing and Systems Biology K. Bretonnel Cohen and Lawrence Hunter Center for Computational Pharmacology, University of Colorado School of Medicine, Denver, USA. E-mail: {kevin.cohen,

More information

BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS

BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS 1. The Technology Strategy sets out six areas where technological developments are required to push the frontiers of knowledge

More information

Exposé for Diploma Thesis. Joint Extraction of Proteins and Bio-Molecular Events using Imperatively Defined Factor Graphs

Exposé for Diploma Thesis. Joint Extraction of Proteins and Bio-Molecular Events using Imperatively Defined Factor Graphs Exposé for Diploma Thesis Joint Extraction of Proteins and Bio-Molecular Events using Imperatively Defined Factor Graphs Tim Rocktäschel Humboldt-Universität zu Berlin

More information

English Grammar Checker

English Grammar Checker International l Journal of Computer Sciences and Engineering Open Access Review Paper Volume-4, Issue-3 E-ISSN: 2347-2693 English Grammar Checker Pratik Ghosalkar 1*, Sarvesh Malagi 2, Vatsal Nagda 3,

More information

Identifying and extracting malignancy types in cancer literature

Identifying and extracting malignancy types in cancer literature Identifying and extracting malignancy types in cancer literature Yang Jin 1, Ryan T. McDonald 2, Kevin Lerman 2, Mark A. Mandel 4, Mark Y. Liberman 2, 4, Fernando Pereira 2, R. Scott Winters 3 1, 3,, Peter

More information

Chapter 8. Final Results on Dutch Senseval-2 Test Data

Chapter 8. Final Results on Dutch Senseval-2 Test Data Chapter 8 Final Results on Dutch Senseval-2 Test Data The general idea of testing is to assess how well a given model works and that can only be done properly on data that has not been seen before. Supervised

More information

The INFUSIS Project Data and Text Mining for In Silico Modeling

The INFUSIS Project Data and Text Mining for In Silico Modeling The INFUSIS Project Data and Text Mining for In Silico Modeling Henrik Boström 1,2, Ulf Norinder 3, Ulf Johansson 4, Cecilia Sönströd 4, Tuve Löfström 4, Elzbieta Dura 5, Ola Engkvist 6, Sorel Muresan

More information

CINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test

CINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test CINTIL-PropBank I. Basic Information 1.1. Corpus information The CINTIL-PropBank (Branco et al., 2012) is a set of sentences annotated with their constituency structure and semantic role tags, composed

More information

LINGSTAT: AN INTERACTIVE, MACHINE-AIDED TRANSLATION SYSTEM*

LINGSTAT: AN INTERACTIVE, MACHINE-AIDED TRANSLATION SYSTEM* LINGSTAT: AN INTERACTIVE, MACHINE-AIDED TRANSLATION SYSTEM* Jonathan Yamron, James Baker, Paul Bamberg, Haakon Chevalier, Taiko Dietzel, John Elder, Frank Kampmann, Mark Mandel, Linda Manganaro, Todd Margolis,

More information

PerCuro-A Semantic Approach to Drug Discovery. Final Project Report submitted by Meenakshi Nagarajan Karthik Gomadam Hongyu Yang

PerCuro-A Semantic Approach to Drug Discovery. Final Project Report submitted by Meenakshi Nagarajan Karthik Gomadam Hongyu Yang PerCuro-A Semantic Approach to Drug Discovery Final Project Report submitted by Meenakshi Nagarajan Karthik Gomadam Hongyu Yang Towards the fulfillment of the course Semantic Web CSCI 8350 Fall 2003 Under

More information

Natural Language Database Interface for the Community Based Monitoring System *

Natural Language Database Interface for the Community Based Monitoring System * Natural Language Database Interface for the Community Based Monitoring System * Krissanne Kaye Garcia, Ma. Angelica Lumain, Jose Antonio Wong, Jhovee Gerard Yap, Charibeth Cheng De La Salle University

More information

TOOL OF THE INTELLIGENCE ECONOMIC: RECOGNITION FUNCTION OF REVIEWS CRITICS. Extraction and linguistic analysis of sentiments

TOOL OF THE INTELLIGENCE ECONOMIC: RECOGNITION FUNCTION OF REVIEWS CRITICS. Extraction and linguistic analysis of sentiments TOOL OF THE INTELLIGENCE ECONOMIC: RECOGNITION FUNCTION OF REVIEWS CRITICS. Extraction and linguistic analysis of sentiments Grzegorz Dziczkowski, Katarzyna Wegrzyn-Wolska Ecole Superieur d Ingenieurs

More information

Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova

Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova Using the Grid for the interactive workflow management in biomedicine Andrea Schenone BIOLAB DIST University of Genova overview background requirements solution case study results background A multilevel

More information

Word Completion and Prediction in Hebrew

Word Completion and Prediction in Hebrew Experiments with Language Models for בס"ד Word Completion and Prediction in Hebrew 1 Yaakov HaCohen-Kerner, Asaf Applebaum, Jacob Bitterman Department of Computer Science Jerusalem College of Technology

More information

Election of Diagnosis Codes: Words as Responsible Citizens

Election of Diagnosis Codes: Words as Responsible Citizens Election of Diagnosis Codes: Words as Responsible Citizens Aron Henriksson and Martin Hassel Department of Computer & System Sciences (DSV), Stockholm University Forum 100, 164 40 Kista, Sweden {aronhen,xmartin}@dsv.su.se

More information

By Jonathan Clark, Loosdrecht, The Netherlands, (c) Publishing Research Consortium 2012

By Jonathan Clark, Loosdrecht, The Netherlands, (c) Publishing Research Consortium 2012 By Jonathan Clark, Loosdrecht, The Netherlands, (c) Publishing Research Consortium 2012 The Publishing Research Consortium (PRC) is a group representing publishers and societies supporting global research

More information

Extraction of Legal Definitions from a Japanese Statutory Corpus Toward Construction of a Legal Term Ontology

Extraction of Legal Definitions from a Japanese Statutory Corpus Toward Construction of a Legal Term Ontology Extraction of Legal Definitions from a Japanese Statutory Corpus Toward Construction of a Legal Term Ontology Makoto Nakamura, Yasuhiro Ogawa, Katsuhiko Toyama Japan Legal Information Institute, Graduate

More information

Wikipedia and Web document based Query Translation and Expansion for Cross-language IR

Wikipedia and Web document based Query Translation and Expansion for Cross-language IR Wikipedia and Web document based Query Translation and Expansion for Cross-language IR Ling-Xiang Tang 1, Andrew Trotman 2, Shlomo Geva 1, Yue Xu 1 1Faculty of Science and Technology, Queensland University

More information

Data Integration and Decision-Making For Biomarkers Discovery, Validation and Evaluation. D. POLVERARI, CTO October 06-07 2008

Data Integration and Decision-Making For Biomarkers Discovery, Validation and Evaluation. D. POLVERARI, CTO October 06-07 2008 Data Integration and Decision-Making For Biomarkers Discovery, Validation and Evaluation D. POLVERARI, CTO October 06-07 2008 Data integration definition and aims Definition : Data integration consists

More information

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518 International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 INTELLIGENT MULTIDIMENSIONAL DATABASE INTERFACE Mona Gharib Mohamed Reda Zahraa E. Mohamed Faculty of Science,

More information

Customizing an English-Korean Machine Translation System for Patent Translation *

Customizing an English-Korean Machine Translation System for Patent Translation * Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,

More information

Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg

Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg March 1, 2007 The catalogue is organized into sections of (1) obligatory modules ( Basismodule ) that

More information

Hybrid Strategies. for better products and shorter time-to-market

Hybrid Strategies. for better products and shorter time-to-market Hybrid Strategies for better products and shorter time-to-market Background Manufacturer of language technology software & services Spin-off of the research center of Germany/Heidelberg Founded in 1999,

More information

Protein Protein Interaction Networks

Protein Protein Interaction Networks Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

More information

Semantic MEDLINE: An advanced information management application for biomedicine

Semantic MEDLINE: An advanced information management application for biomedicine Information Services & Use 31 (2011) 15 21 15 DOI 10.3233/ISU-2011-0627 IOS Press Semantic MEDLINE: An advanced information management application for biomedicine Thomas C. Rindflesch, Halil Kilicoglu,

More information

Natural Language Processing in the EHR Lifecycle

Natural Language Processing in the EHR Lifecycle Insight Driven Health Natural Language Processing in the EHR Lifecycle Cecil O. Lynch, MD, MS cecil.o.lynch@accenture.com Health & Public Service Outline Medical Data Landscape Value Proposition of NLP

More information

Using Ontologies in Proteus for Modeling Data Mining Analysis of Proteomics Experiments

Using Ontologies in Proteus for Modeling Data Mining Analysis of Proteomics Experiments Using Ontologies in Proteus for Modeling Data Mining Analysis of Proteomics Experiments Mario Cannataro, Pietro Hiram Guzzi, Tommaso Mazza, and Pierangelo Veltri University Magna Græcia of Catanzaro, 88100

More information

org.rn.eg.db December 16, 2015 org.rn.egaccnum is an R object that contains mappings between Entrez Gene identifiers and GenBank accession numbers.

org.rn.eg.db December 16, 2015 org.rn.egaccnum is an R object that contains mappings between Entrez Gene identifiers and GenBank accession numbers. org.rn.eg.db December 16, 2015 org.rn.egaccnum Map Entrez Gene identifiers to GenBank Accession Numbers org.rn.egaccnum is an R object that contains mappings between Entrez Gene identifiers and GenBank

More information

Beyond Health 2.0: the semantic web and intelligent systems

Beyond Health 2.0: the semantic web and intelligent systems Beyond Health 2.0: the semantic web and intelligent systems Erik van Mulligen PhD Marc Weeber PhD Ravi Kalaputapu PhD Erasmus University Medical Center, Rotterdam, The Netherlands Knewco Inc, New York,

More information

Knowledge Discovery using Text Mining: A Programmable Implementation on Information Extraction and Categorization

Knowledge Discovery using Text Mining: A Programmable Implementation on Information Extraction and Categorization Knowledge Discovery using Text Mining: A Programmable Implementation on Information Extraction and Categorization Atika Mustafa, Ali Akbar, and Ahmer Sultan National University of Computer and Emerging

More information

Doctor of Philosophy in Computer Science

Doctor of Philosophy in Computer Science Doctor of Philosophy in Computer Science Background/Rationale The program aims to develop computer scientists who are armed with methods, tools and techniques from both theoretical and systems aspects

More information

Parsing Software Requirements with an Ontology-based Semantic Role Labeler

Parsing Software Requirements with an Ontology-based Semantic Role Labeler Parsing Software Requirements with an Ontology-based Semantic Role Labeler Michael Roth University of Edinburgh mroth@inf.ed.ac.uk Ewan Klein University of Edinburgh ewan@inf.ed.ac.uk Abstract Software

More information

A Survey on Product Aspect Ranking

A Survey on Product Aspect Ranking A Survey on Product Aspect Ranking Charushila Patil 1, Prof. P. M. Chawan 2, Priyamvada Chauhan 3, Sonali Wankhede 4 M. Tech Student, Department of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra,

More information

Informatics and Knowledge Management at the Novartis Institutes for BioMedical Research (NIBR)

Informatics and Knowledge Management at the Novartis Institutes for BioMedical Research (NIBR) Informatics and Knowledge Management at the Novartis Institutes for BioMedical Research (NIBR) Enable Science in silico & Provide the Right Knowledge to the Right People at the Right Time to enable the

More information

From Terminology Extraction to Terminology Validation: An Approach Adapted to Log Files

From Terminology Extraction to Terminology Validation: An Approach Adapted to Log Files Journal of Universal Computer Science, vol. 21, no. 4 (2015), 604-635 submitted: 22/11/12, accepted: 26/3/15, appeared: 1/4/15 J.UCS From Terminology Extraction to Terminology Validation: An Approach Adapted

More information

Computational Drug Repositioning by Ranking and Integrating Multiple Data Sources

Computational Drug Repositioning by Ranking and Integrating Multiple Data Sources Computational Drug Repositioning by Ranking and Integrating Multiple Data Sources Ping Zhang IBM T. J. Watson Research Center Pankaj Agarwal GlaxoSmithKline Zoran Obradovic Temple University Terms and

More information

Resolving Common Analytical Tasks in Text Databases

Resolving Common Analytical Tasks in Text Databases Resolving Common Analytical Tasks in Text Databases The work is funded by the Federal Ministry of Economic Affairs and Energy (BMWi) under grant agreement 01MD15010B. Database Systems and Text-based Information

More information

11-792 Software Engineering EMR Project Report

11-792 Software Engineering EMR Project Report 11-792 Software Engineering EMR Project Report Team Members Phani Gadde Anika Gupta Ting-Hao (Kenneth) Huang Chetan Thayur Suyoun Kim Vision Our aim is to build an intelligent system which is capable of

More information

An Open Platform for Collecting Domain Specific Web Pages and Extracting Information from Them

An Open Platform for Collecting Domain Specific Web Pages and Extracting Information from Them An Open Platform for Collecting Domain Specific Web Pages and Extracting Information from Them Vangelis Karkaletsis and Constantine D. Spyropoulos NCSR Demokritos, Institute of Informatics & Telecommunications,

More information