Building a Large Scale Lexical Ontology for Portuguese
|
|
|
- Archibald Wells
- 10 years ago
- Views:
Transcription
1 Building a Large Scale Lexical Ontology for Portuguese Nuno Seco Linguateca Node of Coimbra
2 Agenda Motivations Goals Ontology Extraction Ontology Evaluation Study the Systematicity of Polysemy in the Lexicon using the ontology. What has been done so far
3 Motivation Communication (in natural language) is a knowledge hungry task. Grammatical knowledge (e.g., SVO, VSO, ) Cultural knowledge Common sense knowledge If computers are to do NLP they need knowledge.
4 Motivation Some properties complicate the automatic processing: Metaphorical nature Context dependent Vagueness Creative Diachronic but these properties are the result of human usage, and makes language use easy by humans!
5 Motivation So what we need is a resource* that can be used by a machine and makes explicit the effect of these properties. A Lexical Ontology for Portuguese * Be aware as this is only a snapshot of the language in a particular point in time.
6 Motivation Two strategies are usually followed: Manual construction WordNet Cyc HowNet (Semi) Automatic construction MindNet KnowItAll PAPEL (Palavras Associadas Porto Editora Linguateca)
7 Motivation So what can be done with a lexical ontology? Information Retrieval Machine Translation Question Answering Semantic Similarity Judgments Concept Creation / Explanation
8 Goals Extract the semantic organization of the pt. lexicon. (Ontology Learning, Information Extraction). Evaluate the knowledge extracted defining a methodology. Study the specific issue of systematic polysemy in Portuguese. Compare our model to other models of the Portuguese language (WordNet.PT and WordNet.BR). Make the resource publicly available.
9 Extracting the Structure of the Lexicon Can be thought of as a reverse engineering process.
10 What relations? Hyponymy; Hyperonymy Saxofone - instrumento musical de sopro, feito de metal, recurvo, com chaves e embocadura de palheta is_a(saxofone, instrumento musical) Meronymy; Holonomy rim orgão que tem a a função de orgão cada uma das partes do corpo is_a(rim, orgão) & part_of(orgão, body) -> part_of(rim, body)
11 What relations (cont d)? Synonymy permutar trocar; syn(permutar, trocar) Antonymy infeliz o que não é feliz ant(infeliz, feliz) iracional não racional ant(iracional, racional) Morphological processing: infeliz = in + feliz descontente = des + contente
12 What relations (cont d)? Causation matar - causar a morte a causa(matar, morte) Entailment ressonar - respirar com ruído durante o sono sono estado de quem dorme entails(ressnonar, dormir) Cross part-of-speech relations informatização - acto ou efeito de informatizar nominalization(informatizar, informatização)
13 Extracting the Structure of the Lexicon Árvore -- planta lenhosa que pode atingir grandes alturas e cujo tronco se ramifica na parte superior árvore (tree) => planta lenhosa (woody plant) => organismo (organism) => ser vivo (living thing) => ente (entity)
14 Structure the Lexicon (Simple English example) Tree -- a tall perennial woody plant having a main trunk and branches forming a distinct elevated crown; includes both gymnosperms and angiosperms. tree => woody plant => vascular plant => plant => organism => living thing => physical object => entity Taken from WordNet 2.1
15 Ontology Evaluation Evaluation has received very little attention!! But still, we can identify 4 core kinds: The use of a golden collection Evaluate the output of some ontology driven process Compare the ontology with clusters generated from corpora Human evaluation
16 Using a Golden Collection A Golden Collection B Where is the best output? C Lexical and Relational alignment
17 Using a Golden Collection (cont d) At the lexical level (terms in common) Precision, Recall, F-Measure,... O1 O2 2 O2 O Pr = Abr = O 1 O O 1 O O O 2
18 Using a Golden Collection (cont d) At the relational (hyperonymy/hyponymy) level (Maedche et al., 2002) Animal Animal Mamífero Réptil Mamífero Réptil Ruminante Carnívoro Gato Cão Cão Gato Cocker TO( cão,o 1,O2 ) = 3 5
19 Evaluate the Output of an Ontology Dependent Application A B Where is the best output? C Ontology Dependent Application
20 Evaluate the Output of an Ontology Dependent Application (cont d) Semantic similarity computations using ontologies and correlating them with human judgments. Performing query expansion in information retrieval systems. Knowledge Discovery and Management Group
21 Use clustering strategies (coarse evaluation) A B Where is the best output? C Well known (and acknowledged) algorithms for clustering
22 Use clustering strategies (coarse evaluation) Brewster et al., 2004 Domain A Topic 1 Topic 2 Domain A Topic 3 Topic 4
23 Human evaluation A B C
24 Human Evaluation (cont d) In order to ease the evaluators task, one could show the definitions for each (new) concept in the ontology. (Navigli et al.): festival a day or period of time set aside for feasting and celebration jazz a style of dance music popular in the 1920s; similar to New Orleans jazz but played by large bands jazz festival a kind of festival, a day or period of time set aside for feasting and celebration, related to jazz, a style of dance music popular in the 1920s
25 How can I evaluate my work? Manual Inspection! Compare to other resources being constructed: Luís Sarmento (Linguteca, Porto) extracting relations from corpora. Marcírio Chaves (Linguteca, Lisboa) creating e geographical ontology. Feed the ontology to ongoing projects: AI Lab - ReBuilder Linguateca, Oslo - Esfinge.
26 Word senses: Polysemy vs. Homonymy An individual word or phrase that can be used (in different contexts) to express two or more different meanings. Polysemy - senses are related in some way (complementary). School starts at 8:30. The School was founded in 1910 Homonymy - senses are unrelated (contrastive). The bank has several offices. We walked along the bank of the river.
27 Systematic Polysemy Polysemy of word A with meanings a i and a j is regular [systematic] if there exists at least one other word B with meanings b i and b j which are semantically distinguished from each other in exactly the same way as a i and a j and if a i and b i, and a j and b j are nonsynonymous. Ju. Apresjan (1974)
28 Some examples Habitante/Língua (Habitant/Language) norueguês, português, escocês, (68) Fabricante/Vendedor (Producer/Seller) pasteleiro, ourives, queijeiro, (57) Abertura/Acto (Opening/Act) vista, entrada, perfuração,... (11)
29 Role of Systematic Polysemy Acknowledging the systematic nature of polysemy and its relationship to underspecified representations allows one to structure ontologies for semantic processing more efficiently, generating more appropriate interpretations within context Paul Buitelaar (1998)
30 Progress so far Studying the physical format of the dictionary of Porto Editora, Dicionário da Língua Portuguesa. Looking for frequent patterns, indicative of interesting relations. Parsing the definitions using some of these patterns to obtain a taxonomic structure to the lexicon. Preliminary mining of systematic polysemy patterns.
31 Building a Large Scale Lexical Ontology for Portuguese Nuno Seco Linguateca Node of Coimbra
32 The Dictionary in Numbers Porto Editora s Dictionary (open class words) Number of entries: Nouns Verbs Adjectives Adverbs Number of senses: Nouns Verbs Adjectives Adverbs
33 The Dictionary in Numbers Frequent patterns in noun definitions: acto ou efeito de (3851) pessoa que (1386) indivíduo (1235) aquele que (1148) parte (1052) conjunto de (1004)
34 The Dictionary in Numbers Frequent patterns in verbs definitions: fazer (1680) tornar (1359) tirar (744) pôr (674) causar (299) estar (284)
35 The Dictionary in Numbers Frequent patterns in adjective definitions: que tem (2698) que ou aquele que (1393) relativo a/ao/à ( ) relativo ou pertencente (647) que ou o que (527) que diz respeito (494)
36 The Dictionary in Numbers Frequent patterns in adverb definitions: de modo (393) de maneira (48) do ponto de vista (28) por meio de (14)
37 Some difficult issues Finding the right sense of word in the definition: arquibancada banco grande cujo assento What sense of banco? Circularity: passagem transição de um transição passagem que comporta
38 Complementary Studies árvore (tree) => planta lenhosa (woody plant) => organismo (organism) => ser vivo (living thing) => ente (entity) Extracted from pt dictionary tree => woody plant => vascular plant => plant => organism => living thing => physical object => entity Taken from WordNet 2.1
PAPEL: A Dictionary-Based Lexical Ontology for Portuguese
PAPEL: A Dictionary-Based Lexical Ontology for Portuguese Hugo Gonçalo Oliveira 1, Diana Santos 2,PauloGomes 1, and Nuno Seco 1 1 Linguateca, Coimbra node, DEI - FCTUC, CISUC, Portugal 2 Linguateca, Oslo
ONTOLOGIES A short tutorial with references to YAGO Cosmina CROITORU
ONTOLOGIES p. 1/40 ONTOLOGIES A short tutorial with references to YAGO Cosmina CROITORU Unlocking the Secrets of the Past: Text Mining for Historical Documents Blockseminar, 21.2.-11.3.2011 ONTOLOGIES
A Software Tool for Thesauri Management, Browsing and Supporting Advanced Searches
J. Nogueras-Iso, J.A. Bañares, J. Lacasta, J. Zarazaga-Soria 105 A Software Tool for Thesauri Management, Browsing and Supporting Advanced Searches J. Nogueras-Iso, J.A. Bañares, J. Lacasta, J. Zarazaga-Soria
Building a Question Classifier for a TREC-Style Question Answering System
Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given
Natural Language Processing. Part 4: lexical semantics
Natural Language Processing Part 4: lexical semantics 2 Lexical semantics A lexicon generally has a highly structured form It stores the meanings and uses of each word It encodes the relations between
Human Language Technology Research and the Development of the Brazilian Portuguese Wordnet
1 Human Language Technology Research and the Development of the Brazilian Portuguese Wordnet Bento Carlos DIAS-DA-SILVA Faculdade de Ciências e Letras, Universidade Estadual Paulista, Rodovia Araraquara-Jau
Using WordNet.PT for translation: disambiguation and lexical selection decisions
INTERNATIONAL JOURNAL OF TRANSLATION Vol. XX, No. XX, XXXX XXXX Using WordNet.PT for translation: disambiguation and lexical selection decisions University of Lisbon, Portugal ABSTRACT Wordnets are extensively
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS Gürkan Şahin 1, Banu Diri 1 and Tuğba Yıldız 2 1 Faculty of Electrical-Electronic, Department of Computer Engineering
Extraction of Legal Definitions from a Japanese Statutory Corpus Toward Construction of a Legal Term Ontology
Extraction of Legal Definitions from a Japanese Statutory Corpus Toward Construction of a Legal Term Ontology Makoto Nakamura, Yasuhiro Ogawa, Katsuhiko Toyama Japan Legal Information Institute, Graduate
How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.
Svetlana Sokolova President and CEO of PROMT, PhD. How the Computer Translates Machine translation is a special field of computer application where almost everyone believes that he/she is a specialist.
Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery
Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery Jan Paralic, Peter Smatana Technical University of Kosice, Slovakia Center for
CINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test
CINTIL-PropBank I. Basic Information 1.1. Corpus information The CINTIL-PropBank (Branco et al., 2012) is a set of sentences annotated with their constituency structure and semantic role tags, composed
TERMINOGRAPHY and LEXICOGRAPHY What is the difference? Summary. Anja Drame TermNet
TERMINOGRAPHY and LEXICOGRAPHY What is the difference? Summary Anja Drame TermNet Summary/ Conclusion Variety of language (GPL = general purpose SPL = special purpose) Lexicography GPL SPL (special-purpose
Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval
Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval Yih-Chen Chang and Hsin-Hsi Chen Department of Computer Science and Information
EuroRec Repository. Translation Manual. January 2012
EuroRec Repository Translation Manual January 2012 Added to Deliverable D6.3 for the EHR-Q TN project EuroRec Repository Translations Manual January 2012 1/21 Table of Content 1 Property of the document...
AN APPROACH TO WORD SENSE DISAMBIGUATION COMBINING MODIFIED LESK AND BAG-OF-WORDS
AN APPROACH TO WORD SENSE DISAMBIGUATION COMBINING MODIFIED LESK AND BAG-OF-WORDS Alok Ranjan Pal 1, 3, Anirban Kundu 2, 3, Abhay Singh 1, Raj Shekhar 1, Kunal Sinha 1 1 College of Engineering and Management,
An Efficient Database Design for IndoWordNet Development Using Hybrid Approach
An Efficient Database Design for IndoWordNet Development Using Hybrid Approach Venkatesh P rabhu 2 Shilpa Desai 1 Hanumant Redkar 1 N eha P rabhugaonkar 1 Apur va N agvenkar 1 Ramdas Karmali 1 (1) GOA
Open Domain Information Extraction. Günter Neumann, DFKI, 2012
Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for
Natural Language Database Interface for the Community Based Monitoring System *
Natural Language Database Interface for the Community Based Monitoring System * Krissanne Kaye Garcia, Ma. Angelica Lumain, Jose Antonio Wong, Jhovee Gerard Yap, Charibeth Cheng De La Salle University
Presented to The Federal Big Data Working Group Meetup On 07 June 2014 By Chuck Rehberg, CTO Semantic Insights a Division of Trigent Software
Semantic Research using Natural Language Processing at Scale; A continued look behind the scenes of Semantic Insights Research Assistant and Research Librarian Presented to The Federal Big Data Working
Comparing Ontology-based and Corpusbased Domain Annotations in WordNet.
Comparing Ontology-based and Corpusbased Domain Annotations in WordNet. A paper by: Bernardo Magnini Carlo Strapparava Giovanni Pezzulo Alfio Glozzo Presented by: rabee ali alshemali Motive. Domain information
What s in a Lexicon. The Lexicon. Lexicon vs. Dictionary. What kind of Information should a Lexicon contain?
What s in a Lexicon What kind of Information should a Lexicon contain? The Lexicon Miriam Butt November 2002 Semantic: information about lexical meaning and relations (thematic roles, selectional restrictions,
The XLDB Group at CLEF 2004
The XLDB Group at CLEF 2004 Nuno Cardoso, Mário J. Silva, and Miguel Costa Grupo XLDB - Departamento de Informática, Faculdade de Ciências da Universidade de Lisboa {ncardoso, mjs, mcosta} at xldb.di.fc.ul.pt
Using Knowledge Extraction and Maintenance Techniques To Enhance Analytical Performance
Using Knowledge Extraction and Maintenance Techniques To Enhance Analytical Performance David Bixler, Dan Moldovan and Abraham Fowler Language Computer Corporation 1701 N. Collins Blvd #2000 Richardson,
Overview of MT techniques. Malek Boualem (FT)
Overview of MT techniques Malek Boualem (FT) This section presents an standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,
Converging Web-Data and Database Data: Big - and Small Data via Linked Data
DBKDA/WEB Panel 2014, Chamonix, 24.04.2014 DBKDA/WEB Panel 2014, Chamonix, 24.04.2014 Reutlingen University Converging Web-Data and Database Data: Big - and Small Data via Linked Data Moderation: Fritz
Bridging CAQDAS with text mining: Text analyst s toolbox for Big Data: Science in the Media Project
Bridging CAQDAS with text mining: Text analyst s toolbox for Big Data: Science in the Media Project Ahmet Suerdem Istanbul Bilgi University; LSE Methodology Dept. Science in the media project is funded
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg March 1, 2007 The catalogue is organized into sections of (1) obligatory modules ( Basismodule ) that
WHY TRANSLATOR-FRIENDLY?
OUTLINE ReEscreve Why translator friendly? Why multi-purpose? Current characteristics and the future? Motivation for its development Importance of linguistic quality control Usefulness of style editing
INFORMING A INFORMATION DISCOVERY TOOL FOR USING GESTURE
INFORMING A INFORMATION DISCOVERY TOOL FOR USING GESTURE Luís Manuel Borges Gouveia Feliz Ribeiro Gouveia {lmbg, fribeiro}@ufp.pt Centro de Recursos Multimediáticos Universidade Fernando Pessoa Porto -
Search and Information Retrieval
Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search
Methods and Tools for Encoding the WordNet.Br Sentences, Concept Glosses, and Conceptual-Semantic Relations
Methods and Tools for Encoding the WordNet.Br Sentences, Concept Glosses, and Conceptual-Semantic Relations Bento C. Dias-da-Silva 1, Ariani Di Felippo 2, Ricardo Hasegawa 3 Centro de Estudos Lingüísticos
Relations Extracted from a Portuguese Dictionary: Results and First Evaluation
Relations Extracted from a Portuguese Dictionary: Results and First Evaluation Hugo Gonçalo Oliveira 1, Diana Santos 2, Paulo Gomes 1 [email protected], [email protected], [email protected] 1 CISUC,
Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision
Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision Viktor PEKAR Bashkir State University Ufa, Russia, 450000 [email protected] Steffen STAAB Institute AIFB,
Computer Standards & Interfaces
Computer Standards & Interfaces 35 (2013) 470 481 Contents lists available at SciVerse ScienceDirect Computer Standards & Interfaces journal homepage: www.elsevier.com/locate/csi How to make a natural
Natural Language Interfaces to Databases: simple tips towards usability
Natural Language Interfaces to Databases: simple tips towards usability Luísa Coheur, Ana Guimarães, Nuno Mamede L 2 F/INESC-ID Lisboa Rua Alves Redol, 9, 1000-029 Lisboa, Portugal {lcoheur,arog,nuno.mamede}@l2f.inesc-id.pt
Mining Text Data: An Introduction
Bölüm 10. Metin ve WEB Madenciliği http://ceng.gazi.edu.tr/~ozdemir Mining Text Data: An Introduction Data Mining / Knowledge Discovery Structured Data Multimedia Free Text Hypertext HomeLoan ( Frank Rizzo
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 INTELLIGENT MULTIDIMENSIONAL DATABASE INTERFACE Mona Gharib Mohamed Reda Zahraa E. Mohamed Faculty of Science,
A Case Study of Question Answering in Automatic Tourism Service Packaging
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 13, Special Issue Sofia 2013 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2013-0045 A Case Study of Question
INGLÊS. Aula 13 DIRECT AND INDIRECT SPEECH
INGLÊS Aula 13 DIRECT AND INDIRECT SPEECH Direct(Quoted) And Indirect(Reported) Speech Você pode responder esta pergunta: "What did he/she say?" de duas maneiras: - Repetindo as palavras ditas (direct
Intro to Linguistics Semantics
Intro to Linguistics Semantics Jarmila Panevová & Jirka Hana January 5, 2011 Overview of topics What is Semantics The Meaning of Words The Meaning of Sentences Other things about semantics What to remember
Thesis Proposal Verb Semantics for Natural Language Understanding
Thesis Proposal Verb Semantics for Natural Language Understanding Derry Tanti Wijaya Abstract A verb is the organizational core of a sentence. Understanding the meaning of the verb is therefore key to
Hyponymy Extraction and Web Search Behavior Analysis Based On Query Reformulation
Hyponymy Extraction and Web Search Behavior Analysis Based On Query Reformulation Rui P. Costa and Nuno Seco Cognitive and Media Systems Group, CISUC University of Coimbra, Portugal [email protected],
Text Mining: The state of the art and the challenges
Text Mining: The state of the art and the challenges Ah-Hwee Tan Kent Ridge Digital Labs 21 Heng Mui Keng Terrace Singapore 119613 Email: [email protected] Abstract Text mining, also known as text data
Learning Translation Rules from Bilingual English Filipino Corpus
Proceedings of PACLIC 19, the 19 th Asia-Pacific Conference on Language, Information and Computation. Learning Translation s from Bilingual English Filipino Corpus Michelle Wendy Tan, Raymond Joseph Ang,
COURSE OBJECTIVES SPAN 100/101 ELEMENTARY SPANISH LISTENING. SPEAKING/FUNCTIONAl KNOWLEDGE
SPAN 100/101 ELEMENTARY SPANISH COURSE OBJECTIVES This Spanish course pays equal attention to developing all four language skills (listening, speaking, reading, and writing), with a special emphasis on
Automated Extraction of Security Policies from Natural-Language Software Documents
Automated Extraction of Security Policies from Natural-Language Software Documents Xusheng Xiao 1 Amit Paradkar 2 Suresh Thummalapenta 3 Tao Xie 1 1 Dept. of Computer Science, North Carolina State University,
Language Interface for an XML. Constructing a Generic Natural. Database. Rohit Paravastu
Constructing a Generic Natural Language Interface for an XML Database Rohit Paravastu Motivation Ability to communicate with a database in natural language regarded as the ultimate goal for DB query interfaces
Natural Language to Relational Query by Using Parsing Compiler
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
The Battle for the Future of Data Mining Oren Etzioni, CEO Allen Institute for AI (AI2) March 13, 2014
The Battle for the Future of Data Mining Oren Etzioni, CEO Allen Institute for AI (AI2) March 13, 2014 Big Data Tidal Wave What s next? 2 3 4 Deep Learning Some Skepticism about Deep Learning Of course,
Interactive Dynamic Information Extraction
Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken
Text Mining for Health Care and Medicine. Sophia Ananiadou Director National Centre for Text Mining www.nactem.ac.uk
Text Mining for Health Care and Medicine Sophia Ananiadou Director National Centre for Text Mining www.nactem.ac.uk The Need for Text Mining MEDLINE 2005: ~14M 2009: ~18M Overwhelming information in textual,
AUTOMATIC DATABASE CONSTRUCTION FROM NATURAL LANGUAGE REQUIREMENTS SPECIFICATION TEXT
AUTOMATIC DATABASE CONSTRUCTION FROM NATURAL LANGUAGE REQUIREMENTS SPECIFICATION TEXT Geetha S. 1 and Anandha Mala G. S. 2 1 JNTU Hyderabad, Telangana, India 2 St. Joseph s College of Engineering, Chennai,
Extending a Lexicon of Portuguese Nominalizations with Data from Corpora
Extending a Lexicon of Portuguese Nominalizations with Data from Corpora Cláudia Freitas 1 Valeria de Paiva 2 Alexandre Rademaker 4,5 Gerard de Melo 3 Livy Real 4 Anne Silva 1 PUC-Rio Nuance Comm. Tsinghua
Grammar Presentation: The Sentence
Grammar Presentation: The Sentence GradWRITE! Initiative Writing Support Centre Student Development Services The rules of English grammar are best understood if you understand the underlying structure
Clustering Connectionist and Statistical Language Processing
Clustering Connectionist and Statistical Language Processing Frank Keller [email protected] Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised
Chapter 8 The Enhanced Entity- Relationship (EER) Model
Chapter 8 The Enhanced Entity- Relationship (EER) Model Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 8 Outline Subclasses, Superclasses, and Inheritance Specialization
Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words
, pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan
An Ontology Based Method to Solve Query Identifier Heterogeneity in Post- Genomic Clinical Trials
ehealth Beyond the Horizon Get IT There S.K. Andersen et al. (Eds.) IOS Press, 2008 2008 Organizing Committee of MIE 2008. All rights reserved. 3 An Ontology Based Method to Solve Query Identifier Heterogeneity
Knowledge Discovery using Text Mining: A Programmable Implementation on Information Extraction and Categorization
Knowledge Discovery using Text Mining: A Programmable Implementation on Information Extraction and Categorization Atika Mustafa, Ali Akbar, and Ahmer Sultan National University of Computer and Emerging
Learn-Portuguese-Now.com presents... 100 PHRASES. What Did You Say? by Charlles Nunes
Learn-Portuguese-Now.com presents... English-Portuguese Flash Cards 100 PHRASES What Did You Say? by Charlles Nunes English-Portuguese Flash Cards 100 Phrases Congratulations! By downloading this volume
An Integrated Approach to Automatic Synonym Detection in Turkish Corpus
An Integrated Approach to Automatic Synonym Detection in Turkish Corpus Dr. Tuğba YILDIZ Assist. Prof. Dr. Savaş YILDIRIM Assoc. Prof. Dr. Banu DİRİ İSTANBUL BİLGİ UNIVERSITY YILDIZ TECHNICAL UNIVERSITY
Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System
Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System Athira P. M., Sreeja M. and P. C. Reghuraj Department of Computer Science and Engineering, Government Engineering
Protein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
Week 3. COM1030. Requirements Elicitation techniques. 1. Researching the business background
Aims of the lecture: 1. Introduce the issue of a systems requirements. 2. Discuss problems in establishing requirements of a system. 3. Consider some practical methods of doing this. 4. Relate the material
ISA OR NOT ISA: THE INTERLINGUAL DILEMMA FOR MACHINE TRANSLATION
ISA OR NOT ISA: THE INTERLINGUAL DILEMMA FOR MACHINE TRANSLATION FLORENCE REEDER The MITRE Corporation 1 / George Mason University [email protected] ABSTRACT Developing a system that accurately produces
How to work with a video clip in EJA English classes?
How to work with a video clip in EJA English classes? Introduction Working with projects implies to take into account the students interests and needs, to build knowledge collectively, to focus on the
Information extraction from texts. Technical and business challenges
Information extraction from texts Technical and business challenges Overview Mentis Text mining field overview Application: Information Extraction Motivation & Overview Page 2 Mentis - Overview Consulting
TOOL OF THE INTELLIGENCE ECONOMIC: RECOGNITION FUNCTION OF REVIEWS CRITICS. Extraction and linguistic analysis of sentiments
TOOL OF THE INTELLIGENCE ECONOMIC: RECOGNITION FUNCTION OF REVIEWS CRITICS. Extraction and linguistic analysis of sentiments Grzegorz Dziczkowski, Katarzyna Wegrzyn-Wolska Ecole Superieur d Ingenieurs
It s all around the domain ontologies - Ten benefits of a Subject-centric Information Architecture for the future of Social Networking
It s all around the domain ontologies - Ten benefits of a Subject-centric Information Architecture for the future of Social Networking Lutz Maicher and Benjamin Bock, Topic Maps Lab at University of Leipzig,
Survey Results: Requirements and Use Cases for Linguistic Linked Data
Survey Results: Requirements and Use Cases for Linguistic Linked Data 1 Introduction This survey was conducted by the FP7 Project LIDER (http://www.lider-project.eu/) as input into the W3C Community Group
INTEGRATED DEVELOPMENT ENVIRONMENTS FOR NATURAL LANGUAGE PROCESSING
INTEGRATED DEVELOPMENT ENVIRONMENTS FOR NATURAL LANGUAGE PROCESSING Text Analysis International, Inc. 1669-2 Hollenbeck Ave. #501 Sunnyvale, California 94087 October 2001 INTRODUCTION From the time of
Information Technology for KM
On the Relations between Structural Case-Based Reasoning and Ontology-based Knowledge Management Ralph Bergmann & Martin Schaaf University of Hildesheim Data- and Knowledge Management Group www.dwm.uni-hildesheim.de
Find the signal in the noise
Find the signal in the noise Electronic Health Records: The challenge The adoption of Electronic Health Records (EHRs) in the USA is rapidly increasing, due to the Health Information Technology and Clinical
PiQASso: Pisa Question Answering System
PiQASso: Pisa Question Answering System Giuseppe Attardi, Antonio Cisternino, Francesco Formica, Maria Simi, Alessandro Tommasi Dipartimento di Informatica, Università di Pisa, Italy {attardi, cisterni,
How to make Ontologies self-building from Wiki-Texts
How to make Ontologies self-building from Wiki-Texts Bastian HAARMANN, Frederike GOTTSMANN, and Ulrich SCHADE Fraunhofer Institute for Communication, Information Processing & Ergonomics Neuenahrer Str.
The Value of Taxonomy Management Research Results
Taxonomy Strategies November 28, 2012 Copyright 2012 Taxonomy Strategies. All rights reserved. The Value of Taxonomy Management Research Results Joseph A Busch, Principal What does taxonomy do for search?
Building the Multilingual Web of Data: A Hands-on tutorial (ISWC 2014, Riva del Garda - Italy)
Building the Multilingual Web of Data: A Hands-on tutorial (ISWC 2014, Riva del Garda - Italy) Multilingual Word Sense Disambiguation and Entity Linking on the Web based on BabelNet Roberto Navigli, Tiziano
Large-Scale PatternBased Information
Sebastian Blohm Large-Scale PatternBased Information Extraction from the World Wide Web Sebastian Blohm Large-Scale Pattern-Based Information Extraction from the World Wide Web Large-Scale Pattern-Based
Statistical Machine Translation
Statistical Machine Translation Some of the content of this lecture is taken from previous lectures and presentations given by Philipp Koehn and Andy Way. Dr. Jennifer Foster National Centre for Language
SUNY PURCHASE ONLINE BASIC SPANISH I SPA 1010 SYLLABUS
SUNY PURCHASE ONLINE BASIC SPANISH I SPA 1010 SYLLABUS Prof. Deborah K. Symons Contact: [email protected] Office Hours: TBA in Moodle. ONLINE E-TEXT - Required: Interactive E-Book ENLINEA v.3.0.
Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. ~ Spring~r
Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures ~ Spring~r Table of Contents 1. Introduction.. 1 1.1. What is the World Wide Web? 1 1.2. ABrief History of the Web
POSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition
POSBIOTM-NER: A Machine Learning Approach for Bio-Named Entity Recognition Yu Song, Eunji Yi, Eunju Kim, Gary Geunbae Lee, Department of CSE, POSTECH, Pohang, Korea 790-784 Soo-Jun Park Bioinformatics
Empirical Machine Translation and its Evaluation
Empirical Machine Translation and its Evaluation EAMT Best Thesis Award 2008 Jesús Giménez (Advisor, Lluís Màrquez) Universitat Politècnica de Catalunya May 28, 2010 Empirical Machine Translation Empirical
Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde
Statistical Verb-Clustering Model soft clustering: Verbs may belong to several clusters trained on verb-argument tuples clusters together verbs with similar subcategorization and selectional restriction
