Latin WordNet project




Latin WordNet project Stefano Minozzi Laboratorio di Informatica Umanistica Università degli Studi di Verona

Laboratorio di Informatica Umanistica, Università degli Studi di Verona
http://www.cyllenius.net/labium/

The Cognitive and Communication Technologies (TCC) Division, Fondazione Bruno Kessler, Trento
http://cit.fbk.eu/en/research

Historical credits

The Latin WordNet project builds on:

- Princeton WordNet: a lexical database for the English language, created and maintained at the Cognitive Science Laboratory of Princeton University under the direction of psychology professor George A. Miller. Development began in 1985.
- MultiWordNet: a multilingual lexical database in which the Italian WordNet is strictly aligned with Princeton WordNet v. 1.6. Developed since 1994 at the Istituto Trentino di Cultura, now Fondazione Bruno Kessler.

MultiWordNet: multilingual lexical matrix

Latin WordNet represents:

- Semantic parts of speech: nouns, verbs, adjectives, adverbs
- Lexical relations that connect words

Meanings are treated as constant across languages, while the lexicalization of a meaning is a language-specific variable.

Structure of the database

The synset (= group of synonyms) is the building block of WordNet.

Example gloss: express an idea, etc. in words; "He said that he wanted to marry her"; "tell me what is bothering you"; "state your opinion"

English synset words: state, say, tell
Italian synset words: dire, enunciare, enunziare, raccontare
Latin synset lemmas: adnuntio, dico, effor, enuntio, for, inquam, inseco, loquor, narro
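As a rough illustration of the idea that one meaning is shared while its lexicalizations vary by language, a synset can be modelled as a gloss plus one word list per language. This is only a sketch: the `Synset` class, its field names, and the `v#PLACEHOLDER` identifier are assumptions made for the example and do not reflect the actual database schema of MultiWordNet or Latin WordNet.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Synset:
    """One meaning (gloss) with its language-specific lexicalizations."""
    synset_id: str  # placeholder below; the slide does not give this synset's ID
    gloss: str
    lemmas: Dict[str, List[str]] = field(default_factory=dict)

    def words(self, language: str) -> List[str]:
        """Return the words lexicalizing this meaning in one language."""
        return self.lemmas.get(language, [])


# Words taken from the slide; the ID is a hypothetical placeholder.
say = Synset(
    synset_id="v#PLACEHOLDER",
    gloss="express an idea, etc. in words",
    lemmas={
        "english": ["state", "say", "tell"],
        "italian": ["dire", "enunciare", "enunziare", "raccontare"],
        "latin": ["adnuntio", "dico", "effor", "enuntio", "for",
                  "inquam", "inseco", "loquor", "narro"],
    },
)

print(say.words("latin"))
```

Keeping the gloss outside the per-language lists mirrors the slide's point: the meaning is stored once and only the word lists differ between languages.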

The synsets are linked by relations

Relations for adjectives and adverbs

Moreover, the synsets are linked to semantic field labels, which makes it possible to build domain-related dictionaries.

Building the semantic network

Building a semantic network from scratch is very time-consuming. The available resources permit a different approach:

- Automatic assignment of synsets
- Manual correction of the results

Building blocks:

- Latin-to-Italian MRD (machine-readable dictionary), from G. B. Conte and E. Pianezzola
- Latin-to-English MRD (from OLD, via William Whitaker's Words)
- Italian and English branches of MultiWordNet

We developed a number of assignment strategies:

- Multilingual intersection method: exploits the multilingual nature of MultiWordNet
- Generic probability: for very specialized words, where polysemy is very limited
- Gloss correspondence: exploits the glosses present in the MRD
- Intersection of synsets: assigns a lemma to a synset when several of its translation equivalents point to the same synset

Intersection method

amor, is
English: love, affection; the beloved; Cupid; affair; desire, passion; sexual passion; illicit passion
Italian: amore; persona amata, amore; questioni amorose, amorazzi; storie d'amore; amore, desiderio; Amore; gli Amori, gli Amorini

Synsets from English and from Italian: n#04478900, n#05567241, n#05607724, n#05608483, n#07109169
Intersection: n#04478900
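The multilingual intersection idea can be sketched as a plain set intersection between the synsets reached through the English translations and those reached through the Italian translations. The slide does not make clear which of the non-shared synsets came from which language, so the split below is an assumption made purely for illustration; only the reading of n#04478900 as the shared synset follows the slide. The function name is hypothetical.

```python
def multilingual_intersection(english_synsets, italian_synsets):
    """Keep only the synsets reached through both language branches."""
    return set(english_synsets) & set(italian_synsets)


# ASSUMPTION: how the four non-shared IDs divide between English and
# Italian is not recoverable from the slide; this split is illustrative.
from_english = {"n#04478900", "n#05567241", "n#05607724"}
from_italian = {"n#04478900", "n#05608483", "n#07109169"}

assigned = multilingual_intersection(from_english, from_italian)
print(assigned)
```

Requiring agreement between two independent dictionaries is what filters out senses that only one translation language suggests.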

Generic probability

abactor, oris
English: rustler, cattle_thief; one_who_drives_off
Synset: n#07541894
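The slide does not spell out how the generic probability strategy is computed. One simple reading, for highly specialized words with little polysemy, is to assign a synset only when all translation equivalents found in the wordnet lead to one and the same synset. The sketch below implements that reading only; the function name and the word-to-synset lookup table are hypothetical, and the assumption that both "rustler" and "cattle_thief" map to n#07541894 is made for the example.

```python
def assign_if_unambiguous(equivalents, synsets_by_word):
    """Return the single candidate synset, or None if there are zero or several."""
    candidates = set()
    for word in equivalents:
        # Equivalents missing from the wordnet contribute no candidates.
        candidates |= synsets_by_word.get(word, set())
    if len(candidates) == 1:
        return next(iter(candidates))
    return None


# Hypothetical lookup table (assumed for illustration only).
synsets_by_word = {
    "rustler": {"n#07541894"},
    "cattle_thief": {"n#07541894"},
}

result = assign_if_unambiguous(
    ["rustler", "cattle_thief", "one_who_drives_off"], synsets_by_word
)
print(result)
```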

Gloss correspondence

punctum, i
English: point, dot; point, spot; small_hole, pin_prick; sting, small_puncture (of_insect); vote, tick; tiny_amount; full-stop, period (punctuation)

Candidate synsets for PERIOD: n#05126526, n#09715092, n#10843624, n#10868422, n#10869183, n#10954173, n#10961157, n#10982844, n#10988653

Selected synset: n#05126526
Words: period, point, full_stop, stop, full_point
Gloss: {a punctuation mark (.) placed at the end of a declarative sentence to indicate a full stop or after abbreviations}
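One way to picture gloss correspondence is as a word-overlap comparison between the dictionary sense (including its parenthetical hint) and the gloss of each candidate synset. The sketch below uses a naive token overlap, which is an assumption and not necessarily the measure used in the project. The gloss for n#05126526 is the one shown on the slide; the gloss given for n#09715092 is an invented placeholder, since the slide does not show it.

```python
import re


def tokens(text):
    """Lowercase alphabetic tokens of a string, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def best_gloss_match(mrd_sense, candidate_glosses):
    """Pick the candidate whose gloss shares the most tokens with the MRD sense."""
    sense_tokens = tokens(mrd_sense)
    best_id, best_overlap = None, 0
    for synset_id, gloss in candidate_glosses.items():
        overlap = len(sense_tokens & tokens(gloss))
        if overlap > best_overlap:
            best_id, best_overlap = synset_id, overlap
    return best_id  # None when no gloss shares any token


candidate_glosses = {
    "n#05126526": "a punctuation mark (.) placed at the end of a declarative "
                  "sentence to indicate a full stop or after abbreviations",
    # PLACEHOLDER gloss, invented for the example; not taken from WordNet.
    "n#09715092": "an amount of time",
}

chosen = best_gloss_match("full-stop, period (punctuation)", candidate_glosses)
print(chosen)
```

A real system would likely drop function words and weight rarer terms more heavily; the plain overlap is kept here only to make the mechanism visible.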

Intersection of synsets

punctum, i
English: point, dot; point, spot; small_hole, pin_prick; sting, small_puncture (of_insect); vote, tick; tiny_amount; full-stop, period (punctuation)

POINT (24 synsets): n#02582551; n#03150523; n#03150944; n#03151033; n#03719894; n#03720036; n#03958380; n#04481751; n#04514257; n#04589546; n#04867079; n#04955967; n#05110203; n#05126526; n#06351684; n#06745866; n#09780630; n#09869507; n#09933792; n#09962048; n#10018378; n#10025218; n#10044643; n#10898122

DOT (2 synsets): n#05096549; n#10025218

Synset shared by POINT and DOT: n#10025218
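The intersection-of-synsets strategy can be sketched as a vote count: each translation equivalent votes for its candidate synsets, and a synset is kept when enough equivalents agree. The synset lists below are the ones given on the slide for POINT and DOT; the function name and the `min_support` threshold of 2 are assumptions made for the example.

```python
from collections import Counter


def intersect_synsets(candidates, min_support=2):
    """Return synsets proposed by at least `min_support` translation equivalents."""
    votes = Counter()
    for synsets in candidates.values():
        votes.update(set(synsets))  # each equivalent votes once per synset
    return sorted(s for s, n in votes.items() if n >= min_support)


candidates = {
    "point": [
        "n#02582551", "n#03150523", "n#03150944", "n#03151033",
        "n#03719894", "n#03720036", "n#03958380", "n#04481751",
        "n#04514257", "n#04589546", "n#04867079", "n#04955967",
        "n#05110203", "n#05126526", "n#06351684", "n#06745866",
        "n#09780630", "n#09869507", "n#09933792", "n#09962048",
        "n#10018378", "n#10025218", "n#10044643", "n#10898122",
    ],
    "dot": ["n#05096549", "n#10025218"],
}

shared = intersect_synsets(candidates)
print(shared)
```

With more than two equivalents, raising or lowering `min_support` trades precision against coverage, which is where the manual correction step would come in.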

Lexical gaps

Lexical unit (Latin): abactor, is
Free combination (Italian): ladro di bestiame
Gap, Latin to Italian

Dimension of the database (Latin)

              Noun    Verb    Adj    Adv    TOTAL
SYNSETS       5621    2283    775    294     8973
LEMMAS        4777    2609   1259    479     9124
WORD SENSES  13060   10062   2054    732    25908
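The figures in the table can be checked with a few lines of arithmetic: the per-category counts sum to the stated totals, and dividing word senses by lemmas gives the average number of senses per lemma. The dictionary layout and variable names are assumptions made for the example; the numbers are those reported on the slide.

```python
counts = {
    "synsets": {"noun": 5621, "verb": 2283, "adj": 775, "adv": 294},
    "lemmas": {"noun": 4777, "verb": 2609, "adj": 1259, "adv": 479},
    "word_senses": {"noun": 13060, "verb": 10062, "adj": 2054, "adv": 732},
}

# Sum each row across the four parts of speech.
totals = {row: sum(by_pos.values()) for row, by_pos in counts.items()}

# Average number of senses per Latin lemma, overall and per part of speech.
senses_per_lemma = totals["word_senses"] / totals["lemmas"]
senses_per_lemma_by_pos = {
    pos: counts["word_senses"][pos] / counts["lemmas"][pos]
    for pos in counts["lemmas"]
}

print(totals)
print(round(senses_per_lemma, 2))
```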

Latin WordNet can be browsed online:
http://multiwordnet.itc.it/english/home.php

The Latin WordNet database is available from the European Language Resources Association (ELRA):
http://www.elra.info/

Thank you!