Senso Comune: a Community Knowledge Base for the Italian Language. Creative Commons Attribution-Share Alike 2.5 Italy License





Introduction
- Senso Comune is an ongoing project to build an open knowledge base of the Italian language
- Collaborative research initiative freely supported by a multi-disciplinary community
  - Univ. of Rome Sapienza and Tor Vergata, Bologna, Bolzano, Pavia, Trento, et al.
  - Italian National Research Council (CNR): ISTC, ILC
  - Fondazione Bruno Kessler, Trento
  - IBM Center for Advanced Studies, Italy
- Non-profit organization led by Prof. Tullio De Mauro

07/06/2012

Objectives
- Build an open, collaborative knowledge base of the Italian language (i.e. a possibly incomplete database with a schema specification that allows automated reasoning)
- Collect information from both dictionary sources and skilled people (scholars, researchers, practitioners, etc.)
- Formalize linguistic knowledge
  - Morphological and lexical information
  - Semantic specifications through ontologies
  - Thematic roles and frames
- Develop a specific platform for linguistic knowledge acquisition
- Distribute open, standardized linguistic data

Approach
- Start from the core lexicon of De Mauro's dictionary
  - ~2,000 most common Italian lemmas (90% coverage)
  - ~13,000 senses
- Allow (qualified) users to enrich / modify the content
  - New lemmas, senses, usage instances, lexical relations, etc.
- Ontological classification of each sense
- Complete the coverage
  - ~100,000 lemmas
  - ~200,000 senses, including technical ones

Lexicon and Ontology
- Specific focus on the lexicon-ontology interplay [Oltramari and Vetere, 2008]
  - To what extent do linguistic senses bear ontological commitments?
- Our position: linguistic constructs ≠ truth-valued logic constructs
  - Linguistic senses and ontology concepts are (in principle) disjoint
  - A (partial) mapping function leads from senses to concepts
  - Lexical relationships (e.g. synonymy, hyponymy) are not immediately (nor necessarily) reflected in ontological axioms (e.g. equivalence, inclusion)
- Main differences w.r.t. WordNet
  - A-priori ontological backbone
  - Clear distinction between senses and concepts
  - Formal and focused account of the conceptual level
  - Elements of frame semantics

The Ontology Behind Senso Comune
- Inherits from DOLCE (CNR ISTC)
  - Nominalistic subset (no universals)
  - Reified classes and relationships (characterizations)
- Modules
  - Foundational ontology
  - Morpho-syntactic structure
  - Semasiological structure
  - Semantic relations and frames

Semasiological Model
[Diagram. Ontology classes: Abstract Entity, Concrete Entity, Information Object, Characterization, Substance, Action. Linguistic classes: Lemma, Meaning Record, Linguistic Property, Meaning, Expression. Relations: part, lexical relation, gram. spec., definition, categorization, mapping, punning, usage instance, annotations. Example: the lemma "Water" {Noun} has the meaning record Water-1, defined as "A liquid etc."; the Water-1 meaning class is "Meaning and characterizes only Substance", and the corresponding Drink meaning class is "Meaning and characterizes only Action". The usage instance "The boy drinks a glass of water" is annotated with Drink-# and Water-#. Meaning classes are generated from categorized meaning records.]

Semasiological Model
- Separating linguistic senses and relationships (e.g. synonymy, hyponymy, and antonymy) from their ontological counterparts (e.g. inclusion, disjointness) is at the basis of our model. This separation prevents linguistic facts from being directly mapped to logic propositions, and thus relieves linguistic meanings of the burden of embodying ontological commitments [Vetere and Oltramari, 2008].
- We distinguish meanings as registered in dictionaries from the concepts they refer to (if any). The former are instances of the class MeaningRecord (InformationObject), while the latter are subclasses of Meaning (Characterization). Basically, MeaningRecords are instantiated in dictionaries, while Meaning classes are instantiated in linguistic acts/texts.
- Annotating a MeaningRecord instance with an ontology class amounts to introducing a Meaning subclass which is restricted to that class.
- Mapping between MeaningRecords (instances) and Meanings (classes) can be done by annotations, punning, etc. In any case, the formal semantics of mappings can be specified in different ways.
- Lexical relations are predicated on MeaningRecords; hence they do not have a direct ontological import. Any correspondence (e.g. hyponymy > inclusion) should be introduced based on suitable heuristics.
- Likewise, attributes of MeaningRecord instances (e.g. glosses, grammatical features, usage marks, rhetorical marks, etymology, etc.) do not affect the mapped Meaning class (if any).
- Different MeaningRecord instances (e.g. from different dictionaries) may be mapped to the same Meaning class. This way, the model can accommodate meaning records coming from different sources, which might use different sets of attributes (e.g. different usage marks).
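The separation described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: the class names MeaningClass and MeaningRecord, the function derive_axioms, the "dictionary A/B" sources, and the second gloss are all hypothetical. It shows two records from different dictionaries mapped to the same Meaning class, an unmapped record (the mapping is partial), and a lexical relation that yields an ontological axiom only when a heuristic is explicitly supplied.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class MeaningClass:
    """A Meaning subclass restricted to one ontology class (hypothetical)."""
    name: str
    characterizes_only: str  # ontology class, e.g. "Substance"


@dataclass
class MeaningRecord:
    """A sense as registered in a dictionary: an instance, not a concept."""
    record_id: str
    lemma: str
    source: str                               # dictionary the record comes from
    gloss: str = ""
    usage_marks: tuple = ()
    mapped_to: Optional[MeaningClass] = None  # the mapping is partial


def derive_axioms(relations, heuristic: Optional[Callable] = None):
    """Lexical relations yield no ontological axioms unless a heuristic is given."""
    if heuristic is None:
        return []
    axioms = []
    for relation in relations:
        axiom = heuristic(relation)
        if axiom is not None:
            axioms.append(axiom)
    return axioms


def hyponymy_to_inclusion(relation):
    """One possible heuristic (an assumption): read hyponymy as inclusion."""
    name, narrower, broader = relation
    return ("inclusion", narrower, broader) if name == "hyponymy" else None


# Two records from different (hypothetical) dictionaries share one Meaning class.
water_meaning = MeaningClass("Water1Meaning", characterizes_only="Substance")
record_a = MeaningRecord("water-1", "water", "dictionary A",
                         gloss="A liquid etc.", mapped_to=water_meaning)
record_b = MeaningRecord("water-b1", "water", "dictionary B",
                         gloss="(illustrative gloss)", usage_marks=("common",),
                         mapped_to=water_meaning)
unmapped = MeaningRecord("thing-1", "thing", "dictionary A")  # no concept assigned

# Lexical relations are stated between records, not between concepts.
relations = [("hyponymy", "water-1", "liquid-1")]
no_axioms = derive_axioms(relations)
with_heuristic = derive_axioms(relations, hyponymy_to_inclusion)
```

Glosses and usage marks live on the records only, so the two records can differ in them while still pointing at the same Meaning class.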

Supporting Senses Classification with TMEO [Oltramari, 2010]

Technicalities
- Description Logic underlying all modules
  - Formal semantics and decidability
  - Well understood computability / expressiveness
- Compliance with standards of knowledge representation and automated reasoning
  - Native OWL 2 specification
    - http:///ontologies/sensocomune.owl
    - http:///ontologies/sensocomunelexicon.owl
    - http:///ontologies/sensocomunesemantics.owl
- UML derivative models to map with Java and Relational DBMS
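As an illustration of the Description Logic layer, the restriction that the Semasiological Model slide writes as "Meaning and characterizes only Substance" for Water-1 could be rendered as the axiom below. This is a sketch only: the concept name Water1Meaning is an assumption introduced here, and the subclass reading (rather than an equivalence) follows the slide's wording, "a Meaning subclass which is restricted to that class".

```latex
\[
  \mathit{Water1Meaning} \;\sqsubseteq\; \mathit{Meaning} \,\sqcap\, \forall\,\mathit{characterizes}.\mathit{Substance}
\]
```

The universal restriction corresponds to "only" in OWL 2 Manchester syntax.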

Architecture
[Diagram. Components shown: Web Rich Client (Ajax), Apps, API, Platform, Ontologies, Knowledge Base, Log.]

Screenshots

An Experiment
- About 4,500 core substantival senses were classified by students and supervisors
- Identifying the ontological commitment in linguistic senses turned out to be hard in many cases
- Confidence of classifications was rated
  - 59% accepted
  - 33% controversial
  - 8% rejected
- Most controversial concept: SOCIAL_OBJECT

[Chiari, Oltramari and Vetere, 2010]
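To give a sense of scale, the percentages can be turned into approximate counts. The sketch below assumes that the three ratings apply to the full set of about 4,500 senses and treats 4,500 as exact. The slide states neither, so the resulting figures are rough estimates and not reported results.

```python
# Approximate counts per rating, assuming the percentages apply to all
# ~4,500 classified senses (an assumption, not stated in the slide).
total_senses = 4500
shares_percent = {"accepted": 59, "controversial": 33, "rejected": 8}

# Integer arithmetic avoids floating-point rounding surprises.
counts = {label: total_senses * pct // 100 for label, pct in shares_percent.items()}

for label, count in counts.items():
    print(f"{label}: about {count} senses")
```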

Work in Progress
Ontology of Semantic Relations and Frames
- Goals
  - Provide a formal (DL) characterization
  - Represent users' linguistic knowledge
  - Support NLP tasks with efficient reasoning
- Issues
  - Cope with higher-order features
  - Syntactic-semantic binding requires coreference?

References
- G. Vetere, 2008: Verso un lessico computazionale aperto per la lingua italiana, PAAL2008.
- A. Oltramari, G. Vetere, 2008: Lexicon and Ontology Interplay in Senso Comune, OntoLex2008.
- A. Oltramari, G. Vetere, 2008: Acquiring Italian Linguistic Knowledge with Senso Comune, AI*IA 2008.
- I. Chiari, A. Oltramari and G. Vetere, 2011: Di cosa parliamo quando parliamo fondamentale? In Atti del Convegno della Società di linguistica italiana, Viterbo.
- A. Oltramari, 2010: TMEO, tutoring methodology for the enrichment of ontologies, LREC 2010, 17-23 May, La Valletta, Malta.
- A. Oltramari, G. Vetere, M. Lenzerini, A. Gangemi, N. Guarino, 2010: Senso Comune. Proc. of LREC 2010 (7th International Conference on Language Resources and Evaluation), 17-23 May, La Valletta, Malta.
- G. Vetere, A. Oltramari, I. Chiari, E. Jezek, L. Vieu, F. M. Zanzotto, 2012: Senso Comune, an Open Knowledge Base of Italian Language, Revue TAL (to appear).

Credits
- Project coordination: Guido Vetere (IBM)
- Linguistics: Isabella Chiari (Uni Roma Sapienza, coord.), Elisabetta Jezek (Uni Pavia), Fabio M. Zanzotto (Uni Roma Tor Vergata)
- Logic and Ontology: Diego Calvanese (Uni Bolzano), Nicola Guarino, Aldo Gangemi (CNR ISTC), Maurizio Lenzerini (Uni Roma Sapienza), Alessandro Oltramari (Carnegie Mellon University), Guido Vetere (IBM)
- Design and Development: Alessandro Faraotti (coord.), Daniele Chermaz, Ilaria Gorga, Michele Minno, Fabrizio Smith, Giuliano Iacobelli, Piero Cangialosi, Andrea Mencancini, Carlo Ferrarini (IBM)
- Resource Development: Rita Plantera (coord.), Silvia Castagna, Silvia Coltellacci, Alice Paesetto, Sara Perboni, Fabio Celli, Annapaola Montini, Romina Vinci, Eva Brugnettini, Andrea Zaninello, Lorena Mascara, Nicola Amabile, Valentina Arena, Valentina Cristini, Marina D'Auria, Flavio De Giusti, Valentina Di Marco, Angela Napoleone, Federico Riccardi, Marco Scarino, Tiziana Taboga, Edoardo Vanni (Uni Roma Sapienza)
- Founders: Tullio De Mauro, Aldo Gangemi, Nicola Guarino, Maurizio Lenzerini, Malvina Nissim, Guido Vetere
- Board: Tullio De Mauro (President), Diego Calvanese, Isabella Chiari, Aldo Gangemi, Nicola Guarino, Alessandro Oltramari (Secretary), Guido Vetere (Vice President)