EVALITA 2009. http://evalita.fbk.eu. Local Entity Detection and Recognition (LEDR) Guidelines for Participants 1




Valentina Bartalesi Lenzi, Rachele Sprugnoli
CELCT, Trento 38100 Italy
{bartalesi|sprugnoli}@celct.it

Task Definition

The Local Entity Detection and Recognition (LEDR) task requires that entities (i.e. persons, organizations, geo-political entities and geographical locations) mentioned in source texts be detected, and that selected information about these entities be recognized, following the ACE-LDC standards with all the modifications needed to adapt them to the specific morphosyntactic features of Italian. The information comprises three attributes, i.e. type, subtype, and class (note that each entity may have only one class, one type, and one subtype).

Participants should also recognise all the mentions referring to each entity. An entity, in fact, provides a representation of an object in the world, while an entity mention provides information about any textual reference to that object. For instance, if Elvis Presley is mentioned in two different sentences of a text as il cantante/the singer and as egli/he, these two expressions are considered as two mentions referring to the same entity (i.e. coreferring mentions).

Mentions are to be detected and output along with their attributes. More precisely, the output for each entity mention includes the mention type, its extent, the location of its syntactic head within the extent, and optionally the mention role (only for geo-political entities) and style (literal or metonymic).

In the LEDR task, each document is processed separately and entities that are mentioned in different documents are treated as different entities.
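The relation between entities and mentions described above can be pictured as a small data model. The following is only an illustrative sketch: the class and field names (`Entity`, `Mention`, `metonymic`, and so on) are assumptions made for this example and are not part of the APF format or of any official tool; the attribute values are borrowed from the style of the sample in Appendix B.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Mention:
    """One textual reference to an entity (hypothetical model)."""
    mention_type: str                 # e.g. "NAM", "NOM" or "PRO"
    extent: str                       # full text span of the mention
    head: str                         # syntactic head, located inside the extent
    role: Optional[str] = None        # only used for geo-political entities
    metonymic: Optional[bool] = None  # style: False = literal, True = metonymic


@dataclass
class Entity:
    """An object in the world, with exactly one type, subtype and class."""
    entity_type: str
    subtype: str
    entity_class: str
    mentions: List[Mention] = field(default_factory=list)


# The Elvis Presley example: two coreferring mentions of a single entity.
elvis = Entity(entity_type="PER", subtype="Individual", entity_class="SPC")
elvis.mentions.append(Mention("NOM", "il cantante", "cantante"))
elvis.mentions.append(Mention("PRO", "egli", "egli"))

# Every head must be found inside its own extent.
assert all(m.head in m.extent for m in elvis.mentions)
print(len(elvis.mentions), "mentions refer to the same entity")
```

Keeping the mentions inside the entity mirrors the fact that coreference is resolved within a single document only.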
For a detailed description of entities, mentions and their attributes, and for the modifications needed to adapt the English guidelines to the specific morphosyntactic features of Italian, refer to the annotation report downloadable at: http://evalita.fbk.eu/doc/annotation_report_ledr.pdf.

1 Please note updates in the Data Format section (differences from the previous version of the guidelines are highlighted in bold) and new examples in Appendix A and B.

The Corpus (I-CAB)

Both training data and test data are part of the Italian Content Annotation Bank (I-CAB), developed by FBK-irst and CELCT. I-CAB consists of 525 news documents taken from the local newspaper L'Adige. The selected news stories belong to four different days (September 7th and 8th, 2004 and October 7th and 8th, 2004) and are grouped into five categories: News Stories, Cultural News, Economic News, Sports News and Local News. I-CAB is divided into a development part (335 news stories, for a total of around 113,000 words) and a test part (190 news stories, for a total of around 69,000 words).

Data Format

Training data are distributed in the following formats:
- TXT files contain the source text. All text files are in UTF-8 (see Appendix A).
- APF (ACE Program Format, where ACE stands for Automatic Content Extraction) files contain the annotation in the form of XML standoff annotation. This means that the file as a whole conforms to XML encoding standards, while the raw data being annotated resides in a separate file. The annotations point to portions of the raw text via character indices (see Appendix B).

For a clear visualization of the annotated APF files, it is possible to import them into Callisto 1.5.2 (http://callisto.mitre.org), a freely distributed annotation tool developed by the MITRE Corporation. To import the APF files, follow these steps:
1. Click on File -> Import. A pop-up window will appear;
2. Browse to the file to import;
3. Select ACE Event Task for Available Importers and ACE2004 APF v.5.1.5;
4. Leave the file encoding at UTF-8;
5. Press Import.

Test data will be distributed in the TXT format.

Requirements:
- The data format required for system output is APF. For each source document in the evaluation data set, the system will have to produce a single APF file as output.
- The XML DTD for APF can be downloaded from the EVALITA web site: http://evalita.fbk.eu/doc/apf.v5.2.0.zip. You can use an XML validator to verify that a system output file conforms to the ACE DTD for APF. A Java implementation of such a validator is downloadable from the NIST ACE web site: ftp://jaguar.ncsl.nist.gov/ace/resources/xmlvalidator.tar.gz
- Output folders are required to be named as follows: evalita09_ledr_participantname_run1 (or run2)
- Files have to be organized maintaining the same sub-folder structure (e.g. 20040907/Cultura)
- Output files have to keep the same names as the input files
- Output folders have to be sent by e-mail both to bartalesi@celct.it and to sprugnoli@celct.it
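Because APF is a standoff format, a quick sanity check on a system output file is to resolve every character index against the source text and compare the result with the string stored in the annotation. The sketch below illustrates the idea with Python's standard `xml.etree.ElementTree`. The text and the APF fragment are a hypothetical miniature built for this example, and the function names `resolve` and `check_offsets` are assumptions. The END index is treated as inclusive, as the sample in Appendix B suggests (a 25-character span is annotated as START="0" END="24"); how line breaks are counted in the real files is not covered here.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature source text and APF fragment, for illustration only.
SOURCE_TEXT = "Le donne al buonconsiglio"

APF_FRAGMENT = """
<document DOCID="sample">
  <entity ID="sample-E1" TYPE="PER" SUBTYPE="Group" CLASS="GEN">
    <entity_mention ID="sample-E1-1" TYPE="NOM">
      <extent><charseq START="0" END="24">Le donne al buonconsiglio</charseq></extent>
      <head><charseq START="3" END="7">donne</charseq></head>
    </entity_mention>
  </entity>
</document>
"""


def resolve(charseq, text):
    """Return the slice of `text` that a <charseq> element points to.

    END is treated as inclusive, hence the +1 in the slice.
    """
    start = int(charseq.get("START"))
    end = int(charseq.get("END"))
    return text[start:end + 1]


def check_offsets(apf_xml, text):
    """Yield (start, end, annotated, found) for every charseq that does not match."""
    root = ET.fromstring(apf_xml)
    for charseq in root.iter("charseq"):
        found = resolve(charseq, text)
        if found != charseq.text:
            yield (charseq.get("START"), charseq.get("END"), charseq.text, found)


mismatches = list(check_offsets(APF_FRAGMENT, SOURCE_TEXT))
print("mismatching spans:", mismatches)
```

A check of this kind only verifies internal consistency between the two files; it does not replace validation against the DTD.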

Evaluation Metrics

The final ranking will be based on the Value score as defined in the ACE 2008 evaluation campaign. The Value score for a system is defined to be the sum of the values of all of the system's output tokens, normalized by the sum of the values of the reference data. The possible Value of a system output token depends on how closely it matches that of the reference token to which it is mapped. The maximum possible Value score is 100%, but negative scores are possible for systems that make costly errors. For more information, see Automatic Content Extraction 2008 Evaluation Plan (http://www.nist.gov/speech/tests/ace/2008/doc/ace08-evalplan.v1.2d.pdf).

\[ \mathrm{LEDR\_Value}_{sys} = \frac{\sum_{i} \mathrm{value\_of\_sys\_token}_{i}}{\sum_{j} \mathrm{value\_of\_ref\_token}_{j}} \]

Parameters for scoring LEDR performance

Parameters are the same as those used in the ACE evaluation:
- 0.300, minimum acceptable overlap
- 4 chars, max acceptable extent difference for names and mentions to match
- 0.750, cost for spurious entity mentions
- 1.000, cost for incorrect coreference

As far as attributes are concerned, attribute weights are defined in the Automatic Content Extraction 2008 Evaluation Plan (page 10). Please note that, even though mentions of type NAM might be extracted from the training corpus of the Named Entity Recognition task, we strongly discourage participants from doing so, in order to guarantee the correctness of the evaluation. Trusting in the honesty of participants, we keep the same weights as the ACE scorer for mention types as well, as reported in Table 1.

Attribute Value    Attribute Weight
NAM                1.00
NOM                0.50
PRO                0.10

Table 1 Default parameters for scoring LEDR attributes

Figure 1 shows an example of the evaluation output. For a brief description of the fields in each record, see Automatic Content Extraction 2008 Evaluation Plan.

Figure 1 The evaluation output
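The normalization behind the Value score can be illustrated with a few lines of Python. This is a simplified sketch of the ratio only: the function name `ledr_value` and the per-token values in the examples are hypothetical, and the way the official ACE08 scorer derives the value of each token (mapping, attribute weights, error costs) is not reproduced here.

```python
def ledr_value(sys_token_values, ref_token_values):
    """Sum of system token values, normalized by the sum of reference token values.

    Both arguments are sequences of already computed per-token values.
    System token values may be negative when an output token is a costly error.
    """
    ref_total = sum(ref_token_values)
    if ref_total <= 0:
        raise ValueError("the reference data must have a positive total value")
    return sum(sys_token_values) / ref_total


# Hypothetical values: two rewarded tokens and one penalized spurious token.
score = ledr_value([1.0, 0.25, -0.5], [1.0, 0.5, 0.5])
print(f"{score:.1%}")  # 0.75 / 2.0

# A system whose output consists only of costly errors gets a negative score.
print(ledr_value([-0.5, -0.5], [1.0]))
```

The second call shows why the score is bounded above by 100% but not below by 0%: penalties enter the numerator while the denominator depends on the reference data only.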

Scorer

For the official evaluation we are going to use the ACE08 scoring script ace08-eval-v17.pl, whose formulas are described in Automatic Content Extraction 2008 Evaluation Plan. You can download the scorer from the EVALITA web site: http://evalita.fbk.eu/doc/ace08-eval-v17.zip.

Contacts

Valentina Bartalesi Lenzi
Tel. 0039 0461 314885
bartalesi@celct.it

Rachele Sprugnoli
Tel. 0039 0461 314879
sprugnoli@celct.it

Appendix A

Input in the SGML format (example available at: http://evalita.fbk.eu/doc/ledrsample.zip).

Le donne al buonconsiglio
Penelope e le altre

TRENTO - Secondo incontro, oggi alle ore 20, per approfondire i temi della mostra «Guerrieri Principi ed Eroi» al castello del Buonconsiglio di Trento. E' il ruolo della donna l'argomento centrale. Patrizia Frontini, archeologa che opera presso il Castello Sforzesco di Milano, condurrà gli intervenuti in un percorso guidato che avrà come protagonista la figura femminile dalle più antiche comunità di agricoltori e allevatori fino all'epoca dei Celti. La posizione sociale della donna sarà illustrata attraverso il commento degli oggetti esposti in mostra. L'ingresso è libero. Il prossimo incontro è in programma il 28 ottobre.

Appendix B

Output in the APF format (example available at: http://evalita.fbk.eu/doc/ledrsample.zip).

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE source_file PUBLIC "SYSTEM" "apf.v5.2.0.dtd">
<source_file URI="adige20041007_id413720.txt" SOURCE="unknown" TYPE="text" VERSION="5.0" AUTHOR="Evalita" ENCODING="UTF-8">
<document DOCID="adige20041007_id413720">
  <entity ID="adige20041007_id413720-E1" TYPE="PER" SUBTYPE="Group" CLASS="GEN">
    <entity_mention ID="adige20041007_id413720-E1-2" TYPE="NOM">
      <extent><charseq START="217" END="227">della donna</charseq></extent>
      <head><charseq START="223" END="227">donna</charseq></head>
    </entity_mention>
    <entity_mention ID="adige20041007_id413720-E1-4" TYPE="NOM">
      <extent><charseq START="406" END="504">la figura femminile dalle più antiche comunità di agricoltori e allevatori fino all'epoca dei Celti</charseq></extent>
      <head><charseq START="409" END="414">figura</charseq></head>
    </entity_mention>
    <entity_mention ID="adige20041007_id413720-E1-6" TYPE="NOM">
      <extent><charseq START="528" END="538">della donna</charseq></extent>
      <head><charseq START="534" END="538">donna</charseq></head>
    </entity_mention>
    <entity_mention ID="adige20041007_id413720-E1-8" TYPE="NOM">
      <extent><charseq START="0" END="24">Le donne al buonconsiglio</charseq></extent>
      <head><charseq START="3" END="7">donne</charseq></head>
    </entity_mention>
  </entity>
  <entity ID="adige20041007_id413720-E2" TYPE="PER" SUBTYPE="Individual" CLASS="SPC">
    <entity_mention ID="adige20041007_id413720-E2-2" TYPE="NAM">
      <extent><charseq START="251" END="267">Patrizia Frontini</charseq></extent>
      <head><charseq START="251" END="267">Patrizia Frontini</charseq></head>
    </entity_mention>
    <entity_mention ID="adige20041007_id413720-E2-4" TYPE="NOM">
      <extent><charseq START="270" END="279">archeologa</charseq></extent>
      <head><charseq START="270" END="279">archeologa</charseq></head>
    </entity_mention>
    <entity_mention ID="adige20041007_id413720-E2-6" TYPE="PRO">
      <extent><charseq START="281" END="283">che</charseq></extent>
      <head><charseq START="281" END="283">che</charseq></head>
    </entity_mention>
    <entity_mention ID="adige20041007_id413720-E2-8" TYPE="NAM">
      <extent><charseq START="251" END="328">Patrizia Frontini, archeologa che opera presso il Castello Sforzesco di Milano</charseq></extent>
      <head><charseq START="251" END="328">Patrizia Frontini, archeologa che opera presso il Castello Sforzesco di Milano</charseq></head>
    </entity_mention>
    <entity_attributes>
      <name NAME="Patrizia Frontini">
        <charseq START="251" END="267">Patrizia Frontini</charseq>
      </name>
      <name NAME="Patrizia Frontini, archeologa che opera presso il Castello Sforzesco di Milano">
        <charseq START="251" END="328">Patrizia Frontini, archeologa che opera presso il Castello Sforzesco di Milano</charseq>
      </name>
    </entity_attributes>
  </entity>
  <entity ID="adige20041007_id413720-E3" TYPE="PER" SUBTYPE="Individual" CLASS="SPC">
    <entity_mention ID="adige20041007_id413720-E3-2" TYPE="NAM">
      <extent><charseq START="27" END="34">Penelope</charseq></extent>
      <head><charseq START="27" END="34">Penelope</charseq></head>
    </entity_mention>
    <entity_attributes>
      <name NAME="Penelope">
        <charseq START="27" END="34">Penelope</charseq>
      </name>
    </entity_attributes>
  </entity>
  <entity ID="adige20041007_id413720-E4" TYPE="PER" SUBTYPE="Group" CLASS="GEN">
    <entity_mention ID="adige20041007_id413720-E4-2" TYPE="PRO">
      <extent><charseq START="38" END="45">le altre</charseq></extent>
      <head><charseq START="41" END="45">altre</charseq></head>
    </entity_mention>
  </entity>
  <entity ID="adige20041007_id413720-E5" TYPE="PER" SUBTYPE="Group" CLASS="GEN">
    <entity_mention ID="adige20041007_id413720-E5-2" TYPE="NOM">
      <extent><charseq START="456" END="466">agricoltori</charseq></extent>
      <head><charseq START="456" END="466">agricoltori</charseq></head>
    </entity_mention>
  </entity>
  <entity ID="adige20041007_id413720-E6" TYPE="PER" SUBTYPE="Group" CLASS="GEN">
    <entity_mention ID="adige20041007_id413720-E6-2" TYPE="NOM">
      <extent><charseq START="470" END="479">allevatori</charseq></extent>
      <head><charseq START="470" END="479">allevatori</charseq></head>
    </entity_mention>
  </entity>
  <entity ID="adige20041007_id413720-E7" TYPE="PER" SUBTYPE="Group" CLASS="SPC">
    <entity_mention ID="adige20041007_id413720-E7-2" TYPE="NOM">
      <extent><charseq START="456" END="504">agricoltori e allevatori fino all'epoca dei Celti</charseq></extent>
      <head><charseq START="456" END="479">agricoltori e allevatori</charseq></head>
    </entity_mention>
  </entity>
  <entity ID="adige20041007_id413720-E8" TYPE="PER" SUBTYPE="Group" CLASS="SPC">
    <entity_mention ID="adige20041007_id413720-E8-2" TYPE="NOM">
      <extent><charseq START="426" END="504">dalle più antiche comunità di agricoltori e allevatori fino all'epoca dei Celti</charseq></extent>
      <head><charseq START="444" END="451">comunità</charseq></head>
    </entity_mention>
  </entity>
  <entity ID="adige20041007_id413720-E9" TYPE="GPE" SUBTYPE="Population-Center" CLASS="SPC">
    <entity_mention ID="adige20041007_id413720-E9-1" TYPE="NAM" PRIMARY="true" ROLE="LOC" METONYMY_MENTION="FALSE">
      <extent><charseq START="51" END="56">TRENTO</charseq></extent>
      <head><charseq START="51" END="56">TRENTO</charseq></head>
    </entity_mention>
    <entity_mention ID="adige20041007_id413720-E9-2" TYPE="NAM" PRIMARY="false" ROLE="GPE" METONYMY_MENTION="FALSE">
      <extent><charseq START="197" END="202">Trento</charseq></extent>
      <head><charseq START="197" END="202">Trento</charseq></head>
    </entity_mention>
    <entity_attributes>
      <name NAME="TRENTO">
        <charseq START="51" END="56">TRENTO</charseq>
      </name>
      <name NAME="Trento">
        <charseq START="197" END="202">Trento</charseq>
      </name>
    </entity_attributes>
  </entity>
  <entity ID="adige20041007_id413720-E10" TYPE="GPE" SUBTYPE="Population-Center" CLASS="SPC">
    <entity_mention ID="adige20041007_id413720-E10-1" TYPE="NAM" PRIMARY="true" ROLE="GPE" METONYMY_MENTION="FALSE">
      <extent><charseq START="323" END="328">Milano</charseq></extent>
      <head><charseq START="323" END="328">Milano</charseq></head>
    </entity_mention>
    <entity_attributes>
      <name NAME="Milano">
        <charseq START="323" END="328">Milano</charseq>
      </name>
    </entity_attributes>
  </entity>
  <entity ID="adige20041007_id413720-E11" TYPE="PER" SUBTYPE="Group" CLASS="SPC">
    <entity_mention ID="adige20041007_id413720-E11-1" TYPE="NOM">
      <extent><charseq START="496" END="504">dei Celti</charseq></extent>
      <head><charseq START="500" END="504">Celti</charseq></head>
    </entity_mention>
  </entity>
</document>
</source_file>
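A system has to write one such file per source document. The following sketch shows one possible way to build a minimal APF document with Python's standard `xml.etree.ElementTree`. It rests on several assumptions: the helper names `add_span` and `build_apf`, the document id `demo_doc` and the `AUTHOR` value are hypothetical; only a single entity with a single mention is produced; the extent/head nesting follows the ACE APF layout; the END index is written as inclusive. Since ElementTree does not emit a DOCTYPE, the XML declaration and the DOCTYPE line are prepended as plain strings.

```python
import xml.etree.ElementTree as ET

XML_DECLARATION = '<?xml version="1.0" encoding="utf-8"?>'
DOCTYPE = '<!DOCTYPE source_file PUBLIC "SYSTEM" "apf.v5.2.0.dtd">'


def add_span(parent, tag, text, start):
    """Append <tag><charseq START=".." END="..">text</charseq></tag> to parent."""
    wrapper = ET.SubElement(parent, tag)
    end = start + len(text) - 1  # END is written as an inclusive index
    charseq = ET.SubElement(wrapper, "charseq", START=str(start), END=str(end))
    charseq.text = text
    return wrapper


def build_apf(doc_id, source_text, name):
    """Build an APF string holding one PER entity with one NAM mention of `name`."""
    root = ET.Element("source_file", {
        "URI": f"{doc_id}.txt", "SOURCE": "unknown", "TYPE": "text",
        "VERSION": "5.0", "AUTHOR": "demo-system", "ENCODING": "UTF-8",
    })
    document = ET.SubElement(root, "document", DOCID=doc_id)
    entity = ET.SubElement(document, "entity", ID=f"{doc_id}-E1",
                           TYPE="PER", SUBTYPE="Individual", CLASS="SPC")
    mention = ET.SubElement(entity, "entity_mention",
                            ID=f"{doc_id}-E1-1", TYPE="NAM")
    start = source_text.index(name)  # raises ValueError if the name is absent
    add_span(mention, "extent", name, start)
    add_span(mention, "head", name, start)
    body = ET.tostring(root, encoding="unicode")
    return "\n".join([XML_DECLARATION, DOCTYPE, body])


apf = build_apf("demo_doc", "Penelope e le altre", "Penelope")
print(apf)
```

The resulting string would then be written with UTF-8 encoding to a file in the run folder, and can be checked against the DTD with an XML validator.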

References

Magnini, B., Pianta, E., Speranza, M., Bartalesi Lenzi, V., and Sprugnoli, R. Local Entity Detection and Recognition Annotation for Evalita 2009. On-line: http://evalita.fbk.eu/doc/annotation_report_ledr.pdf

Automatic Content Extraction 2008 Evaluation Plan (ACE08). Assessment of Detection and Recognition of Entities and Relations Within and Across Documents, April 2008. On-line: http://www.nist.gov/speech/tests/ace/2008/doc/ace08-evalplan.v1.2d.pdf

Web Sites

ACE Evaluation, http://www.nist.gov/speech/tests/ace/
ACE annotation, http://www.ldc.upenn.edu/projects/ace/
Callisto, http://callisto.mitre.org
I-CAB, http://tcc.fbk.eu/projects/ontotext/icab.html
L'Adige, http://www.ladige.it/