Building and exploiting a dependency treebank for French radio broadcasts

1 Building and exploiting a dependency treebank for French radio broadcasts
Christophe Cerisara, Claire Gardent and Corinna Anderson
LORIA, Nancy

2 Goals; Corpus; Annotation Tools and Methodology; Annotation schema; The Impact of Speech Constructs on Parsing; Conclusions

3 Goals
Long term
- Use syntax to improve speech recognition (INRIA Collaborative Action Rapsodis)
Medium term
- Build a treebank of spoken data (transcriptions of radio broadcast news)
- Empirical study of speech constructs
- Analyse the impact of speech constructs on parsing
- Parse speech

4 The Ester Corpus
- 37 hours of manual transcriptions of French radio broadcasts ( )
- Annotated with speakers, words, noise symbols and, sometimes, punctuation
- Normalisation to match the output of speech recognition systems:
  - Remove punctuation and the upper-case letter of the first word
  - Remove incomplete words: Le pe- petit...
  - But keep disfluencies made of complete words: Le le petit
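The normalisation rules on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions, not the tooling used for ESTER: the function name `normalise`, the punctuation set, and the convention that an incomplete word is marked by a trailing hyphen (as in "pe-") are all hypothetical.

```python
import re

# Assumed punctuation set; apostrophes and word-internal hyphens are kept
# because they belong to French words such as "l'enquête" or "lui-même".
PUNCT = re.compile(r"[.,;:!?()]")


def normalise(utterance: str) -> str:
    """Make a manual transcription look like speech recogniser output."""
    tokens = PUNCT.sub(" ", utterance).split()
    # Assumption: an incomplete word ends with a hyphen, e.g. "pe-".
    tokens = [t for t in tokens if not t.endswith("-")]
    # Lower-case only the first letter of the first word.
    if tokens:
        tokens[0] = tokens[0][0].lower() + tokens[0][1:]
    return " ".join(tokens)


print(normalise("Le pe- petit chat dort."))  # incomplete word removed
print(normalise("Le le petit chat dort."))   # repetition kept
```

Repetitions such as "Le le" pass through untouched, which matches the slide's decision to keep disfluencies made of complete words.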

5 Example transcriptions
- quiberon Frédéric Colas France Bleue Armorique pour France-Inter
  Header; no punctuation
- bonsoir
  Non-verbal utterance
- l'enquête sur l'office HLM de Paris Jean Tiberi le maire de la capitale annonce lui-même dans une interview au Monde
  Incomplete utterances
- sa mise en examen pour complicité de trafic d'influence
  Incorrect sentence segmentation
- je pense que cela doit conduire euh Jean Tiberi le premier euh à une réflexion
  Hesitations

6 Methodology for constructing ETB
- Manual annotation supported by pre-parsing
- Active Learning for selectively extending the annotated data and improving the parser using a small training corpus (Christophe's talk)

7 Manual Annotation
- On and off since 2009
- Uses the JSafran framework
- Iterative process:
  1. Design of an annotation scheme
  2. Manual annotation of 5000 words
  3. Training of a Malt Parser model
  4. Automatic parsing of a new corpus segment
  5. Manual correction of this corpus segment
  6. Addition of this corrected segment to the training corpus
  7. Iterate from step 3
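Steps 3 to 7 of the iterative process form a simple bootstrapping loop, sketched below. Everything here is hypothetical scaffolding: `bootstrap_treebank`, the `Segment` type and the three callables stand in for Malt Parser training, automatic parsing and manual correction in JSafran, none of whose actual interfaces are described in the slides.

```python
from typing import Callable, List, Sequence

Segment = List[str]  # placeholder: one corpus segment as a list of tokens


def bootstrap_treebank(
    seed: List[Segment],
    segments: Sequence[Segment],
    train: Callable[[List[Segment]], object],
    parse: Callable[[object, Segment], Segment],
    correct: Callable[[Segment], Segment],
) -> List[Segment]:
    """Grow a training corpus by parsing, correcting and re-training."""
    training = list(seed)                # step 2: manually annotated seed
    for segment in segments:
        model = train(training)          # step 3: train on what we have
        parsed = parse(model, segment)   # step 4: pre-parse a new segment
        fixed = correct(parsed)          # step 5: manual correction
        training.append(fixed)           # step 6: extend the training corpus
    return training                      # step 7 is the loop itself
```

The point of the design is that each new segment is pre-parsed by a model trained on all previously corrected segments, so the amount of manual correction should shrink as the corpus grows.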

8 The ETB Annotation Schema
15 dependency relations:
- SUJ (subject)
- OBJ (object)
- POBJ (prepositional object)
- ATTS (subject attribute)
- ATTO (object attribute)
- MOD (modifier)
- COMP (complementizer)
- AUX (auxiliary)
- DET (determiner)
- CC (coordination)
- REF (reflexive pronoun)
- JUXT (juxtaposition)
- APPOS (apposition)
- DUMMY (syntactically governed but semantically empty dependent, e.g. expletive subject)
- DISFL (disfluency)

9 ETB and PTB annotations

ETB label   Description              P7Dep
MOD         modifier                 mod, mod_rel, dep
COMP        complementizer           obj
DET         determiner               det
SUJ         subject                  suj
OBJ         object                   obj
DISFL       disfluency               mod
CC          coordination             coord, dep_coord
POBJ        prepositional object     a_obj, de_obj, p_obj
ATTS        subject attribute        ats
JUXT        juxtaposition            mod
MultiMots   multi-word expression    mod
AUX         auxiliary                aux_tps, aux_pass, aux_caus
DUMMY       empty dependent          aff
REF         reflexive pronoun        obj, a_obj, de_obj
APPOS       apposition               mod
ATTO        object attribute         ato

10 Rule Converter ETB to PTB

One ETB label, several possible P7Dep labels:
ETB     P7Dep
MOD     mod, mod_rel, dep
CC      coord, dep_coord
POBJ    a_obj, de_obj, p_obj
AUX     aux_tps, aux_pass, aux_caus
REF     obj, a_obj, de_obj

Several ETB labels, one P7Dep label:
ETB         P7Dep
DISFL       mod
JUXT        mod
MultiMots   mod
APPOS       mod

Converter accuracy on an ESTER test corpus manually annotated in the P7Dep format:
- LAS (labelled attachment score) = 92.6%
- UAS (unlabelled attachment score) = 98.5%
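As a rough illustration of what such a converter has to decide, the mapping from the ETB and PTB annotations slide can be stored as a lookup table. The slides do not give the rules that choose among several P7Dep candidates, so the sketch below only resolves the unambiguous labels and returns `None` otherwise; the names `ETB_TO_P7DEP` and `convert_unambiguous` are hypothetical, and P7Dep labels are written with underscores.

```python
from typing import Optional

# Candidate P7Dep labels for each ETB label, copied from the mapping table.
ETB_TO_P7DEP = {
    "MOD": ["mod", "mod_rel", "dep"],
    "COMP": ["obj"],
    "DET": ["det"],
    "SUJ": ["suj"],
    "OBJ": ["obj"],
    "DISFL": ["mod"],
    "CC": ["coord", "dep_coord"],
    "POBJ": ["a_obj", "de_obj", "p_obj"],
    "ATTS": ["ats"],
    "JUXT": ["mod"],
    "MultiMots": ["mod"],
    "AUX": ["aux_tps", "aux_pass", "aux_caus"],
    "DUMMY": ["aff"],
    "REF": ["obj", "a_obj", "de_obj"],
    "APPOS": ["mod"],
    "ATTO": ["ato"],
}


def convert_unambiguous(etb_label: str) -> Optional[str]:
    """Return the P7Dep label if the mapping is one-to-one, else None.

    Ambiguous labels (MOD, CC, POBJ, AUX, REF) need context-dependent
    rules that are not described in the slides.
    """
    options = ETB_TO_P7DEP[etb_label]
    return options[0] if len(options) == 1 else None


ambiguous = sorted(l for l in ETB_TO_P7DEP if convert_unambiguous(l) is None)
print(ambiguous)  # ['AUX', 'CC', 'MOD', 'POBJ', 'REF']
```

The five labels left unresolved are exactly those listed in the first table of this slide, which is presumably where the converter's labelling errors come from: attachments are unaffected by relabelling, which is consistent with UAS being higher than LAS.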

11 Example Annotations
Figure: Screenshot of the J-Safran GUI for editing dependency trees

12 J-Safran software
GUI with the following functionalities:
- Visualisation and editing of dependency graphs
- POS-tagging: Tree-Tagger (French version) and OpenNLP Tagger (CRF trained on French TreeBank)
- Parsing with the Malt Parser (ETB or FTB models)
- Training of parsing models on annotated data
- Search functions (words, dependencies, sequences, ...)
- Evaluation with CoNLL scripts

13 Utterance-level annotations
Part 2 of the ETB corpus was annotated with utterance-level annotations.
- GUEST: et euh je je pense que pourri beaucoup l'image de de la conduite
  (and um I I think deteriorates much the image of of driving)
- SPEAKER: les deux gouvernements cherchent un compromis
  (both governments are looking for a compromise)
- ELLIPSIS: je cite de mémoire qu'un tiers des morts à l'avant euh n'avaient pas leur ceinture et euh non un quart à l'avant et je crois près du tiers à l'arrière
  (... a third of the dead in the front did not have their seat belt on uh no a quarter in the front and I think nearly a third in the back)
- HEADER: quiberon frédéric colas france bleue armorique pour france-inter
  (Quiberon Frédéric Colas France Bleue Armorique for France-Inter)

14 Model performance on ETB Part 2
- Training corpus: 8544 words
- Test corpus: 1747 words
- Labelled attachment score (LAS), i.e. the percentage of tokens with the correct governor and dependency relation: 63.6%
Which constructs most affect parsing accuracy? We look at speaker/guest differences, disfluencies and radio headlines.
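The scores reported in the following slides can be computed as below. LAS follows the definition given on this slide; UAS counts correct governors regardless of label. The slides never expand LAC, so reading it as label accuracy (correct relation regardless of governor) is an assumption. The function name, the `Arc` type and the toy arcs, including the "ROOT" placeholder label, are illustrative only.

```python
from typing import Dict, List, Tuple

Arc = Tuple[int, str]  # (governor index, dependency label) for one token


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Dict[str, float]:
    """Return LAS, UAS and LAC as percentages over all tokens."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and aligned")
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred))  # governor only
    lac = sum(g[1] == p[1] for g, p in zip(gold, pred))  # label only (assumed)
    las = sum(g == p for g, p in zip(gold, pred))        # governor and label
    return {"LAS": 100 * las / n, "UAS": 100 * uas / n, "LAC": 100 * lac / n}


# Toy four-token example, not taken from the ETB.
gold = [(2, "DET"), (3, "SUJ"), (0, "ROOT"), (3, "OBJ")]
pred = [(2, "DET"), (3, "OBJ"), (0, "ROOT"), (2, "OBJ")]
print(attachment_scores(gold, pred))
# {'LAS': 50.0, 'UAS': 75.0, 'LAC': 75.0}
```

A token counts towards LAS only when both its governor and its label are right, so LAS can never exceed either UAS or LAC.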

15 Impact of disfluencies
- Ratio of utterances with disfluencies: 41% (D sub-corpus)
- Manual removal of disfluencies in the test corpus

Performance on the D sub-corpus:
       W/o disfl   With disfl   Δ(w, w/o)
LAS    70.2%       66.1%        +4.1
UAS    77.2%       73.5%        +3.7
LAC    76.5%       72.7%        +3.8

Performance on the whole test corpus:
       W/o disfl   With disfl   Δ(w, w/o)
LAS    67.3%       65.7%        +1.6
UAS    74.2%       73.0%        +1.2
LAC    74.2%       72.6%        +1.6

16 Impact of speaking style
- Ratio of journalist/guest utterances: 72%/28%

Performance on both types of speech:
       Journalist   Guest    Δ(J, G)
LAS    70.8%        65.2%    -5.6
UAS    76.5%        71.8%    -4.7
LAC    77.5%        72.0%    -5.5

Is this difference due to disfluencies? With disfluencies removed:
       Journalist   Guest    Δ(J, G)
LAS    71.2%        67.8%    -3.4
UAS    77.2%        74.1%    -3.1
LAC    78.2%        74.5%    -3.7

Disfluencies explain 40% of the degradation observed between journalist and guest speaker parsing.
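The 40% figure appears to come from comparing the journalist/guest LAS gap before and after disfluency removal; the slide does not spell the calculation out, so the formula below is an assumed reconstruction. The helper name `share_explained` is hypothetical.

```python
def share_explained(gap_with: float, gap_without: float) -> float:
    """Percentage of the journalist/guest gap that disappears once
    disfluencies are removed."""
    return 100 * (gap_with - gap_without) / gap_with


# Gaps (in points) read off the two tables of this slide.
print(round(share_explained(5.6, 3.4), 1))  # LAS: 39.3
print(round(share_explained(4.7, 3.1), 1))  # UAS: 34.0
print(round(share_explained(5.5, 3.7), 1))  # LAC: 32.7
```

Under this reading, the LAS gap shrinks from 5.6 to 3.4 points, that is by about 39%, which rounds to the 40% quoted; the corresponding shares for UAS and LAC are somewhat lower.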

17 Impact of headers
- Ratio of header utterances: 14%
- Guest utterances removed
- 10-fold cross-validation

Comparative results on headers / journalist style:
       Journalist without headers   Headers   Δ(-H, +H)
LAS    70.6%                        61.7%     -8.9
UAS    76.2%                        69.7%     -6.5
LAC    77.4%                        67.5%     -9.9

18 Summarising
- Disfluencies degrade parsing performance by 1.6 LAS points on average
- Guest utterances are harder to parse, even after disfluencies are removed, with a LAS decrease of 3.4 points
- Radio-specific constructs (headlines) show a LAS decrease of 8.9 points (different syntactic structure, sparse data)

19 Conclusions and future work
Current status
- Current ETB: words (53000 Ester 2, Etape)
- LAS with MATE Parser: 76%
Future work
- Continue annotations
- Automatically detect incorrect annotations
- Finer-grained annotation of disfluencies (hesitations, repairs, repetitions, false starts)
- Investigate Active Learning
- Investigate different parsing strategies (pre-parse disfluencies and named entities, joint model for named entity recognition and parsing)

Syntactic annotation of spontaneous speech: application to call center conversation data

Syntactic annotation of spontaneous speech: application to call center conversation data Syntactic annotation of spontaneous speech: application to call center conversation data Frédéric Béchet, Thierry Bazillon, Benoit Favre, Alexis Nasr Aix Marseille Université LIF-CNRS Laboratoire d Informatique

More information

Evaluation of speech technologies

Evaluation of speech technologies CLARA Training course on evaluation of Human Language Technologies Evaluations and Language resources Distribution Agency November 27, 2012 Evaluation of speaker identification Speech technologies Outline

More information

TS3: an Improved Version of the Bilingual Concordancer TransSearch

TS3: an Improved Version of the Bilingual Concordancer TransSearch TS3: an Improved Version of the Bilingual Concordancer TransSearch Stéphane HUET, Julien BOURDAILLET and Philippe LANGLAIS EAMT 2009 - Barcelona June 14, 2009 Computer assisted translation Preferred by

More information

D2.4: Two trained semantic decoders for the Appointment Scheduling task

D2.4: Two trained semantic decoders for the Appointment Scheduling task D2.4: Two trained semantic decoders for the Appointment Scheduling task James Henderson, François Mairesse, Lonneke van der Plas, Paola Merlo Distribution: Public CLASSiC Computational Learning in Adaptive

More information

Shallow Parsing with Apache UIMA

Shallow Parsing with Apache UIMA Shallow Parsing with Apache UIMA Graham Wilcock University of Helsinki Finland graham.wilcock@helsinki.fi Abstract Apache UIMA (Unstructured Information Management Architecture) is a framework for linguistic

More information

Open Domain Information Extraction. Günter Neumann, DFKI, 2012

Open Domain Information Extraction. Günter Neumann, DFKI, 2012 Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for

More information

Level 3 French, 2015

Level 3 French, 2015 91543 915430 3SUPERVISOR S Level 3 French, 2015 91543 Demonstrate understanding of a variety of extended spoken French texts 9.30 a.m. Wednesday 18 November 2015 Credits: Five Achievement Achievement with

More information

AP FRENCH LANGUAGE AND CULTURE 2013 SCORING GUIDELINES

AP FRENCH LANGUAGE AND CULTURE 2013 SCORING GUIDELINES AP FRENCH LANGUAGE AND CULTURE 2013 SCORING GUIDELINES Interpersonal Writing: E-mail Reply 5: STRONG performance in Interpersonal Writing Maintains the exchange with a response that is clearly appropriate

More information

Test Suite Generation

Test Suite Generation Test uite Generation ylvain chmitz LORIA, INRIA Nancy - Grand Est, Nancy, France NaTAL Workshop, Nancy, June 25, 2008 Issues with urface Generation *Jean que cherches-tu est grand. Jean qui baille s endort.

More information

Elizabethtown Area School District French II

Elizabethtown Area School District French II Elizabethtown Area School District French II Course Number: 605 Length of Course: 18 weeks Grade Level: 9-12 Total Clock Hours: 120 Length of Period: 80 minutes Date Written: Spring 2009 Periods per Week/Cycle:

More information

AP FRENCH LANGUAGE 2008 SCORING GUIDELINES

AP FRENCH LANGUAGE 2008 SCORING GUIDELINES AP FRENCH LANGUAGE 2008 SCORING GUIDELINES Part A (Essay): Question 31 9 Demonstrates STRONG CONTROL Excellence Ease of expression marked by a good sense of idiomatic French. Clarity of organization. Accuracy

More information

Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast

Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Hassan Sawaf Science Applications International Corporation (SAIC) 7990

More information

Automatic Text Analysis Using Drupal

Automatic Text Analysis Using Drupal Automatic Text Analysis Using Drupal By Herman Chai Computer Engineering California Polytechnic State University, San Luis Obispo Advised by Dr. Foaad Khosmood June 14, 2013 Abstract Natural language processing

More information

Annotation Guidelines for Dutch-English Word Alignment

Annotation Guidelines for Dutch-English Word Alignment Annotation Guidelines for Dutch-English Word Alignment version 1.0 LT3 Technical Report LT3 10-01 Lieve Macken LT3 Language and Translation Technology Team Faculty of Translation Studies University College

More information

AP FRENCH LANGUAGE AND CULTURE EXAM 2015 SCORING GUIDELINES

AP FRENCH LANGUAGE AND CULTURE EXAM 2015 SCORING GUIDELINES AP FRENCH LANGUAGE AND CULTURE EXAM 2015 SCORING GUIDELINES Identical to Scoring Guidelines used for German, Italian, and Spanish Language and Culture Exams Presentational Writing: Persuasive Essay 5:

More information

Trameur: A Framework for Annotated Text Corpora Exploration

Trameur: A Framework for Annotated Text Corpora Exploration Trameur: A Framework for Annotated Text Corpora Exploration Serge Fleury (Sorbonne Nouvelle Paris 3) serge.fleury@univ-paris3.fr Maria Zimina(Paris Diderot Sorbonne Paris Cité) maria.zimina@eila.univ-paris-diderot.fr

More information

GSAC CONSIGNE DE NAVIGABILITE définie par la DIRECTION GENERALE DE L AVIATION CIVILE Les examens ou modifications décrits ci-dessous sont impératifs. La non application des exigences contenues dans cette

More information

Course Title: French II Topic/Concept: ir and re verbs Time Allotment: 2 weeks Unit Sequence: 1 Major Concepts to be learned:

Course Title: French II Topic/Concept: ir and re verbs Time Allotment: 2 weeks Unit Sequence: 1 Major Concepts to be learned: Course Title: French II Topic/Concept: ir and re verbs Time Allotment: 2 weeks Unit Sequence: 1 1. To be able to conjugate regualr ir verbs 2. To be able to conjugate regular re verbs 3. Common outdoor

More information

CURRICULUM VITAE Studies Positions Distinctions Research interests Research projects

CURRICULUM VITAE Studies Positions Distinctions Research interests Research projects 1 CURRICULUM VITAE ABEILLÉ Anne Address : LLF, UFRL, Case 7003, Université Paris 7, 2 place Jussieu, 75005 Paris Tél. 33 1 57 27 57 67 Fax 33 1 57 27 57 88 abeille@linguist.jussieu.fr http://www.llf.cnrs.fr/fr/abeille/

More information

Isabelle Debourges, Sylvie Guilloré-Billot, Christel Vrain

Isabelle Debourges, Sylvie Guilloré-Billot, Christel Vrain /HDUQLQJ9HUEDO5HODWLRQVLQ7H[W0DSV Isabelle Debourges, Sylvie Guilloré-Billot, Christel Vrain LIFO Rue Léonard de Vinci 45067 Orléans cedex 2 France email: {debourge, billot, christel.vrain}@lifo.univ-orleans.fr

More information

LASSY: LARGE SCALE SYNTACTIC ANNOTATION OF WRITTEN DUTCH

LASSY: LARGE SCALE SYNTACTIC ANNOTATION OF WRITTEN DUTCH LASSY: LARGE SCALE SYNTACTIC ANNOTATION OF WRITTEN DUTCH Gertjan van Noord Deliverable 3-4: Report Annotation of Lassy Small 1 1 Background Lassy Small is the Lassy corpus in which the syntactic annotations

More information

Transition-Based Dependency Parsing with Long Distance Collocations

Transition-Based Dependency Parsing with Long Distance Collocations Transition-Based Dependency Parsing with Long Distance Collocations Chenxi Zhu, Xipeng Qiu (B), and Xuanjing Huang Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science,

More information

AP FRENCH LANGUAGE AND CULTURE EXAM 2015 SCORING GUIDELINES

AP FRENCH LANGUAGE AND CULTURE EXAM 2015 SCORING GUIDELINES AP FRENCH LANGUAGE AND CULTURE EXAM 2015 SCORING GUIDELINES Identical to Scoring Guidelines used for German, Italian, and Spanish Language and Culture Exams Interpersonal Writing: E-mail Reply 5: STRONG

More information

Online Tutoring System For Essay Writing

Online Tutoring System For Essay Writing Online Tutoring System For Essay Writing 2 Online Tutoring System for Essay Writing Unit 4 Infinitive Phrases Review Units 1 and 2 introduced some of the building blocks of sentences, including noun phrases

More information

Elizabethtown Area School District French III

Elizabethtown Area School District French III Elizabethtown Area School District French III Course Number: 610 Length of Course: 18 weeks Grade Level: 10-12 Elective Total Clock Hours: 120 Length of Period: 80 minutes Date Written: Spring 2009 Periods

More information

GCSE FRENCH 8658/LF. Foundation Tier Paper 1 Listening

GCSE FRENCH 8658/LF. Foundation Tier Paper 1 Listening SPEIMEN MTERIL GSE FRENH Foundation Tier Paper 1 Listening F Specimen 2018 Morning Time allowed: 35 minutes (including 5 minutes reading time before the test) You will need no other materials. The pauses

More information

EVALITA 2011. http://www.evalita.it/2011. Named Entity Recognition on Transcribed Broadcast News Guidelines for Participants

EVALITA 2011. http://www.evalita.it/2011. Named Entity Recognition on Transcribed Broadcast News Guidelines for Participants EVALITA 2011 http://www.evalita.it/2011 Named Entity Recognition on Transcribed Broadcast News Guidelines for Participants Valentina Bartalesi Lenzi Manuela Speranza Rachele Sprugnoli CELCT, Trento FBK,

More information

DECODA: a call-center human-human spoken conversation corpus

DECODA: a call-center human-human spoken conversation corpus DECODA: a call-center human-human spoken conversation corpus F. Bechet 1, B. Maza 2, N. Bigouroux 3, T. Bazillon 1, M. El-Bèze 2, R. De Mori 2, E. Arbillot 4 1 Aix Marseille Univ, LIF-CNRS, Marseille,

More information

2. Il faut + infinitive and its more nuanced alternative il faut que + subjunctive.

2. Il faut + infinitive and its more nuanced alternative il faut que + subjunctive. Teaching notes This resource is designed to enable students to broaden their range of expression on the issue of homelessness and poverty, specifically in terms of suggesting possible solutions. The aim

More information

SPEAKER IDENTITY INDEXING IN AUDIO-VISUAL DOCUMENTS

SPEAKER IDENTITY INDEXING IN AUDIO-VISUAL DOCUMENTS SPEAKER IDENTITY INDEXING IN AUDIO-VISUAL DOCUMENTS Mbarek Charhad, Daniel Moraru, Stéphane Ayache and Georges Quénot CLIPS-IMAG BP 53, 38041 Grenoble cedex 9, France Georges.Quenot@imag.fr ABSTRACT The

More information

Parsing Technology and its role in Legacy Modernization. A Metaware White Paper

Parsing Technology and its role in Legacy Modernization. A Metaware White Paper Parsing Technology and its role in Legacy Modernization A Metaware White Paper 1 INTRODUCTION In the two last decades there has been an explosion of interest in software tools that can automate key tasks

More information

SUPPLEMENT N 4 DATED 12 May 2014 TO THE BASE PROSPECTUS DATED 22 NOVEMBER 2013. BPCE Euro 40,000,000,000 Euro Medium Term Note Programme

SUPPLEMENT N 4 DATED 12 May 2014 TO THE BASE PROSPECTUS DATED 22 NOVEMBER 2013. BPCE Euro 40,000,000,000 Euro Medium Term Note Programme SUPPLEMENT N 4 DATED 12 May 2014 TO THE BASE PROSPECTUS DATED 22 NOVEMBER 2013 BPCE Euro 40,000,000,000 Euro Medium Term Note Programme BPCE (the Issuer ) may, subject to compliance with all relevant laws,

More information

Natural Language Processing

Natural Language Processing Natural Language Processing 2 Open NLP (http://opennlp.apache.org/) Java library for processing natural language text Based on Machine Learning tools maximum entropy, perceptron Includes pre-built models

More information

Factoring Surface Syntactic Structures

Factoring Surface Syntactic Structures MTT 2003, Paris, 16 18 jui003 Factoring Surface Syntactic Structures Alexis Nasr LATTICE-CNRS (UMR 8094) Université Paris 7 alexis.nasr@linguist.jussieu.fr Mots-clefs Keywords Syntaxe de surface, représentation

More information

Archived Content. Contenu archivé

Archived Content. Contenu archivé ARCHIVED - Archiving Content ARCHIVÉE - Contenu archivé Archived Content Contenu archivé Information identified as archived is provided for reference, research or recordkeeping purposes. It is not subject

More information

Robustness of a Spoken Dialogue Interface for a Personal Assistant

Robustness of a Spoken Dialogue Interface for a Personal Assistant Robustness of a Spoken Dialogue Interface for a Personal Assistant Anna Wong, Anh Nguyen and Wayne Wobcke School of Computer Science and Engineering University of New South Wales Sydney NSW 22, Australia

More information

Plugin SMILK. données liées et traitement de la langue pour plus d'intelligence dans la navigation sur le Web

Plugin SMILK. données liées et traitement de la langue pour plus d'intelligence dans la navigation sur le Web Plugin SMILK données liées et traitement de la langue pour plus d'intelligence dans la navigation sur le Web Elena Cabrio, Jordan Calvi, Fabien Gandon, Cédric Lopez, Farhad Nooralahzadeh, Thibault Parmentier,

More information

Trameur: A Framework for Annotated Text Corpora Exploration

Trameur: A Framework for Annotated Text Corpora Exploration Trameur: A Framework for Annotated Text Corpora Exploration Serge Fleury Sorbonne Nouvelle Paris 3 SYLED-CLA2T, EA2290 75005 Paris, France serge.fleury@univ-paris3.fr Maria Zimina Paris Diderot Sorbonne

More information

DEPENDENCY PARSING JOAKIM NIVRE

DEPENDENCY PARSING JOAKIM NIVRE DEPENDENCY PARSING JOAKIM NIVRE Contents 1. Dependency Trees 1 2. Arc-Factored Models 3 3. Online Learning 3 4. Eisner s Algorithm 4 5. Spanning Tree Parsing 6 References 7 A dependency parser analyzes

More information

Why language is hard. And what Linguistics has to say about it. Natalia Silveira Participation code: eagles

Why language is hard. And what Linguistics has to say about it. Natalia Silveira Participation code: eagles Why language is hard And what Linguistics has to say about it Natalia Silveira Participation code: eagles Christopher Natalia Silveira Manning Language processing is so easy for humans that it is like

More information

Applying Repair Processing in Chinese Homophone Disambiguation

Applying Repair Processing in Chinese Homophone Disambiguation Applying Repair Processing in Chinese Homophone Disambiguation Yue-Shi Lee and Hsin-Hsi Chen Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan, R.O.C.

More information

In-Home Caregivers Teleconference with Canadian Bar Association September 17, 2015

In-Home Caregivers Teleconference with Canadian Bar Association September 17, 2015 In-Home Caregivers Teleconference with Canadian Bar Association September 17, 2015 QUESTIONS FOR ESDC Temporary Foreign Worker Program -- Mr. Steve WEST *Answers have been updated following the conference

More information

Assessments; Optional module 1 vert assessments. Own test on the perfect tense

Assessments; Optional module 1 vert assessments. Own test on the perfect tense French Scheme of Work (MFL1/Third Year) Studio 3 Module 1 vert (By half-term of the Autumn Term) Learning Objectives/Module 1 Ma vie sociale d ado Concentrate on verbs and tenses. Opportunity to revise

More information

Speech Transcription

Speech Transcription TC-STAR Final Review Meeting Luxembourg, 29 May 2007 Speech Transcription Jean-Luc Gauvain LIMSI TC-STAR Final Review Luxembourg, 29-31 May 2007 1 What Is Speech Recognition? Def: Automatic conversion

More information

Considerations for developing VoiceXML in Canadian French

Considerations for developing VoiceXML in Canadian French Considerations for developing VoiceXML in Canadian French This section contains information that is specific to Canadian French. If you are developing Canadian French voice applications, use the information

More information

10th Grade Language. Goal ISAT% Objective Description (with content limits) Vocabulary Words

10th Grade Language. Goal ISAT% Objective Description (with content limits) Vocabulary Words Standard 3: Writing Process 3.1: Prewrite 58-69% 10.LA.3.1.2 Generate a main idea or thesis appropriate to a type of writing. (753.02.b) Items may include a specified purpose, audience, and writing outline.

More information

Marie Dupuch, Frédérique Segond, André Bittar, Luca Dini, Lina Soualmia, Stefan Darmoni, Quentin Gicquel, Marie-Hélène Metzger

Marie Dupuch, Frédérique Segond, André Bittar, Luca Dini, Lina Soualmia, Stefan Darmoni, Quentin Gicquel, Marie-Hélène Metzger Separate the grain from the chaff: designing a system to make the best use of language and knowledge technologies to model textual medical data extracted from electronic health records Marie Dupuch, Frédérique

More information

GCSE French. Other Guidance. Exemplar Material: Controlled Assessment Writing Autumn 2010

GCSE French. Other Guidance. Exemplar Material: Controlled Assessment Writing Autumn 2010 GCSE French Other Guidance Exemplar Material: Controlled Assessment Writing Autumn 2010 Teacher Resource Bank / GCSE French / Exemplar Material Controlled Assessment Writing / Version 1.2 IMPORTANT INFORMATION

More information

POS Tagsets and POS Tagging. Definition. Tokenization. Tagset Design. Automatic POS Tagging Bigram tagging. Maximum Likelihood Estimation 1 / 23

POS Tagsets and POS Tagging. Definition. Tokenization. Tagset Design. Automatic POS Tagging Bigram tagging. Maximum Likelihood Estimation 1 / 23 POS Def. Part of Speech POS POS L645 POS = Assigning word class information to words Dept. of Linguistics, Indiana University Fall 2009 ex: the man bought a book determiner noun verb determiner noun 1

More information

Modern Natural Language Interfaces to Databases: Composing Statistical Parsing with Semantic Tractability

Modern Natural Language Interfaces to Databases: Composing Statistical Parsing with Semantic Tractability Modern Natural Language Interfaces to Databases: Composing Statistical Parsing with Semantic Tractability Ana-Maria Popescu Alex Armanasu Oren Etzioni University of Washington David Ko {amp, alexarm, etzioni,

More information

Sense-Tagging Verbs in English and Chinese. Hoa Trang Dang

Sense-Tagging Verbs in English and Chinese. Hoa Trang Dang Sense-Tagging Verbs in English and Chinese Hoa Trang Dang Department of Computer and Information Sciences University of Pennsylvania htd@linc.cis.upenn.edu October 30, 2003 Outline English sense-tagging

More information

Training and evaluation of POS taggers on the French MULTITAG corpus

Training and evaluation of POS taggers on the French MULTITAG corpus Training and evaluation of POS taggers on the French MULTITAG corpus A. Allauzen, H. Bonneau-Maynard LIMSI/CNRS; Univ Paris-Sud, Orsay, F-91405 {allauzen,maynard}@limsi.fr Abstract The explicit introduction

More information

HOW MUCH DO YOU KNOW ABOUT RUGBY???

HOW MUCH DO YOU KNOW ABOUT RUGBY??? HOW MUCH DO YOU KNOW ABOUT RUGBY??? TÂCHE: Je peux donner quelques informations à propos du rugby. EO/PE (A2/B1) Rugby History? How is it played? The different rugby competitions? The Rugby World Cup?

More information

Open issues regarding legal metadata: IP licensing and management of different cognitive levels

Open issues regarding legal metadata: IP licensing and management of different cognitive levels Open issues regarding legal metadata: IP licensing and management of different cognitive levels FLORENCE MAY 6th, 2011 Danièle Bourcier Meritxell Fernández-Barrera 1 Cersa CNRS-Université Paris 2, Paris

More information

June 2016 Language and cultural workshops In-between session workshops à la carte June 13-25 2 weeks All levels

June 2016 Language and cultural workshops In-between session workshops à la carte June 13-25 2 weeks All levels June 2016 Language and cultural workshops In-between session workshops à la carte June 13-25 2 weeks All levels We have designed especially for you a new set of language and cultural workshops to focus

More information

Raconte-moi : Les deux petites souris

Raconte-moi : Les deux petites souris Raconte-moi : Les deux petites souris 1. Content of the story: Two little mice called Sophie and Lulu live in a big house in Paris. Every day, an animal knock at the door of their big house.if they like

More information

Specialty Answering Service. All rights reserved.

Specialty Answering Service. All rights reserved. 0 Contents 1 Introduction... 2 1.1 Types of Dialog Systems... 2 2 Dialog Systems in Contact Centers... 4 2.1 Automated Call Centers... 4 3 History... 3 4 Designing Interactive Dialogs with Structured Data...

More information

DHI a.s. Na Vrsich 51490/5, 100 00, Prague 10, Czech Republic ( t.metelka@dhi.cz, z.svitak@dhi.cz )

DHI a.s. Na Vrsich 51490/5, 100 00, Prague 10, Czech Republic ( t.metelka@dhi.cz, z.svitak@dhi.cz ) NOVATECH Rehabilitation strategies in wastewater networks as combination of operational, property and model information Stratégies de réhabilitation des réseaux d'égouts combinant des données d exploitation,

More information

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University sekine@cs.nyu.edu Kapil Dalwani Computer Science Department

More information

Assessment software development for distributed firewalls

Assessment software development for distributed firewalls Assessment software development for distributed firewalls Damien Leroy Université Catholique de Louvain Faculté des Sciences Appliquées Département d Ingénierie Informatique Année académique 2005-2006

More information

Comma checking in Danish Daniel Hardt Copenhagen Business School & Villanova University

Comma checking in Danish Daniel Hardt Copenhagen Business School & Villanova University Comma checking in Danish Daniel Hardt Copenhagen Business School & Villanova University 1. Introduction This paper describes research in using the Brill tagger (Brill 94,95) to learn to identify incorrect

More information

WIRING DIAGRAM EXAMPLE EXEMPLE DE SCHEMA DE CABLAGE

WIRING DIAGRAM EXAMPLE EXEMPLE DE SCHEMA DE CABLAGE Revision Modification Date Auteur Controle APPR. WIRING DIAGRAM EXAMPLE EXEMPLE DE SCHEMA DE CABLAGE Website: www.cretechnology.com Email: info@cretechnology.com Technical support: + (0) Email: support@cretechnology.com

More information

Terminology Extraction from Log Files

Terminology Extraction from Log Files Terminology Extraction from Log Files Hassan Saneifar 1,2, Stéphane Bonniol 2, Anne Laurent 1, Pascal Poncelet 1, and Mathieu Roche 1 1 LIRMM - Université Montpellier 2 - CNRS 161 rue Ada, 34392 Montpellier

More information

General Certificate of Education Advanced Level Examination June 2012

General Certificate of Education Advanced Level Examination June 2012 General Certificate of Education Advanced Level Examination June 2012 French Unit 4 Speaking Test Candidate s Material To be conducted by the teacher examiner between 7 March and 15 May 2012 (FRE4T) To

More information

11520 Alberta CALGARY 6 6. 11161 Nova Scotia / Nouvelle-Écosse HALIFAX 5 5. 13123 Quebec / Québec MONTREAL 26 23. 15736 Ontario OTTAWA 162 160

11520 Alberta CALGARY 6 6. 11161 Nova Scotia / Nouvelle-Écosse HALIFAX 5 5. 13123 Quebec / Québec MONTREAL 26 23. 15736 Ontario OTTAWA 162 160 Table S1 - Service to the Public by Bilingual Office / Point of Service as of March 31st of year Tableau S1 - Service au public par bureau bilingue /point de service en date du 31 mars de l'année Office

More information
