Kevin Tang and Andrew Nevins


Abstract

Keywords:

1 Introduction

2 Data Sources
2.1 English: CLMET3.0, Old Bailey
2.1.1 CLMET3.0
2.2 Portuguese: Corpus do Português, Colonia, Tycho Brahe
2.2.1 Corpus do Português
2.2.2 Colonia
2.3 Italian: Google Italian Ngram, DiaCoris
2.3.1 Google Ngram: Italian
2.3.2 DiaCoris
2.4 Spanish: Google Spanish Ngram, IMPACT-es
2.4.1 Google Ngram: Spanish
2.4.2 IMPACT-es

3 Methods: Verb Vocabulary Size
3.1 Simulations by Random Sampling
3.2 Epoching
3.3 Lemma estimation

4 Analyses: Verb Vocabulary Size
4.1 Simulation results: English, CLMET3.0
4.2 Simulation results: Portuguese, Colonia
4.3 Simulation results: Italian, Google Ngram
4.4 Simulation results: Spanish, Google Ngram

4.5 Interim Summary

5 Methods: Productivity of -ar
5.1 Simulations by Random Sampling
5.2 Productivity Estimation
5.2.1 ar/(er + ir)
5.2.2 Yang's Productivity Estimate: M, N, N/ln(N)
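The two estimates named in Section 5.2 can be illustrated with a minimal Python sketch. It rests on assumptions, since the surviving text gives only the expressions ar/(er + ir) and N/ln(N): here N is taken to be the total number of verb types, M the number of types that do not follow the candidate pattern, and the pattern is called productive when M does not exceed N/ln(N). The function names and the sample counts are hypothetical and are not taken from the paper.

```python
import math


def ar_ratio(n_ar: int, n_er: int, n_ir: int) -> float:
    """Ratio of -ar verb types to -er and -ir verb types combined."""
    if n_er + n_ir == 0:
        raise ValueError("need at least one -er or -ir verb")
    return n_ar / (n_er + n_ir)


def tolerance_threshold(n_total: int) -> float:
    """Threshold N / ln(N); undefined for N < 2 because ln(1) = 0."""
    if n_total < 2:
        raise ValueError("N must be at least 2")
    return n_total / math.log(n_total)


def is_productive(n_total: int, m_exceptions: int) -> bool:
    """Assumed reading: productive when M <= N / ln(N)."""
    return m_exceptions <= tolerance_threshold(n_total)


# Hypothetical counts, for illustration only.
counts = {"ar": 80, "er": 15, "ir": 5}
n = sum(counts.values())          # N: all verb types
m = counts["er"] + counts["ir"]   # M: types outside the -ar pattern
print(ar_ratio(counts["ar"], counts["er"], counts["ir"]))
print(tolerance_threshold(n), is_productive(n, m))
```

Both measures are computed from the same type counts, so they can be evaluated side by side for each sampled time slice.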

6 Analyses: Productivity of -ar
6.1 Simulation results: Portuguese, Corpus do Português
6.2 Simulation results: Portuguese, Colonia
6.3 Simulation results: Italian, Google Ngram
6.4 Simulation results: Italian, DiaCoris
6.5 Simulation results: Spanish, Google Ngram
6.6 Simulation results: Spanish, IMPACT-es

7 Relationship between Verb Vocabulary Size and Productivity

8 Statistical evaluation of the changepoint of verb vocabulary growth

9 Artefact considerations
9.1 Corpus representativeness
9.2 Tagging accuracy and consistency

10 Conclusion
