SPRING SCHOOL. Empirical methods in Usage-Based Linguistics



Similar documents
Computerized Language Analysis (CLAN) from The CHILDES Project

AphasiaBank. Audrey Holland, Margaret Forbes, Davida Fromm & Brian Macwhinney Brian MacWhinney

Elan. Complex annotations of video and audio resources Multiple annotation tiers, hierarchically structured Search multiple coded files

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone:

PDF hosted at the Radboud Repository of the Radboud University Nijmegen

User Guide for ELAN Linguistic Annotator

Text Mining - Scope and Applications

Annotation in Language Documentation

Course Descriptions. Seminar in Organizational Behavior II

Transcribing and annotating audio and video: Jeff Good MPI EVA and the Rosetta Project

MASTER OF PHILOSOPHY IN ENGLISH AND APPLIED LINGUISTICS

Reviewed by Ok s a n a Afitska, University of Bristol

Transcriptions in the CHAT format

InqScribe. From Inquirium, LLC, Chicago. Reviewed by Murray Garde, Australian National University

Computer-Based Text- and Data Analysis Technologies and Applications. Mark Cieliebak

Instructional Design and Technology Professional Core Courses Instructional Design and Technology Core Courses & Descriptions

Data Science and Business Analytics Certificate Data Science and Business Intelligence Certificate

MRes Psychological Research Methods

DOCTOR OF PHILOSOPHY IN EDUCATIONAL PSYCHOLOGY

Tools & Resources for Visualising Conversational-Speech Interaction

ITS Training and Documentation Needs Assessment Project Report

EXMARaLDA and the FOLK tools two toolsets for transcribing and annotating spoken language

Technology in language documentation

The Language Archive at the Max Planck Institute for Psycholinguistics. Alexander König (with thanks to J. Ringersma)

Master of Arts Program in Linguistics for Communication Department of Linguistics Faculty of Liberal Arts Thammasat University

2012/2013 Programme Specification Data. Engineering

CLINICAL RESEARCH GENERIC TASK DESCRIPTIONS

Master of Arts in Teaching English to Speakers of Other Languages (MA TESOL)

How to prepare and submit a proposal for EARLI 2015

Using ELAN for transcription and annotation

Study Plan for Master of Arts in Applied Linguistics

Master of Arts in Linguistics Syllabus

209 THE STRUCTURE AND USE OF ENGLISH.

A bachelor degree in Information Systems or equivalent threeyear

Metadata for Corpora PATCOR and Domotica-2

Use advanced techniques for summary and visualization of complex data for exploratory analysis and presentation.

A Short Introduction to Transcribing with ELAN. Ingrid Rosenfelder Linguistics Lab University of Pennsylvania

LeMoyne-Owen College. Division of Natural and Mathematical Sciences Programming in Java II, COSI 225 Spring, 2016

LEXUS: a web based lexicon tool

The REPERE Corpus : a multimodal corpus for person recognition

IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA

NEW YORK CITY COLLEGE OF TECHNOLOGY -CUNY

Department of Engineering & Technology IT 112 Product Development & Design Proposed University Studies Course

Chapter 8. Final Results on Dutch Senseval-2 Test Data

Lesson 13. Basic Productivity Applications

Department of Psychology

CHALLENGES OF NON-NATIVE SPEAKERS WITH READING AND WRITING IN COMPOSITION 101 CLASSES. Abstract

CS135 Computer Science I Spring 2015

4.5 Data analysis Transcription

Discovering, Not Finding. Practical Data Mining for Practitioners: Level II. Advanced Data Mining for Researchers : Level III

MSc/Postgraduate Diploma in Food Technology Quality Assurance For students entering in 2006

Annotation of Human Gesture using 3D Skeleton Controls

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Notes. Business Management. Higher Still. Higher. HSN81200 Unit 1 Outcome 2. Contents. Information and Information Technology 1

BS Environmental Science ( )

MSc in Physics Research For students entering in 2006

Transana 2.60 Distinguishing features and functions

Student Union B, Room 100 (501) Professional and

Zeynep Azar. English Teacher, Açı Private Primary School, Istanbul, Turkey Azar, E.Z.

LAMUS & LAT Archiving software

2. SUMMER ADVISEMENT AND ORIENTATION PERIODS FOR NEWLY ADMITTED FRESHMEN AND TRANSFER STUDENTS

University of Plymouth. Programme Specification. M.Eng. Mechanical Engineering

Introduction. 1. Name of your organisation: 2. Country (of your organisation): Page 2

CHAPTER III RESEARCH METHOD. The research approach used in this study is qualitative method supported

Second Mares Conference Abstract Submission Guidelines

Methods in writing process research

Pre-Masters. Science and Engineering

CoLang 2014 Data Management and Archiving Course. Session 2. Nick Thieberger University of Melbourne

Overview... 2 Institutional Context... 3 Financial Considerations... 4 Pedagogical... 5 Content Analysis... 5 Learner Analysis...

COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK DEPARTMENT OF INDUSTRIAL ENGINEERING AND OPERATIONS RESEARCH

Communications & Public Speaking

Psychology G4470. Psychology and Neuropsychology of Language. Spring 2013.

ATLAS.ti 7 Distinguishing features and functions

RR765 Applied Multivariate Analysis

Curriculum for the basic subject at master s level in. IT and Cognition, the 2013 curriculum. Adjusted 2014

School of Labor and Human Resources PhD program

Interactive Data Mining and Visualization

PROGRAM DESIGN. Our design process, philosophy and values.

Jane Zhao Nadalia Liu Digital Media Commons Fondren Library

Teaching Portfolio. Teaching Philosophy

Transcription:

SPRING SCHOOL Empirical methods in Usage-Based Linguistics University of Lille 3, 13 & 14 May, 2013 WORKSHOP 1: Corpus linguistics workshop WORKSHOP 2: Corpus linguistics: Multivariate Statistics for Semantics and Pragmatics WORKSHOP 3: Annotation and analysis of multimodal data using ELAN WORKSHOP 4: Transcription and analysis of oral data using CLAN WORKSHOP 5: Experimental methods of linguistic research WORKSHOP 1 Corpus linguistics workshop Dagmar Divjak d.divjak@sheffield.ac.uk http://www.shef.ac.uk/russian/staff/profiles/divjakd Objectives The hands-on workshop is aimed at absolute beginners: it does not presuppose any knowledge or experience of corpora, but does expect participants to be well-versed in linguistic description. Programme Day 1: Morning: The theory/method/discipline: What is corpus linguistics? What kind of questions can it answer (and which questions can it not answer)? Afternoon: The research question: How to ask a question that can be answered by means of corpus data Day 2: Morning: The data: How to extract relevant data from an existing corpus and annotate it so that it sheds light on your question Afternoon: The results: How to summarize the trends found in your annotated dataset and prepare your data for statistical analysis Bring along 1. Laptop with text editor (such as TextPad http://textpad.com/) and spreadsheet software (such as Excel) 2. Corpus of interest or information on how to access corpus of interest if available on-line 3. Research question to pursue.

WORKSHOP 2 Corpus linguistics : Multivariate Statistics for Semantics and Pragmatics Dylan Glynn http://person2.sol.lu.se/dylanglynn/ Objectives 1. Introduce the method of usage-feature analysis. This method allows traditional research questions 2. Introduce the computer program R This program is entirely free and it probably the most widely used application for doing statistics 3. Introduce Multivariate Statistics for Categorical data Multivariate statistics is needed in all empirical science. It is a vital tool for identifying complex patterns in data and then determining how representative those patterns are. Outline 1. Multivariate Categorical Statistics in Linguistics Presentation of the statistical techniques covered in workshop, how they can be used in Linguistics research 2. Usage-Feature analysis After consultation with group, a sample of language will be analysed. The results of this analysis will form the basis of the data used in the statistical analysis 3. Introduction to R R will be installed and the data will be loaded so that the participants can familiarise themselves with R, we will prepare the data fro analysis 4. Exploratory Statistics: Correspondence Analysis Look for associations between things / identify what is similar and different in complex situation 5. Exploratory Statistics: Cluster Analysis Sort something relative to something(s) else / identify what is similar and different given a constant of comparison. 6. Confirmatory Statistics: Binary Logistic Regression Modelling the data, identifying significant predictors of a binary outcome, determining the descriptive strength and accuracy of the model 7. Confirmatory Statistics: Loglinear Regression and Multinomial Logistic Regression Modelling the data and predicting associations and complex outcomes Pre-requisites 1. Participants are required to bring their own laptop; please download R before the workshop (http://www.r-project.org/); installing it will be done in the workshop 2. No previous experience with R or with statistics or with corpora is required 3. The workshop will be run in English, but the instructor speaks French and will use French when needed or desired.

WORKSHOP 3 Annotation and analysis of multimodal data using ELAN Mark Tutton mark.tutton@univ-nantes.fr http://lling.univ-nantes.fr/index.php?option=com_profils&user=80&lang=fr This workshop focuses on the analysis of gesture in the multimodal annotation program, ELAN. It is divided into three major parts, each of which is outlined below. 1. Gestures: forms and functions What is gesture, and can gestures be broken down into different types? What are the different gesture phases? What particular elements of gestures should be analysed (for example, the hand used, the form of the gesture), and how should these be coded (i.e. by creating a set list of sub-categories?) This part of the workshop looks at what gestures you should analyse and how you should code them. Hence, it addresses methodological questions that need to be answered before analysing data in ELAN. 2. Using Elan to code gestural data The second part of the workshop looks at how to use ELAN to code your gestural data. More specifically, it targets the following areas: - Identifying the types of video files that can be imported into ELAN - Creating a new Elan file - Developing an annotation template that can be applied to multiple files - Creating tiers and dependencies to analyse relevant gesture variables (for example, the hand(s) used, hand form) - Devising a 'controlled vocabulary' that provides you with a handy list of set coding possibilities for the gesture variables being analysed - Using frame-by-frame analysis to analyse gestures - How to export data from ELAN - Analysing data across multiple files 3. Using ELAN to analyse your own data The third part of the workshop requires participants to apply the skills learned earlier on to the analysis of their own data. They will create their own ELAN template and analyse a portion of their video data in the program, allowing them to troubleshoot for any problems that may arise in their analysis. Requirements No previous experience in gesture analysis is required just motivation to learn! Participants should bring with them a laptop and audiovisual data to be analysed (a five-minute clip that features (a) gesturing speaker(s) will be sufficient to practice on). ELAN accepts a range of data file types, including.avi,.mov and.mpg formats. Participants should download and install 'ELAN' before the workshop. They can do so by visiting the following website: http://tla.mpi.nl/tools/tla-tools/elan/

WORKSHOP 4 Transcription and analysis of oral data using CLAN Outline of the workshop 1. CLAN, CHILDES and sharing data Presentation of the CHILDES initiative: corpus, tools for child language and other linguistic data (TalkBank). - Ground Rules for data sharing for all TalkBank databases o Using data o Providing data - Corpus databases available o 37 different languages - Downloading and visualising databases o Transcriptions o Media - Tools: Clan, Phon 2. Creating your own oral language corpus Transcribing with CLAN - What does it mean to transcribe oral language data? - The CHAT Transcription Format o Organisation of a transcript o Coding oral language o Supplementary annotations - Linking sound or video with the transcription 3. Using a corpus for research with the CLAN commands - What does it mean to have a corpus and use it for research? - Using the clan commands on the transcriptions o Searching data Navigation within the database Searching multiples tiers o Creating a lexicon o Syntactic analysis and morphosyntactic coding o Modifying and coding the data o Interface with worksheet processors o Interface with textometric software and other corpus software Requirements Participants should bring a laptop. They can bring their own data, either sound (wave or mp3 format) or video (Quicktime format; this includes MOV and most MP4 formats please check that your video can be displayed using Quicktime). CLAN and Quicktime should be installed before the workshop. Both are free, work on PC and MAC, and can be download at: CLAN: http://childes.psy.cmu.edu/clan/ Quicktime: http://www.apple.com/quicktime/download/

WORKSHOP 5 Experimental methods of linguistic research Efstathia Soroli efstathia.soroli@univ-lille3.fr http://stl.recherche.univ-lille3.fr/sitespersonnel 1. Overview Current scientific research in all disciplines requires strong empirical support, data for validation (or refutation) and deep knowledge of new experimental techniques. Furthermore, data requires the solid foundation of a theory for interpretation. Although experimental support is a central element in order to answer fundamental questions of linguistic theory little is still used in linguistics. Gradually, linguists have come to realize the importance of corpus-based reasearch and experimental methods. In making the transition to objective/controlled empirical methods, linguists are confronted with problems linked to the framing of their scientific questions, the operationalization of their hypotheses, issues related to the experimental design, the implementation and the (qualitative and quantitative) analysis. 2. Aims This workshop aims (a) to inform participants about the possibilities that psycholinguistic research offers and (b) to help researchers : - to acquire the fundamental logic of experimental design - to understand and use an experiment-builder tool (E-Prime) for design and setup. - to implement and interpret the results of various kinds of experiments (categorization tasks, production, acceptability judgements / various measures: verbal, eye movements, reaction times, etc.). 3. Prerequisites/requirements Participants must be able to understand English (English sessions), and have a good background in language theories and statistical methods. They are also required to bring a laptop computer (for the practice sessions on day 2). 4. Sessions and topics to be covered PART I : Theory (day 1) - Linguistics and empirical data : a brief historical overview (session 1) - Psycholinguistics and basic empirical methods (session 2) - Building a controlled corpus-based research protocol (session 3) - Framing a scientific question, and setting up a question that can be answered through an experiment (session 4) PART II : Practice (day 2) - Fundamental concepts of experimental design (session 5) - Experimental design using E-prime (session 6-7) - Analysing and interpreting the results (session 8).