INPUTLOG 6.0: a research tool for logging and analyzing writing process data. Linguistic analysis




Linguistic analysis: from character-level analyses to word-level analyses

Linguistic analyses
The concept explained
Flow of the linguistic analyses
Aggregating from letter to word level
Parsing the S-notation
Enriching process data with linguistic information

marielle.leijten@uantwerpen.be
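The outline lists parsing the S-notation as a step towards word-level analysis. In the revision strings shown on the following slides (e.g. "Thre<<ere"), each "<" stands for one backspace. The sketch below is a hypothetical helper, not Inputlog's own parser. It recovers the final text from such a string under two simplifying assumptions: the cursor always stays at the end of the text, and "<" is never typed as an ordinary character. Real keystroke logs also contain cursor movements and mid-text revisions, which this sketch ignores.

```python
def replay_backspaces(linear: str, backspace: str = "<") -> str:
    """Rebuild the final text from a linear keystroke string.

    Hypothetical convention: every `backspace` character deletes the
    character directly before the cursor, and the cursor always sits at
    the end of the text (no cursor movements, no mid-text insertions).
    """
    text: list[str] = []
    for ch in linear:
        if ch == backspace:
            if text:  # a backspace on empty text deletes nothing
                text.pop()
        else:
            text.append(ch)
    return "".join(text)


if __name__ == "__main__":
    logged = "Thre<<ere is a man sleapp<<ping in an easy chair."
    print(replay_backspaces(logged))
    # There is a man sleaping in an easy chair.
```

Run on the slide example, it returns the final text "There is a man sleaping in an easy chair.", including the typo that the writer left in.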

Aggregating from letter to word level

Part-of-speech tagging and chunking (1)
Extract words, word groups and sentences; tokenize the sentences.
There is a man sleeping in an easy chair.
EX V DT NN V IN DT JJ NN

Part-of-speech tagging and chunking (2)
There is a man sleaping in an easy chair.
The tagged words are grouped into chunks, with B marking the first word of a chunk and I a word inside it: "a man" is a noun phrase (NP) labeled B I, and "an easy chair" is an NP labeled B I I.

Enrichment with process data (1): BeforeWordPause -1 and -2
There is a man sleeping in an easy chair.
EX V DT NN V IN DT JJ NN
Thre<<ere is a_man sleapp<<ping in an easy chair. (each < marks one backspace; _ highlights the space in question)
For "man": BeforeWordPause -1 = 593, BeforeWordPause -2 = 140
The first pause before a word (-1) is the pause directly before its first character; the second pause before a word (-2) is the pause before the space that precedes it.
The second pause before a word is at the same time the AfterWordPause +1 of the preceding word.
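The chunking step above can be illustrated with a small decoder that turns IOB labels back into phrases. This is a minimal sketch, not Inputlog's implementation (the slides do not say which chunker is used), and the function name is hypothetical. Only the NP labels for "a man" and "an easy chair" follow the slides; the B-NP label for "There", the VP and PP labels, and the O label for the full stop are assumptions made for illustration.

```python
def iob_to_chunks(tokens, iob_tags):
    """Group tokens into (chunk_type, words) pairs from IOB labels.

    "B-XP" starts a chunk of type XP, "I-XP" continues it, and "O" marks a
    token outside any chunk. An "I-" label that does not continue a chunk
    of the same type is treated as the start of a new chunk.
    """
    if len(tokens) != len(iob_tags):
        raise ValueError("tokens and iob_tags must have the same length")
    chunks = []
    current_type, current_words = None, []
    for token, tag in zip(tokens, iob_tags):
        prefix, _, chunk_type = tag.partition("-")
        continues = prefix == "I" and chunk_type == current_type
        if current_words and not continues:
            # close the chunk that was open before this token
            chunks.append((current_type, current_words))
            current_type, current_words = None, []
        if prefix in ("B", "I"):
            current_type = chunk_type
            current_words.append(token)
    if current_words:
        chunks.append((current_type, current_words))
    return chunks


# Example sentence from the slides; the NP labels follow the slides, while
# B-NP for "There", the VP/PP labels and "O" for "." are assumptions.
TOKENS = ["There", "is", "a", "man", "sleeping", "in", "an", "easy", "chair", "."]
TAGS = ["B-NP", "B-VP", "B-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP", "I-NP", "O"]

if __name__ == "__main__":
    for chunk_type, words in iob_to_chunks(TOKENS, TAGS):
        print(chunk_type, " ".join(words))
```

With these labels the decoder yields six chunks, among them "NP a man" and "NP an easy chair"; the full stop, labeled O, belongs to no chunk.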

Enrichment with process data (2): word production time
Thre<<ere is a man sleapp<<ping in an easy chair.
Production time of a word = EndTime of the last character of the word - StartTime of the first character of the word
man: 24976 - 24430 = 546 (the slide also shows the value 7207)

Enrichment with process data (3): within-word pause
Thre<<ere is a man sleapp<<ping in an easy chair.
WithinWordPause = the sum of the pauses within a word (WithinWordPause 1 + WithinWordPause 2 + ... + WithinWordPause N)
man: 125 + 374 = 499 (the slide also shows the value 7145)

Enrichment with process data (4): AfterWordPause +1
Thre<<ere is a_man_sleapp<<ping in an easy ch...
The first pause after a word (+1): 140 after "a", 234 after "man"
The AfterWordPause+1 of "a" is the BeforeWordPause-2 of "man".

Read more:
Macken, L., Hoste, V., Leijten, M., & Van Waes, L. (2012). From keystrokes to annotated process data: Enriching the output of Inputlog with linguistic information. Paper presented at the Eighth International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey.
Leijten, M., Macken, L., Hoste, V., Van Horenbeeck, E., & Van Waes, L. (2012). From character to word level: Enabling the linguistic analyses of Inputlog process data. Paper presented at the European Association for Computational Linguistics (EACL) workshop Computational Linguistics and Writing (CL&W 2012): Linguistic and Cognitive Aspects of Document Creation and Document Engineering, Avignon.
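The word-level measures above can be sketched from a simple keystroke log. This is a minimal illustration, not Inputlog's implementation. It assumes that each keystroke is logged as (character, key-down time, key-up time) in milliseconds, that a pause is the interval between two consecutive key-down times (Inputlog's own definition may differ), and that words are separated by single spaces typed without revisions. The class and function names are hypothetical. Only 24430 (start of "m") and 24976 (end of "n") come from the slides; the other timestamps for the fragment "a man s" are invented so that the output reproduces the values shown for "man": 593, 140, 234, 499 and 546.

```python
from typing import NamedTuple, Optional


class Key(NamedTuple):
    char: str
    down: int  # key-down timestamp in ms (assumed unit)
    up: int    # key-up timestamp in ms


def pause(keys, i):
    """Pause before keystroke i, assumed here to be down(i) - down(i - 1)."""
    return keys[i].down - keys[i - 1].down


def word_spans(keys):
    """(first, last) indices of each run of non-space keystrokes."""
    spans, start = [], None
    for i, key in enumerate(keys):
        if key.char != " " and start is None:
            start = i
        elif key.char == " " and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(keys) - 1))
    return spans


def word_measures(keys, first, last):
    """Word-level measures for the word typed as keys[first..last]."""
    def pause_or_none(i) -> Optional[int]:
        # None when the keystroke pair falls outside the log
        return pause(keys, i) if 0 < i < len(keys) else None

    return {
        "word": "".join(k.char for k in keys[first:last + 1]),
        "BeforeWordPause-1": pause_or_none(first),      # before the first character
        "BeforeWordPause-2": pause_or_none(first - 1),  # before the preceding space
        "AfterWordPause+1": pause_or_none(last + 1),    # after the last character
        "WithinWordPause": sum(pause(keys, i) for i in range(first + 1, last + 1)),
        "ProductionTime": keys[last].up - keys[first].down,
    }


# Hypothetical log of "a man s" (see the note on the timestamps above).
EXAMPLE_KEYS = [
    Key("a", 23697, 23780), Key(" ", 23837, 23900),
    Key("m", 24430, 24500), Key("a", 24555, 24640), Key("n", 24929, 24976),
    Key(" ", 25163, 25230), Key("s", 25400, 25470),
]

if __name__ == "__main__":
    for first, last in word_spans(EXAMPLE_KEYS):
        print(word_measures(EXAMPLE_KEYS, first, last))
```

Computing the measures for "a" as well shows the identity stated on the slide: its AfterWordPause+1 (140) is the BeforeWordPause-2 of "man".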

Alzheimer's disease

Goal of the research project
Test the complementary diagnostic power of a new tool that assesses the cognitive and linguistic aspects characterizing the process of written language production in Alzheimer's disease (AD), focusing on motor, cognitive, and linguistic aspects.

Participants
Three main groups:
patients with mild dementia due to AD
patients with mild cognitive impairment (MCI) due to AD
a group of cognitively healthy participants (65 years and older)

Tasks
Copy task: assesses personal (motor) characteristics
Expository task
Two figurative elicitation tasks

Ultimate goal
It is our ultimate goal to:
1. describe and test differences between the three participant groups on the basis of a selection of writing process variables (inter- and intrapersonal characteristics);
2. test the diagnostic accuracy of a selection of writing process variables (for discriminating AD from healthy elderly);
3. ...

General pause results
Pause analysis: between words
Pauses before words
Pauses before word categories: verbs, nouns, adjectives (pauses related to revisions are excluded)

Pauses before word categories: HE (healthy elderly)
Pauses before word categories: CI (cognitively impaired elderly)
Pauses before chunks: verb phrase, noun phrase, prepositional phrase
Pauses at the beginning of chunks (B): verb phrase, noun phrase, prepositional phrase (extremely large pauses and pauses related to revisions are excluded)
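The pause analyses above compare mean pauses per word category and at the beginning of chunks, after excluding pauses related to revisions and extremely large pauses. A minimal sketch of that filter-and-group step follows. It assumes each word has already been enriched with a POS tag, an IOB chunk label, its BeforeWordPause-1 and a revision flag. The record layout, the function name, the toy pause values and the 10,000 ms cutoff are all hypothetical, since the slides do not state the exclusion criteria.

```python
from collections import defaultdict
from statistics import mean
from typing import NamedTuple


class WordRecord(NamedTuple):
    word: str
    pos: str       # part-of-speech tag, e.g. "V", "NN", "JJ"
    chunk: str     # IOB chunk label, e.g. "B-NP", "I-NP"
    pause: int     # BeforeWordPause-1 in ms
    revised: bool  # True if the pause is related to a revision


def mean_pause_by(records, key, max_pause=10_000, keep=None):
    """Mean pause per group after excluding revision-related and extreme pauses.

    key:       maps a record to its group label
    max_pause: hypothetical cutoff (ms) for "extremely large" pauses
    keep:      optional predicate on the group label
    """
    groups = defaultdict(list)
    for r in records:
        if r.revised or r.pause > max_pause:
            continue
        label = key(r)
        if keep is None or keep(label):
            groups[label].append(r.pause)
    return {label: mean(values) for label, values in groups.items()}


# Toy data, not from the study.
RECORDS = [
    WordRecord("There", "EX", "B-NP", 800, False),
    WordRecord("is", "V", "B-VP", 300, False),
    WordRecord("a", "DT", "B-NP", 250, False),
    WordRecord("man", "NN", "I-NP", 593, False),
    WordRecord("sleaping", "V", "B-VP", 900, True),
    WordRecord("in", "IN", "B-PP", 350, False),
    WordRecord("an", "DT", "B-NP", 200, False),
    WordRecord("easy", "JJ", "I-NP", 12_000, False),
    WordRecord("chair", "NN", "I-NP", 400, False),
]

if __name__ == "__main__":
    # pauses before verbs, nouns and adjectives
    print(mean_pause_by(RECORDS, key=lambda r: r.pos, keep=lambda p: p in {"V", "NN", "JJ"}))
    # pauses at the beginning of a chunk (B) versus inside a chunk (I)
    print(mean_pause_by(RECORDS, key=lambda r: r.chunk.split("-")[0]))
```

In the toy data the revised verb and the 12,000 ms adjective pause are dropped, so no adjective group remains. Grouping by `r.chunk` instead gives means per phrase type.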

Pauses within chunks (I): verb phrase, noun phrase, ...

Source analysis

The flow: in sum
Source analyses (full): iterative cycles
original idfx → analyses
filtered idfx → analyses
recoded idfx → analyses

Source analyses (grouped)

Information seeking in professional writing: Twitter and e-mail communication
Contemporary writing & theory
Pilot study
Experiment
Discussion

Search process
Long-term memory: task schemas, topic knowledge, audience knowledge, linguistic knowledge, genre knowledge
External digital sources: task schemas, topic knowledge, audience knowledge, linguistic knowledge, genre knowledge
Source: Leijten, M., Van Waes, L., Schriver, K., & Hayes, J. R. (2014). Writing in the workplace: Constructing documents using multiple digital sources. Journal of Writing Research, 5(3), 285-336. (www.jowr.org)

Pilot study
Assumption: search style may be a predictor of the level of expertise (novice versus expert in digital communication).
Participants: novice writers (5), professional writers (5)
Tasks: write a tweet (max. 140 characters); write an e-mail (no indication of length)
Duration: max. 10 minutes and max. 30 minutes, respectively

Two writing tasks: Twitter & e-mail
Twitter is a social networking and microblogging service that enables its users to send and read messages called tweets. Tweets are text-based posts of up to 140 characters (often based on multiple digital sources).
E-mail is a method of exchanging digital messages from an author to one or more recipients. E-mails consist of three main parts (message envelope, header and body text) and can be as long as necessary.

Observation
Observation via Inputlog: Inputlog 5, Tobii T60 eye tracker, retrospective interviews
Writing environments (templates via Inputlog)

Tweets
Novice writer: "To all communication science students: interesting conference on internal and organisational communication on April 17 April in Bussum"
Professional: "More conversation in the organisation >> interesting conference on internal communication: www.corner stone.nl/"

[Figure: E-mail: novice]

[Figure: E-mail: professional]

Pilot study
Source: Leijten, M., & Van Waes, L. (2013). Keystroke logging in writing research: Using Inputlog to analyze and visualize writing processes. Written Communication, 30(3), 358-392. doi:10.1177/0741088313491692

Experiment: method
Participants: novice writers (20), professional writers (20)
Tasks: write a tweet (max. 140 characters); write an e-mail (no indication of length)
Duration: max. 10 minutes and max. 30 minutes, respectively
Analysis: process measures, product measures, combined measures

Experiment: procedure
1. Typing test
2. (Reading test)
3. Writing task 1: tweet about a creative session
4. Writing task 2: invite colleagues to the creative session via e-mail
5. Stimulated retrospective interview

Materials (layer 1)
Observation (layer 2): Inputlog 5, Tobii TX300 eye tracker (AnHuLab Antwerp)
Retrospective interviews (layer 3)

Results: mean number of tweets
Novices (N=20): 10 tweets (st.dev. 21)
Professionals (N=20): 3115 tweets (st.dev. 3651)

Results: mean age
Novices: 22 (st.dev. 1.9)
Professionals: 32 (st.dev. 9.1)

Results: process measures
Results: product measures, tweet (total characters produced; total characters in final text)
Results: product/process ratio, tweet
Results: product measures, e-mail
[Chart: total characters produced and total characters in final text, novice vs. professional; values shown: 1513, 1473, 1177, 1074]

Results: product/process ratio, e-mail

Results: relative time in sources
Relative time spent in other sources is equal for the two writer groups.
Relative time spent in other sources is larger in the Twitter task than in the e-mail task.
[Chart: relative time in sources, tweet vs. e-mail, novice vs. professional; values shown: 76, 75]

Results: other
Other non-discriminating variables are: mean number of P-bursts, mean duration of P-bursts, mean number of S-bursts, mean duration of S-bursts, number of sources used, duration spent in various sources, absolute transitions, transitions per minute...

Results: quality of the tweets
Novices: 0.73 (st.dev. 0.59)
Professionals: 1.78 (st.dev. 0.44)
The tweets of the professionals follow the conventions more closely than the tweets of the novices.
Example tweet: "Broaden your choices via creative thinking and become a trendsetter! Introductory session: Thursday May 2 @ Bloso, Hazewinkel, Willebroek Trendsetter i.o. trend follower? Learn to think out-of-the box. Register now! #tip #TotalBrainBoxMethod www.hetvarken.wor"
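Two of the measures reported in these results are simple ratios. The product/process ratio divides the number of characters in the final text by the number of characters produced during the process. The relative time in sources is taken here as the share of total process time spent outside the main text. This is a hedged sketch: the class, field and function names and the example numbers are hypothetical, and Inputlog's exact definitions (for instance of what counts as time in a source) may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class SessionSummary:
    chars_produced: int        # all characters typed, including deleted ones
    chars_final: int           # characters left in the final text
    total_time_s: float        # total process time in seconds
    time_in_sources_s: float   # time with focus outside the main text


def product_process_ratio(s: SessionSummary) -> float:
    """Characters in the final text divided by characters produced.

    1.0 means nothing was deleted; lower values mean more revision.
    """
    if s.chars_produced <= 0:
        raise ValueError("no characters produced")
    return s.chars_final / s.chars_produced


def relative_time_in_sources(s: SessionSummary) -> float:
    """Percentage of total process time spent in other sources."""
    if s.total_time_s <= 0:
        raise ValueError("total time must be positive")
    return 100 * s.time_in_sources_s / s.total_time_s


if __name__ == "__main__":
    # Hypothetical tweet session, not data from the study.
    tweet = SessionSummary(chars_produced=200, chars_final=150,
                           total_time_s=600, time_in_sources_s=300)
    print(product_process_ratio(tweet))     # 0.75
    print(relative_time_in_sources(tweet))  # 50.0
```

Keeping the raw counts in one summary object makes it easy to compare the same ratios across tasks (tweet vs. e-mail) and writer groups.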

Results: quality of the e-mails
                                 Novices   Professionals
Structure of content (max. 4)    3.1       3.3
Reader orientation (max. 8)      4.0       5.4 *
Attention for reader (max. 4)    2.0       3.0 *

The e-mails of the professionals are structured similarly to those of the novices.
The e-mails of the professionals are more reader-oriented than those of the novices.
The professionals pay more attention to the reader than the novices do.

Discussion
Diversity within writer groups
Type of task: internal communication
Type of writer: two distinct profiles (long process/text vs. short)
Definition of indicators of cognitive processes
Indicators at the general process level versus within-process variability

Thank you
Nikki Van De Keere, Alexander Kupers, Tinne Moens, Eline Mortelmans, Caroline Van Gils, Elke Eriksson & Sofie Vanwynsberghe (students, Master in Multilingual Professional Communication)
Eric Van Horenbeeck (technical coordinator Inputlog)
Download the presentation via ResearchGate or Academia.edu (@marielle leijten)

Literature
Leijten, M., Van Waes, L., Schriver, K., & Hayes, J. R. (2014). Writing in the workplace: Constructing documents using multiple digital sources. Journal of Writing Research, 5(3), 285-336.
Leijten, M., & Van Waes, L. (2013). Keystroke logging in writing research: Using Inputlog to analyze and visualize writing processes. Written Communication, 30(3), 358-392. doi:10.1177/0741088313491692
Related articles on writing process research:

Data mining
[Figure labels: end 47, start 47, tweet 256, vwec 236]

Translation
Data from a study by Isabelle Robert (2014)
[Figure labels: IE-other 49, IE-search 44, Google 13, Google vwec 49; Translator A, Translator B]
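The data-mining and translation figures summarize how writers move between the text they are producing and their sources. The slides name Pajek, a network analysis tool that reads a plain-text .net format (a `*Vertices` section followed by weighted `*Arcs`). As a hedged sketch of how such a network could be prepared, assume a hypothetical, already-extracted sequence of focus labels for one writer. The labels echo the figures, but the sequence and the function names are invented. The code counts transitions between consecutive, different sources and renders them in that format.

```python
from collections import Counter


def count_transitions(focus_sequence):
    """Count moves between consecutive, different focus labels."""
    return Counter(
        (a, b) for a, b in zip(focus_sequence, focus_sequence[1:]) if a != b
    )


def to_pajek(transitions):
    """Render transition counts as a Pajek .net network with weighted arcs."""
    labels = sorted({node for pair in transitions for node in pair})
    index = {label: i for i, label in enumerate(labels, start=1)}  # Pajek is 1-based
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{index[label]} "{label}"' for label in labels]
    lines.append("*Arcs")
    for (a, b), weight in sorted(transitions.items()):
        lines.append(f"{index[a]} {index[b]} {weight}")
    return "\n".join(lines)


# Hypothetical focus sequence for one translator (invented, not study data).
FOCUS = ["source text", "target text", "bilingual dict.", "target text",
         "internet", "target text", "bilingual dict.", "target text"]

if __name__ == "__main__":
    print(to_pajek(count_transitions(FOCUS)))
```

The printed text can be saved with a .net extension and opened in Pajek. The sum of the counts is the absolute number of source transitions in the sequence, which, divided by the session length, gives a transitions-per-minute figure.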

[Figure: resources and target text for Translator A and Translator B (Pajek); nodes include source text, target text, bilingual dict., monol. dict., internet and antidote; reference shown: Leijten, M., Van Waes, L., Schriver, K., & Hayes, J. R. (2014)]
[Charts: average duration of episode (s) and number of episodes, overall and per word, Translators A and B]

More information (@uantwerpen.be)
Research Foundation Flanders
www.inputlog.net