Cultural Trends and language change

1 Cultural Trends and language change Gosse Bouma Information Science University of Groningen NHL 2015/03 Gosse Bouma 1/25

2 Popularity of Wolf in English books

3 Google Books Ngrams. Digital library: Google Books is a project in which books are scanned, converted to text using OCR (Optical Character Recognition), and made searchable with Google search. It currently holds approx. 20 M books, mostly English, mostly published since 1800. Google Books Ngrams: a valuable resource for cultural and linguistic studies.

4 Google Books & Ngrams Viewer. "The Google Labs N-gram Viewer is the first tool of its kind, capable of precisely and rapidly quantifying cultural trends based on massive quantities of data. It is a gateway to culturomics! The browser is designed to enable you to examine the frequency of words (banana) or phrases ('United States of America') in books over time. You'll be searching through over 5.2 million books: 4% of all books ever published!" (A-users-guide-to-culturomics)

5 Popularity of various *isms

6 Google Books Ngrams Viewer. Jean-Baptiste Michel et al., Quantitative Analysis of Culture Using Millions of Digitized Books. Science. books.html

7 Google Books Ngrams Viewer. Examples: freedom, liberty; vampire, werewolf, zombie; computer, phone, radio, gun; radio, television, internet; best, beft

8 Google Books 2.0: The USA is/are. Do people think of the United States as a singular or a plural entity? And is this constant over time? We can answer this question by counting how often "the United States" is followed by one of a set of frequent plural verbs and dividing the result by the overall frequency of "the United States". We can do the same for "the United States" followed by a frequent singular verb. The result is shown in the Ngram Viewer plot. Notice the use of arithmetic to sum and divide frequencies.
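The sum-and-divide arithmetic described on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `verb_share` and all counts below are hypothetical, not taken from Google Books.

```python
from typing import Dict, Iterable


def verb_share(counts: Dict[str, int], subject: str, verbs: Iterable[str]) -> float:
    """Share of all occurrences of `subject` that are followed by one of `verbs`."""
    total = counts.get(subject, 0)
    if total == 0:
        return 0.0
    # Sum the counts of "<subject> <verb>" over all verbs, then divide.
    followed = sum(counts.get(f"{subject} {verb}", 0) for verb in verbs)
    return followed / total


# Hypothetical ngram counts for a single year (not real data).
example_counts = {
    "the United States": 1000,
    "the United States is": 60,
    "the United States has": 40,
    "the United States are": 150,
    "the United States have": 50,
}

singular = verb_share(example_counts, "the United States", ["is", "has"])
plural = verb_share(example_counts, "the United States", ["are", "have"])
print(f"singular: {singular:.2f}, plural: {plural:.2f}")
```

Repeating the computation per year would give the two curves that the viewer draws.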

11 Determiners and Country Names: "The Ukraine" is incorrect both grammatically and politically, says Oksana Kyzyma of the Embassy of Ukraine in London. "Ukraine is both the conventional short and long name of the country," she says. "This name is stated in the Ukrainian Declaration of Independence and Constitution."

12 Google Books 2.0
Spelling and phrasal expressions: recognise, recognize; gone missing; graduated college vs. graduated from college
Search using part of speech: experience_NOUN vs. experience_VERB; to always _VERB_ (likewise with never, quickly, boldly)
Arithmetic: (The United States have + The United States are) / the United States; (The United States has + The United States is) / the United States

13 Literary Applications: First Person Fiction. Ted Underwood suggests there is a sharp drop in the percentage of first-person narratives around 1800. Can we investigate this using corpus linguistics? (we-dont-already-know-the-broad-outlines-of-literary-history)

14 Literary Applications: Given novels that are clearly written with a 1st or 3rd person narrator, which words occur significantly more often in 1st or in 3rd person novels?

15 Literary Applications: Given a large collection of fiction books, does the ratio between 1st and 3rd person pronouns change over time?
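One rough way to approach this question is to count pronouns per decade. The sketch below assumes English texts, a simple regular-expression tokenizer, and hand-picked pronoun lists; the function name and the sample data are hypothetical.

```python
import re
from collections import defaultdict

# Hand-picked pronoun lists (an assumption; extend as needed).
FIRST_PERSON = {"i", "me", "my", "mine", "myself",
                "we", "us", "our", "ours", "ourselves"}
THIRD_PERSON = {"he", "him", "his", "himself", "she", "her", "hers", "herself",
                "they", "them", "their", "theirs", "themselves"}


def pronoun_ratio_by_decade(books):
    """Map each decade to (#1st person pronouns) / (#3rd person pronouns).

    `books` is an iterable of (publication_year, text) pairs.
    Decades without any 3rd person pronoun are skipped.
    """
    tallies = defaultdict(lambda: [0, 0])  # decade -> [first, third]
    for year, text in books:
        decade = year // 10 * 10
        for token in re.findall(r"[a-z']+", text.lower()):
            if token in FIRST_PERSON:
                tallies[decade][0] += 1
            elif token in THIRD_PERSON:
                tallies[decade][1] += 1
    return {decade: first / third
            for decade, (first, third) in sorted(tallies.items())
            if third > 0}


# Tiny hypothetical sample, just to show the shape of the result.
sample_books = [
    (1795, "I saw my friend and he saw me."),
    (1805, "She told him that they left."),
]
print(pronoun_ratio_by_decade(sample_books))
```

Note that dialogue inflates first-person counts in third-person novels, so the ratio is only a coarse proxy for narrative perspective.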

16 Using Syntax: Dependency Relations. What are frequent direct objects of drink? drink=>*_NOUN. What things are magnificent? *_NOUN=>magnificent

17 Google Ngrams: the Google BOOKS Ngram Viewer uses books; Google NGRAMS is Web data.

18 Dutch Twitter Corpus: Since 2011, the Information Science Department of the University of Groningen has been collecting Dutch-language tweets. The goal is to collect a representative sample of all tweets posted in Dutch. We estimate that our method captures approximately 40-60% of the relevant tweets. Example tweets: RieksOsinga #CTAboutaleb op Mooie en inspirerende woorden. AHPOIESZ Vandaag ons winnend concept gepresenteerd aan College van is er een probleem met de mail? Krijg namelijk een 500 interland server Error. Gelukkig! ICT geeft ook aan dat er geen storing, dat scheelt;-) Fijne zondag nog! NHL_Hogeschool Klaar voor collegereeks Met twee bedrijven

20 Spelling Variation: How often is "eens" written as "is"? Example: Dit kan uiteraard wel is voorkomen in de statistiek. Source: Dutch Twitter Ngram counts. Query: wel [is,eens] %en
trigram              count   perc
wel eens voorkomen
wel is voorkomen
wel eens gebeuren    2,
wel is gebeuren
wel eens zien        34,
wel is zien          10,
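The percentage column of such a table can be derived from raw trigram counts as sketched below. The function name is hypothetical, and the counts are invented for illustration; they are not the figures from the slide.

```python
from collections import defaultdict


def variant_percentages(trigram_counts, first="wel", variants=("eens", "is")):
    """For each final word, give the percentage share of each middle-word variant.

    `trigram_counts` maps space-separated trigrams to counts. Only trigrams
    that start with `first` and have one of `variants` in the middle are used.
    """
    by_last_word = defaultdict(dict)
    for trigram, count in trigram_counts.items():
        parts = trigram.split()
        if len(parts) == 3 and parts[0] == first and parts[1] in variants:
            by_last_word[parts[2]][parts[1]] = count

    percentages = {}
    for last_word, counts in by_last_word.items():
        total = sum(counts.values())
        if total == 0:
            continue
        percentages[last_word] = {
            variant: 100.0 * counts.get(variant, 0) / total
            for variant in variants
        }
    return percentages


# Invented counts (assumption), only to demonstrate the calculation.
demo_counts = {
    "wel eens zien": 900,
    "wel is zien": 100,
    "wel eens gebeuren": 75,
    "wel is gebeuren": 25,
    "niet eens zien": 500,  # ignored: does not start with "wel"
}
print(variant_percentages(demo_counts))
```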

24 Spelling Variation: zeςma. "Zeς ma is a discourse marker of Arabic etymology that is used in North Africa as well as in the French and Dutch varieties spoken by the North African diaspora in Europe... On the internet, users either omit ς or use another character, e.g. the digit 3." (Bouwmans, 2003) Source: Dutch Twitter Ngram counts. Query: ze%ma
hits     word
76,218   zehma
45,561   ze3ma
29,944   zegma
15,058   Zehma
8,448    Ze3ma
2,845    Zegma
2,149    ze3hma
1,707    zemma
1,568    ZEHMA
1,553    zema
(and at least 20 other spelling variants)
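A query such as ze%ma can be emulated locally by translating the wildcard into a regular expression. The sketch below assumes that % matches any sequence of characters, including the empty one (which is consistent with "zema" appearing among the hits); the helper names are hypothetical. The counts are a subset of those in the table above, plus one non-matching word added for contrast.

```python
import re


def wildcard_to_regex(pattern):
    """Turn a pattern with % wildcards into a compiled, anchored regex."""
    literal_parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile("^" + ".*".join(literal_parts) + "$")


def matching_variants(pattern, unigram_counts):
    """Return (word, count) pairs matching `pattern`, most frequent first."""
    regex = wildcard_to_regex(pattern)
    hits = [(word, count) for word, count in unigram_counts.items()
            if regex.match(word)]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


unigram_counts = {
    "zehma": 76218,
    "ze3ma": 45561,
    "zegma": 29944,
    "zemma": 1707,
    "zema": 1553,
    "zebra": 12,  # hypothetical non-matching word
}
for word, count in matching_variants("ze%ma", unigram_counts):
    print(f"{count:>7,} {word}")
```

The match is case-sensitive, so capitalised variants such as Zehma would need a pattern of their own or the `re.IGNORECASE` flag.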

26 Language Change: Popularity of "der" (her/there). Examples: ik ga der geld geven voor der verjaardag; Mag je der vaseline opdoen? Source: Dutch Twitter Ngram counts 2014

27 Ngram statistics: Why use ngram counts? Given enough data, ngram frequencies are often sufficient to study variation and trends.
Dutch Twitter Corpus   # (million)
tweets                 2,500
tokens                 28,000
unigrams
bigrams
trigrams
4-grams
5-grams
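Counts like these come from sliding a window of n tokens over every tweet. A minimal sketch follows, assuming whitespace tokenization (the tokenizer actually used for the corpus is not described here); the function name and the sample tweets are hypothetical.

```python
from collections import Counter


def ngram_counts(tweets, n):
    """Count all n-grams of length `n` over an iterable of tweet strings."""
    counts = Counter()
    for tweet in tweets:
        tokens = tweet.split()  # naive whitespace tokenization (assumption)
        # Slide a window of n tokens; n-grams never cross tweet boundaries.
        counts.update(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts


sample_tweets = [
    "dit kan wel eens gebeuren",
    "dat kan wel eens",
]
print(ngram_counts(sample_tweets, 2).most_common(3))
print(ngram_counts(sample_tweets, 3)["kan wel eens"])
```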

28 Comparable Tools and Resources
Twitter viewers (Univ. Groningen, twiqs.nl): links to actual tweets, trends, metadata; slow for long periods and/or frequent ngrams.
Google Web 1T 5-Gram Database for European languages: ngram counts for 133 billion words of Dutch web text; regex search, collocations.
Corpus frequency counts: Keuleers et al. (2010), word frequencies based on Dutch subtitles...
Rovereto Twitter n-gram corpus with demographic metadata, Herdagdelen (2013): a Twitter-based dataset using n-grams, thereby overcoming the limitations on the redistribution of raw tweets; n-gram counts for 75 million English tweets, with gender of author and time of posting.

29 Twitter Ngrams Web Interface: raw ngram counts; limited regex support; export results as csv; collocations, associations. Run your own experiment: download the ngrams data. See Evert (2010), Google Web 1T 5-Grams Made Easy (but not for the computer).

30 Twitter Ngrams Web Interface: Trends. Relative frequencies per month, for ngrams occurring at least once in each month. Built using SQLite + Google Tables.
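The monthly trend computation might look roughly like the following sketch, which uses Python's standard `sqlite3` module. The table layout (`counts`, `totals`), the column names, and the demo data are all assumptions made for illustration; the actual schema behind the web interface is not given.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical schema and toy data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE totals (month TEXT PRIMARY KEY, total INTEGER)")
    conn.execute(
        "CREATE TABLE counts (ngram TEXT, month TEXT, freq INTEGER, "
        "PRIMARY KEY (ngram, month))"
    )
    conn.executemany("INSERT INTO totals VALUES (?, ?)",
                     [("2014-01", 1000), ("2014-02", 2000)])
    conn.executemany("INSERT INTO counts VALUES (?, ?, ?)",
                     [("wel eens", "2014-01", 5),
                      ("wel eens", "2014-02", 8),
                      ("wel is", "2014-01", 1)])  # missing in 2014-02
    return conn


def monthly_relative_frequencies(conn, ngram):
    """Return [(month, relative frequency)] or None if a month is missing.

    Mirrors the restriction that an ngram must occur at least once in
    each month before a trend is reported.
    """
    n_months = conn.execute("SELECT COUNT(*) FROM totals").fetchone()[0]
    rows = conn.execute(
        "SELECT c.month, 1.0 * c.freq / t.total "
        "FROM counts AS c JOIN totals AS t ON t.month = c.month "
        "WHERE c.ngram = ? AND c.freq > 0 "
        "ORDER BY c.month",
        (ngram,),
    ).fetchall()
    return rows if len(rows) == n_months else None


demo_conn = build_demo_db()
print(monthly_relative_frequencies(demo_conn, "wel eens"))
print(monthly_relative_frequencies(demo_conn, "wel is"))
```

Dividing by the monthly total keeps months with different tweet volumes comparable.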

31 Twitter vs Google Web Ngrams: een meisje/liedje/... die/dat
Columns: noun, Twitter %die, web %die, ratio
Nouns compared: meisje, liedje, kind, type, bedrijf, boek, nummer, geld, filmpje, ding
Twitter: , Web Ngrams: 2008

32 Enjoy!


English. Universidad Virtual. Curso de sensibilización a la PAEP (Prueba de Admisión a Estudios de Posgrado) Parts of Speech. Nouns. English Parts of speech Parts of Speech There are eight parts of speech. Here are some of their highlights. Nouns Pronouns Adjectives Articles Verbs Adverbs Prepositions Conjunctions Click on any of the

More information

LANGUAGE! 4 th Edition, Levels A C, correlated to the South Carolina College and Career Readiness Standards, Grades 3 5

LANGUAGE! 4 th Edition, Levels A C, correlated to the South Carolina College and Career Readiness Standards, Grades 3 5 Page 1 of 57 Grade 3 Reading Literary Text Principles of Reading (P) Standard 1: Demonstrate understanding of the organization and basic features of print. Standard 2: Demonstrate understanding of spoken

More information

CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING

CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING Mary-Elizabeth ( M-E ) Eddlestone Principal Systems Engineer, Analytics SAS Customer Loyalty, SAS Institute, Inc. Is there valuable

More information

Albert Pye and Ravensmere Schools Grammar Curriculum

Albert Pye and Ravensmere Schools Grammar Curriculum Albert Pye and Ravensmere Schools Grammar Curriculum Introduction The aim of our schools own grammar curriculum is to ensure that all relevant grammar content is introduced within the primary years in

More information

WRITING FOR THE WEB. Lynn Villeneuve lynn@astrolabewebsites.ca

WRITING FOR THE WEB. Lynn Villeneuve lynn@astrolabewebsites.ca . WRITING FOR THE WEB Lynn Villeneuve lynn@astrolabewebsites.ca Adopting a specialized writing style for the web is important for reasons such as readability, search engine optimization and accessibility.

More information

English Appendix 2: Vocabulary, grammar and punctuation

English Appendix 2: Vocabulary, grammar and punctuation English Appendix 2: Vocabulary, grammar and punctuation The grammar of our first language is learnt naturally and implicitly through interactions with other speakers and from reading. Explicit knowledge

More information

Wikipedia and Web document based Query Translation and Expansion for Cross-language IR

Wikipedia and Web document based Query Translation and Expansion for Cross-language IR Wikipedia and Web document based Query Translation and Expansion for Cross-language IR Ling-Xiang Tang 1, Andrew Trotman 2, Shlomo Geva 1, Yue Xu 1 1Faculty of Science and Technology, Queensland University

More information

WHITEPAPER. Text Analytics Beginner s Guide

WHITEPAPER. Text Analytics Beginner s Guide WHITEPAPER Text Analytics Beginner s Guide What is Text Analytics? Text Analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content

More information

Introduction to the Database

Introduction to the Database Introduction to the Database There are now eight PDF documents that describe the CHILDES database. They are all available at http://childes.psy.cmu.edu/data/manual/ The eight guides are: 1. Intro: This

More information

A Mixed Trigrams Approach for Context Sensitive Spell Checking

A Mixed Trigrams Approach for Context Sensitive Spell Checking A Mixed Trigrams Approach for Context Sensitive Spell Checking Davide Fossati and Barbara Di Eugenio Department of Computer Science University of Illinois at Chicago Chicago, IL, USA dfossa1@uic.edu, bdieugen@cs.uic.edu

More information

Modern foreign languages

Modern foreign languages Modern foreign languages Programme of study for key stage 3 and attainment targets (This is an extract from The National Curriculum 2007) Crown copyright 2007 Qualifications and Curriculum Authority 2007

More information

CLOUD ANALYTICS: Empowering the Army Intelligence Core Analytic Enterprise

CLOUD ANALYTICS: Empowering the Army Intelligence Core Analytic Enterprise CLOUD ANALYTICS: Empowering the Army Intelligence Core Analytic Enterprise 5 APR 2011 1 2005... Advanced Analytics Harnessing Data for the Warfighter I2E GIG Brigade Combat Team Data Silos DCGS LandWarNet

More information

THE EMOTIONAL VALUE OF PAID FOR MAGAZINES. Intomart GfK 2013 Emotionele Waarde Betaald vs. Gratis Tijdschrift April 2013 1

THE EMOTIONAL VALUE OF PAID FOR MAGAZINES. Intomart GfK 2013 Emotionele Waarde Betaald vs. Gratis Tijdschrift April 2013 1 THE EMOTIONAL VALUE OF PAID FOR MAGAZINES Intomart GfK 2013 Emotionele Waarde Betaald vs. Gratis Tijdschrift April 2013 1 CONTENT 1. CONCLUSIONS 2. RESULTS Reading behaviour Appreciation Engagement Advertising

More information

Grade 1 LA. 1. 1. 1. 1. Subject Grade Strand Standard Benchmark. Florida K-12 Reading and Language Arts Standards 27

Grade 1 LA. 1. 1. 1. 1. Subject Grade Strand Standard Benchmark. Florida K-12 Reading and Language Arts Standards 27 Grade 1 LA. 1. 1. 1. 1 Subject Grade Strand Standard Benchmark Florida K-12 Reading and Language Arts Standards 27 Grade 1: Reading Process Concepts of Print Standard: The student demonstrates knowledge

More information

Measure Social Media like a Pro: Social Media Analytics Uncovered SOCIAL MEDIA LIKE SHARE. Powered by

Measure Social Media like a Pro: Social Media Analytics Uncovered SOCIAL MEDIA LIKE SHARE. Powered by 1 Measure Social Media like a Pro: Social Media Analytics Uncovered # SOCIAL MEDIA LIKE # SHARE Powered by 2 Social media analytics were a big deal in 2013, but this year they are set to be even more crucial.

More information

10th Grade Language. Goal ISAT% Objective Description (with content limits) Vocabulary Words

10th Grade Language. Goal ISAT% Objective Description (with content limits) Vocabulary Words Standard 3: Writing Process 3.1: Prewrite 58-69% 10.LA.3.1.2 Generate a main idea or thesis appropriate to a type of writing. (753.02.b) Items may include a specified purpose, audience, and writing outline.

More information

Sum of all paintings opening slide Introduce myself. Nlwp, Commons, Wikidata, GLAMwiki, bots, Wiki Loves Monuments, uploads, Based on Wikimania 2015

Sum of all paintings opening slide Introduce myself. Nlwp, Commons, Wikidata, GLAMwiki, bots, Wiki Loves Monuments, uploads, Based on Wikimania 2015 Sum of all paintings opening slide Introduce myself. Nlwp, Commons, Wikidata, GLAMwiki, bots, Wiki Loves Monuments, uploads, Based on Wikimania 2015 presentation. Not a lot of overlap in people! 1 Galleries,

More information

NAAR NEDERLAND HANDLEIDING

NAAR NEDERLAND HANDLEIDING NAAR NEDERLAND HANDLEIDING www.naarnederland.nl 1. Introduction As of 15 March 2006, certain foreign nationals wishing to settle in the Netherlands for a prolonged period who require a provisional residence

More information

Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic

Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic by Sigrún Helgadóttir Abstract This paper gives the results of an experiment concerned with training three different taggers on tagged

More information

COURSE OBJECTIVES SPAN 100/101 ELEMENTARY SPANISH LISTENING. SPEAKING/FUNCTIONAl KNOWLEDGE

COURSE OBJECTIVES SPAN 100/101 ELEMENTARY SPANISH LISTENING. SPEAKING/FUNCTIONAl KNOWLEDGE SPAN 100/101 ELEMENTARY SPANISH COURSE OBJECTIVES This Spanish course pays equal attention to developing all four language skills (listening, speaking, reading, and writing), with a special emphasis on

More information

A comparative analysis of the language used on labels of Champagne and Sparkling Water bottles.

A comparative analysis of the language used on labels of Champagne and Sparkling Water bottles. This research essay focuses on the language used on labels of five Champagne and five The two products are related, both being sparkling beverages, but also have obvious differences, primarily that one

More information

Opportunity Report on Korean gaming Kansendossier Korea

Opportunity Report on Korean gaming Kansendossier Korea Opportunity Report on Korean gaming Kansendossier Korea Game is not a new field in Korea. It is a major industry in Korea, accounting for 55% of the cultural contents export in 2011. E-sports, where gaming

More information

NEDERBOOMS Treebank Mining for Data- based Linguistics. Liesbeth Augustinus Vincent Vandeghinste Ineke Schuurman Frank Van Eynde

NEDERBOOMS Treebank Mining for Data- based Linguistics. Liesbeth Augustinus Vincent Vandeghinste Ineke Schuurman Frank Van Eynde NEDERBOOMS Treebank Mining for Data- based Linguistics Liesbeth Augustinus Vincent Vandeghinste Ineke Schuurman Frank Van Eynde LOT Summer School - June, 2014 NEDERBOOMS Exploita)on of Dutch treebanks

More information

Texas Success Initiative (TSI) Assessment

Texas Success Initiative (TSI) Assessment Texas Success Initiative (TSI) Assessment Interpreting Your Score 1 Congratulations on taking the TSI Assessment! The TSI Assessment measures your strengths and weaknesses in mathematics and statistics,

More information

VIDEO CREATIVE IN A DIGITAL WORLD Digital analytics afternoon. Hugo.schurink@millwardbrown.com emmy.brand@millwardbrown.com

VIDEO CREATIVE IN A DIGITAL WORLD Digital analytics afternoon. Hugo.schurink@millwardbrown.com emmy.brand@millwardbrown.com VIDEO CREATIVE IN A DIGITAL WORLD Digital analytics afternoon Hugo.schurink@millwardbrown.com emmy.brand@millwardbrown.com AdReaction Video: 42 countries 13,000+ Multiscreen Users 2 3 Screentime is enormous

More information

Innoveren met Data. Created with open data : https://joinup.ec.europa.eu/community/ods/document/online-training-material. Dr.ir.

Innoveren met Data. Created with open data : https://joinup.ec.europa.eu/community/ods/document/online-training-material. Dr.ir. Innoveren met Data Created with open data : https://joinup.ec.europa.eu/community/ods/document/online-training-material Dr.ir. Erwin Folmer BIG DATA (GARTNER, JULY 2013) Erwin Folmer Pressure Cooker

More information

Outline. Social Media Data. What is social media. What is social media

Outline. Social Media Data. What is social media. What is social media Outline Social Media Data a new trend of corpus-based research? Why it is important The use of social media Collecting data from social media Big data What is new media? (http://en.wikipedia.org/wiki/

More information

ENTER A WORLD OF FASHION, LUXURY AND NEWS

ENTER A WORLD OF FASHION, LUXURY AND NEWS ENTER A WORLD OF FASHION, LUXURY AND NEWS FASHION TV HD LUXE.TV HD 24/7 high definition fashion channel Fashion TV H3D viewers can see most current and sexy fashion shows, top models, reports from the

More information

Video Transcription in MediaMosa

Video Transcription in MediaMosa Video Transcription in MediaMosa Proof of Concept Version 1.1 December 28, 2011 SURFnet/Kennisnet Innovatieprogramma Het SURFnet/ Kennisnet Innovatieprogramma wordt financieel mogelijk gemaakt door het

More information

According to the Argentine writer Jorge Luis Borges, in the Celestial Emporium of Benevolent Knowledge, animals are divided

According to the Argentine writer Jorge Luis Borges, in the Celestial Emporium of Benevolent Knowledge, animals are divided Categories Categories According to the Argentine writer Jorge Luis Borges, in the Celestial Emporium of Benevolent Knowledge, animals are divided into 1 2 Categories those that belong to the Emperor embalmed

More information

Uw partner in system management oplossingen

Uw partner in system management oplossingen Uw partner in system management oplossingen User Centric IT Bring your Own - Corporate Owned Onderzoek Forrester Welke applicatie gebruik je het meest op mobiele devices? Email 76% SMS 67% IM / Chat 48%

More information