Cultural Trends and Language Change
- Jacob Walters
Transcription
1 Cultural Trends and Language Change. Gosse Bouma, Information Science, University of Groningen. NHL, 2015/03.
2 Popularity of Wolf in English books
3 Google Books Ngrams: Digital Library. Google Books is a project in which books are scanned, converted to text using OCR (Optical Character Recognition), and made searchable with Google search. It currently holds approx. 20 million books, mostly English and mostly published since 1800. Google Books Ngrams is a valuable resource for cultural and linguistic studies.
4 Google Books & ngrams viewer. "The Google Labs N-gram Viewer is the first tool of its kind, capable of precisely and rapidly quantifying cultural trends based on massive quantities of data. It is a gateway to culturomics! The browser is designed to enable you to examine the frequency of words (banana) or phrases ("United States of America") in books over time. You'll be searching through over 5.2 million books: 4% of all books ever published!" (Source: A-users-guide-to-culturomics)
5 Popularity of various *isms
6 Google Books Ngrams Viewer. Jean-Baptiste Michel et al., "Quantitative Analysis of Culture Using Millions of Digitized Books", Science. books.html
7 Google Books Ngrams Viewer. Examples: freedom, liberty; vampire, werewolf, zombie; computer, phone, radio, gun; radio, television, internet; best, beft.
10 Google Books 2.0: The USA is/are. Do people think of the United States as a singular or plural entity? And is this constant over time? We can answer this question by counting how often "the United States" is followed by a frequent plural verb and dividing the result by the overall frequency of "the United States". We can do the same for "the United States" followed by a frequent singular verb. The result is shown here (figure). Notice the use of arithmetic to sum and divide frequencies.
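The sum-and-divide arithmetic described on this slide can be sketched in Python. This is only an illustration: the counts below are hypothetical placeholders, not values from Google Books, and the names `COUNTS` and `share` are made up for this example.

```python
# Hypothetical yearly match counts, for illustration only; real values would
# come from the Google Books Ngrams data.
COUNTS = {
    "the United States":      {1800: 1000, 1900: 5000},
    "The United States are":  {1800: 30, 1900: 20},
    "The United States have": {1800: 20, 1900: 5},
    "The United States is":   {1800: 10, 1900: 300},
    "The United States has":  {1800: 10, 1900: 200},
}

PLURAL = ["The United States are", "The United States have"]
SINGULAR = ["The United States is", "The United States has"]


def share(numerators, denominator, year, counts=COUNTS):
    """Sum the counts of the numerator phrases and divide by the denominator count."""
    return sum(counts[phrase][year] for phrase in numerators) / counts[denominator][year]


for year in (1800, 1900):
    plural = share(PLURAL, "the United States", year)
    singular = share(SINGULAR, "the United States", year)
    print(year, "plural:", plural, "singular:", singular)
```

Plotting the two shares per year would give a picture of how singular and plural agreement compete over time.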
11 Determiners and Country Names. "The Ukraine" is incorrect both grammatically and politically, says Oksana Kyzyma of the Embassy of Ukraine in London. "Ukraine is both the conventional short and long name of the country," she says. "This name is stated in the Ukrainian Declaration of Independence and Constitution."
12 Google Books 2.0. Spelling, phrasal expressions: recognise, recognize (try); gone missing (try); graduated college vs. graduated from college. Search using part of speech: experience_noun vs. experience_verb (try); to always _VERB_, never, quickly, boldly. Arithmetic: (The United States have + The United States are)/the United States; (The United States has + The United States is)/the United States (try).
13 Literary Applications: First Person Fiction. Ted Underwood suggests there is a sharp drop in the percentage of first-person narratives around 1800. Can we investigate this using corpus linguistics? (Source: we-dont-already-know-the-broad-outlines-of-literary-history)
14 Literary Applications. Given novels that are clearly written with a 1st or 3rd person narrator: which words occur significantly more often in 1st person novels than in 3rd person novels, or vice versa?
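The slide does not say which significance test is used. One common choice in corpus linguistics is the two-term log-likelihood keyness score (Rayson and Garside's formulation of Dunning's G2), sketched below as an assumption. The function names and the 3.84 cut-off (chi-square, 1 degree of freedom, p < 0.05) are choices made for this example, not taken from the talk.

```python
import math
from collections import Counter


def log_likelihood(a, b, total_a, total_b):
    """Two-term log-likelihood for a word seen a times in corpus A and b times in corpus B."""
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    score = 0.0
    for observed, expected in ((a, expected_a), (b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


def key_words(tokens_a, tokens_b, threshold=3.84):
    """Return (word, score, corpus) for words whose score exceeds the threshold."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    total_a, total_b = len(tokens_a), len(tokens_b)
    result = []
    for word in set(freq_a) | set(freq_b):
        score = log_likelihood(freq_a[word], freq_b[word], total_a, total_b)
        if score > threshold:
            # Report the corpus in which the word is relatively more frequent.
            over_a = freq_a[word] / total_a > freq_b[word] / total_b
            result.append((word, score, "A" if over_a else "B"))
    return sorted(result, key=lambda item: -item[1])
```

With corpus A holding the tokens of the 1st person novels and corpus B those of the 3rd person novels, `key_words` would list candidate words that distinguish the two narrative modes.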
15 Literary Applications. Given a large collection of fiction books: does the ratio between 1st and 3rd person pronouns change over time?
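A minimal sketch of how such a ratio could be computed per publication year, assuming the collection is available as (year, text) pairs. The pronoun lists, the simple regular-expression tokenizer, and the function name are assumptions made for this example.

```python
import re
from collections import defaultdict

FIRST_PERSON = {"i", "me", "my", "mine", "myself",
                "we", "us", "our", "ours", "ourselves"}
THIRD_PERSON = {"he", "him", "his", "himself",
                "she", "her", "hers", "herself",
                "they", "them", "their", "theirs", "themselves"}


def pronoun_ratio_by_year(books):
    """Map each year to (1st person pronoun count) / (3rd person pronoun count)."""
    first = defaultdict(int)
    third = defaultdict(int)
    for year, text in books:
        for token in re.findall(r"[a-z']+", text.lower()):
            if token in FIRST_PERSON:
                first[year] += 1
            elif token in THIRD_PERSON:
                third[year] += 1
    years = sorted(set(first) | set(third))
    # Years without any 3rd person pronoun are skipped to avoid dividing by zero.
    return {year: first[year] / third[year] for year in years if third[year] > 0}


books = [
    (1750, "I saw him and my heart sank"),
    (1850, "He told her that they left"),
]
print(pronoun_ratio_by_year(books))
```

A falling ratio over the years would be in line with a decline of first-person narration, although pronouns in dialogue also contribute to the counts.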
16 Using Syntax: Dependency Relations. What are frequent direct objects of drink? Query: drink => *_NOUN. What things are magnificent? Query: *_NOUN_ => magnificent.
17 Google Ngrams. The Google BOOKS ngram viewer uses books; Google NGRAMS is Web data.
19 Dutch Twitter Corpus. Since 2011, the Information Science Department of the University of Groningen has been collecting Dutch-language tweets. The goal is to collect a representative sample of all tweets posted in Dutch. We estimate that our method captures approximately 40-60% of the relevant tweets. Example tweets: RieksOsinga #CTAboutaleb op Mooie en inspirerende woorden. AHPOIESZ Vandaag ons winnend concept gepresenteerd aan College van is er een probleem met de mail? Krijg namelijk een 500 interland server Error. Gelukkig! ICT geeft ook aan dat er geen storing, dat scheelt;-) Fijne zondag nog! NHL_Hogeschool Klaar voor collegereeks Met twee bedrijven
23 Spelling Variation. How often is "eens" written as "is"? Example: "Dit kan uiteraard wel is voorkomen in de statistiek." ("This can of course occasionally occur in the statistics.") Source: Dutch Twitter Ngram counts. Query: wel [is,eens] %en. Table columns: trigram, count, perc. Rows: wel eens voorkomen; wel is voorkomen; wel eens gebeuren (2,); wel is gebeuren; wel eens zien (34,); wel is zien (10,).
25 Spelling Variation: zeςma. "Zeςma is a discourse marker of Arabic etymology that is used in North Africa as well as in the French and Dutch varieties spoken by the North African diaspora in Europe... On the internet, users either omit ς or use another character, e.g. the digit 3." (Bouwmans, 2003) Source: Dutch Twitter Ngram counts. Query: ze%ma
hits      word
76,218    zehma
45,561    ze3ma
29,944    zegma
15,058    Zehma
8,448     Ze3ma
2,845     Zegma
2,149     ze3hma
1,707     zemma
1,568     ZEHMA
1,553     zema
(and at least 20 other spelling variants)
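The wildcard query above can be imitated with Python's built-in `sqlite3` module. The sketch assumes that `%` in the web interface behaves like the SQL `LIKE` wildcard (any sequence of characters, including none); the slides do not confirm this. The table name `unigrams` and its columns are hypothetical. The counts are the ten variants listed on the slide.

```python
import sqlite3

# (hits, word) pairs copied from the slide; the other variants are not listed there.
VARIANTS = [
    (76218, "zehma"), (45561, "ze3ma"), (29944, "zegma"), (15058, "Zehma"),
    (8448, "Ze3ma"), (2845, "Zegma"), (2149, "ze3hma"), (1707, "zemma"),
    (1568, "ZEHMA"), (1553, "zema"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unigrams (word TEXT, hits INTEGER)")
conn.executemany("INSERT INTO unigrams (hits, word) VALUES (?, ?)", VARIANTS)

# SQLite's LIKE is case-insensitive for ASCII letters by default,
# so 'ze%ma' also matches Zehma and ZEHMA.
rows = conn.execute(
    "SELECT word, hits FROM unigrams WHERE word LIKE ? ORDER BY hits DESC",
    ("ze%ma",),
).fetchall()
total = sum(hits for _, hits in rows)

# Fold case to see how the spellings themselves are distributed.
folded = dict(conn.execute(
    "SELECT lower(word), SUM(hits) FROM unigrams "
    "WHERE word LIKE ? GROUP BY lower(word)",
    ("ze%ma",),
).fetchall())

for word, hits in sorted(folded.items(), key=lambda item: -item[1]):
    print(f"{word:8s} {hits:7d} {100 * hits / total:5.1f}%")
```

Folding case first separates capitalisation differences from genuine spelling variation (h, 3, g, or nothing in place of the original consonant).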
26 Language Change: Popularity of "der" (her/there). Examples: "ik ga der geld geven voor der verjaardag" ("I'm going to give her money for her birthday"); "Mag je der vaseline opdoen?" ("Can you put vaseline on it?"). Source: Dutch Twitter Ngram counts 2014.
27 Ngram statistics. Why use ngram counts? Given enough data, ngram frequencies are often sufficient to study variation and trends. Dutch Twitter Corpus, sizes in millions: tweets 2,500; tokens 28,000; followed by rows for unigrams, bigrams, trigrams, and higher-order n-grams.
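As a rough illustration of what "ngram counts" means here, the sketch below counts unigrams, bigrams, and trigrams in a handful of tweets. Lowercasing and whitespace tokenization are simplifying assumptions; the slides do not describe the tokenizer used for the actual corpus.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield all contiguous n-token tuples from a token list."""
    return zip(*(tokens[i:] for i in range(n)))


def count_ngrams(tweets, max_n=3):
    """Return {n: Counter of n-gram tuples} for n = 1..max_n."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for tweet in tweets:
        tokens = tweet.lower().split()
        for n in counts:
            counts[n].update(ngrams(tokens, n))
    return counts


tweets = [
    "Dat kan wel eens voorkomen",
    "dat kan wel is voorkomen",
]
counts = count_ngrams(tweets)
print(counts[3].most_common(3))
```

Once such counts are stored, questions like the "wel eens" versus "wel is" comparison reduce to looking up and dividing a few numbers, without access to the raw tweets.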
28 Comparable Tools and Resources. Twitter viewers (Univ Groningen, twiqs.nl): links to actual tweets, trends, metadata; slow for large periods and/or frequent ngrams. Google Web 1T 5-Gram Database for European languages: ngram counts for 133 billion words of Dutch webtext. Regex search, collocations: Corpus. Frequency counts: Keuleers et al. (2010), word frequencies based on Dutch subtitles... Rovereto Twitter n-gram corpus with demographic metadata, Herdagdelen (2013): "a Twitter-based dataset using n-grams, thereby overcoming the limitations on the redistribution of raw tweets"; n-gram counts for 75 million English tweets, with gender-of-author and time-of-posting.
29 Twitter Ngrams Web Interface. Raw ngram counts. Limited regex support, export results as csv, collocations, associations. Run your own experiment: download the ngrams data. See Evert (2010), "Google Web 1T 5-Grams Made Easy (but not for the computer)".
30 Twitter Ngrams Web Interface: Trends. Relative frequencies per month, for ngrams occurring at least once in each month. Built using SQLite + Google Tables.
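The trend computation described on this slide can be sketched as follows. The data layout (a dict of per-month counts per ngram, plus a dict of monthly totals) and all names are assumptions for illustration; the actual interface stores its data in SQLite.

```python
def monthly_trends(ngram_counts, month_totals):
    """Relative frequency per month, only for ngrams seen in every month."""
    months = sorted(month_totals)
    trends = {}
    for ngram, per_month in ngram_counts.items():
        # Keep the ngram only if it occurs at least once in each month.
        if all(per_month.get(month, 0) > 0 for month in months):
            trends[ngram] = {
                month: per_month[month] / month_totals[month] for month in months
            }
    return trends


# Hypothetical counts, for illustration only.
month_totals = {"2014-01": 1000, "2014-02": 2000}
ngram_counts = {
    "wel eens": {"2014-01": 10, "2014-02": 30},
    "zeldzaam": {"2014-01": 1},  # missing in 2014-02, so it is dropped
}
print(monthly_trends(ngram_counts, month_totals))
```

Dividing by the monthly total matters because the number of collected tweets differs from month to month; raw counts alone would mostly reflect corpus size.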
31 Twitter vs Google Web Ngrams. Pattern: een meisje/liedje/... die/dat (these nouns are neuter, so the standard relative pronoun is "dat"). Table columns: noun, Twitter %die, Web %die, ratio. Nouns: meisje, liedje, kind, type, bedrijf, boek, nummer, geld, filmpje, ding. Twitter: , Web Ngrams: 2008
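The table's columns could be derived from raw counts roughly as below. The counts are hypothetical placeholders, and reading "ratio" as Twitter %die divided by Web %die is an assumption, since the slide does not define it.

```python
def pct_die(die_count, dat_count):
    """Percentage of relative-pronoun uses that are 'die' rather than 'dat'."""
    return 100.0 * die_count / (die_count + dat_count)


def compare(twitter, web):
    """Return (Twitter %die, Web %die, ratio) from (die, dat) count pairs."""
    twitter_pct = pct_die(*twitter)
    web_pct = pct_die(*web)
    return twitter_pct, web_pct, twitter_pct / web_pct


# Hypothetical (die, dat) counts for "een meisje die/dat", for illustration only.
twitter_counts = (300, 700)
web_counts = (50, 950)
print(compare(twitter_counts, web_counts))
```

A ratio well above 1 would mean that "die" after a neuter noun is relatively more common on Twitter than in the 2008 web text.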
32 Enjoy!
More informationTesting Data-Driven Learning Algorithms for PoS Tagging of Icelandic
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic by Sigrún Helgadóttir Abstract This paper gives the results of an experiment concerned with training three different taggers on tagged
More informationCOURSE OBJECTIVES SPAN 100/101 ELEMENTARY SPANISH LISTENING. SPEAKING/FUNCTIONAl KNOWLEDGE
SPAN 100/101 ELEMENTARY SPANISH COURSE OBJECTIVES This Spanish course pays equal attention to developing all four language skills (listening, speaking, reading, and writing), with a special emphasis on
More informationA comparative analysis of the language used on labels of Champagne and Sparkling Water bottles.
This research essay focuses on the language used on labels of five Champagne and five The two products are related, both being sparkling beverages, but also have obvious differences, primarily that one
More informationOpportunity Report on Korean gaming Kansendossier Korea
Opportunity Report on Korean gaming Kansendossier Korea Game is not a new field in Korea. It is a major industry in Korea, accounting for 55% of the cultural contents export in 2011. E-sports, where gaming
More informationNEDERBOOMS Treebank Mining for Data- based Linguistics. Liesbeth Augustinus Vincent Vandeghinste Ineke Schuurman Frank Van Eynde
NEDERBOOMS Treebank Mining for Data- based Linguistics Liesbeth Augustinus Vincent Vandeghinste Ineke Schuurman Frank Van Eynde LOT Summer School - June, 2014 NEDERBOOMS Exploita)on of Dutch treebanks
More informationTexas Success Initiative (TSI) Assessment
Texas Success Initiative (TSI) Assessment Interpreting Your Score 1 Congratulations on taking the TSI Assessment! The TSI Assessment measures your strengths and weaknesses in mathematics and statistics,
More informationVIDEO CREATIVE IN A DIGITAL WORLD Digital analytics afternoon. Hugo.schurink@millwardbrown.com emmy.brand@millwardbrown.com
VIDEO CREATIVE IN A DIGITAL WORLD Digital analytics afternoon Hugo.schurink@millwardbrown.com emmy.brand@millwardbrown.com AdReaction Video: 42 countries 13,000+ Multiscreen Users 2 3 Screentime is enormous
More informationInnoveren met Data. Created with open data : https://joinup.ec.europa.eu/community/ods/document/online-training-material. Dr.ir.
Innoveren met Data Created with open data : https://joinup.ec.europa.eu/community/ods/document/online-training-material Dr.ir. Erwin Folmer BIG DATA (GARTNER, JULY 2013) Erwin Folmer Pressure Cooker
More informationOutline. Social Media Data. What is social media. What is social media
Outline Social Media Data a new trend of corpus-based research? Why it is important The use of social media Collecting data from social media Big data What is new media? (http://en.wikipedia.org/wiki/
More informationENTER A WORLD OF FASHION, LUXURY AND NEWS
ENTER A WORLD OF FASHION, LUXURY AND NEWS FASHION TV HD LUXE.TV HD 24/7 high definition fashion channel Fashion TV H3D viewers can see most current and sexy fashion shows, top models, reports from the
More informationVideo Transcription in MediaMosa
Video Transcription in MediaMosa Proof of Concept Version 1.1 December 28, 2011 SURFnet/Kennisnet Innovatieprogramma Het SURFnet/ Kennisnet Innovatieprogramma wordt financieel mogelijk gemaakt door het
More informationAccording to the Argentine writer Jorge Luis Borges, in the Celestial Emporium of Benevolent Knowledge, animals are divided
Categories Categories According to the Argentine writer Jorge Luis Borges, in the Celestial Emporium of Benevolent Knowledge, animals are divided into 1 2 Categories those that belong to the Emperor embalmed
More informationUw partner in system management oplossingen
Uw partner in system management oplossingen User Centric IT Bring your Own - Corporate Owned Onderzoek Forrester Welke applicatie gebruik je het meest op mobiele devices? Email 76% SMS 67% IM / Chat 48%
More information