Building and Annotating a Corpus of German-Language Newsgroups
Jasmin Schröck, Institut für Deutsche Sprache, Mannheim
Harald Lüngen, Institut für Deutsche Sprache, Mannheim

Abstract

Usenet is a large online resource containing user-generated messages (news articles) organised in discussion groups (newsgroups) which deal with a wide variety of different topics. We describe the download, conversion, and annotation of a comprehensive German news corpus for integration in DeReKo, the German Reference Corpus hosted at the Institut für Deutsche Sprache in Mannheim.

1 Introduction

Usenet news is an instance of the genre of computer-mediated communication (CMC), which is of interest in many current research questions (cf. Beißwenger & Storrer 2008). Recent initiatives for the creation of CMC corpora have co-operated, first in the DFG research network Empirikom (Beißwenger 2012) and since 2013 within the TEI Special Interest Group on CMC (TEI CMC SIG), amongst other things to create a TEI-based standard for the encoding and annotation of CMC corpora for use in empirical linguistics research. Several CMC corpora based on versions of the encoding scheme provided by the TEI CMC SIG have been compiled so far, e.g. (German) chat and Wikipedia discussions (Beißwenger et al. 2012; Margaretha & Lüngen 2014) and (French) corpora of various CMC subgenres in the project CoMeRe (Chanier et al. 2014). Currently the Dortmund Chatkorpus (Beißwenger 2013) is being prepared along the lines of the TEI CMC SIG for integration in CLARIN research infrastructures. Consequently, the aim of the work described in this paper was to close another gap by creating an edited Usenet corpus containing all newsgroups from the de. hierarchy and annotating relevant CMC phenomena according to the principles proposed by the TEI CMC SIG.
The news corpus has been marked up for metadata and text structure according to I5, the TEI customization (Lüngen & Sperberg-McQueen 2012) that is used for the encoding of texts in DeReKo and that incorporates features of the TEI CMC SIG.

2 Usenet

Usenet originated in 1979 and is based on the NNTP internet protocol (Horton & Adams 1987). The features of news messages include richly formatted metadata (the NNTP header) with fields for the sender, the posting date, the subject, the reply history of a message, and other types of information. Header fields are either obligatory or optional. In the message body, many textual features also found in e-mails or letters prevail, such as salutations (openers and closers), postscripts, or signatures. Another characteristic feature is the highly recursive usage of quotations from previous articles, often introduced by an automatically generated line containing the address and name of the author and the posting date of the quoted article. Finally, the language used in news messages contains many familiar netspeak phenomena such as the use of emoticons, interjections, and inflectives (cf. Feldweg et al. 1995; Gausling 2005).

A newsgroup works similarly to a web discussion forum, one difference being that all newsgroups are organised in a universal, topic-based hierarchy. Newsgroups are stored world-wide on so-called news servers, and everyone is free to set up such a server. All news servers are regularly synchronised with each other, so that sooner or later they offer the same set of news messages in each newsgroup. Similarly, everyone is free to connect to a news server using news client software in order to subscribe to newsgroups, to read messages, and to post one's own news messages to the server. Usenet communication had its heyday in the 1990s; consequently, it is a pre-Web 2.0 form of CMC. But Usenet lives on, as a dedicated community has constantly been using it.

Proceedings of the 2nd Workshop on Natural Language Processing for Computer-Mediated Communication / Social Media, pages 17-22, University of Duisburg-Essen, September 28. © 2015 GSCL

3 Related Work

A previous German Usenet corpus initiative was undertaken in the ELWIS project (Corpus-based development of lexical knowledge bases), where a corpus of contemporary German was compiled, beginning in […]. All messages of the year 1993, containing altogether 433,000 articles in 647 newsgroups, served as a base for the investigation of the language use in newsgroups (cf. Feldweg et al. 1995). More recently, the WestburyLab at the University of Alberta collected English-language Usenet data in the project "A reduced redundancy Usenet Corpus" from 2005 until 2011 (Shaoul & Westbury 2013). Their corpus covers 47,860 newsgroups containing more than seven billion words and seems to be the largest news archive ever prepared as a linguistic corpus. Another English-language corpus was created by Matt Mahoney in the project "Usenet as a text corpus" in 2000, containing 53,247 articles from 9,359 newsgroups (Mahoney 2000). None of the previous corpora seems to have been marked up using XML/TEI, nor have they been annotated for CMC phenomena.

4 Creation of the Corpus

Following a common strategy in the construction of TEI corpora from text archives (cf. e.g. Fankhauser et al. 2013; Margaretha & Lüngen 2013), we divided the corpus creation into several steps. In a first step, all German-language Usenet data currently available on the news server news.individual.de was downloaded and converted into a well-formed XML version of the NNTP format (dubbed nntpxml) in a straightforward way. In a second stage, the nntpxml data was filtered and converted into the TEI-based I5 target format.
In the third stage, we applied heuristics for the annotation of CMC phenomena typical of Usenet articles to the newly created corpus, creating I5 with annotations (see also Figure 1).

Figure 1: Workflow

4.1 Download and conversion to nntpxml

In the first stage, all currently available Usenet articles of all 379 German-language newsgroups from the de hierarchy were downloaded from the news server news.individual.de (run by FU Berlin, with a retention time of 621 days) on 1 June 2015 using the Python client library nntplib. The downloaded data was preprocessed by converting it to Unicode and to well-formed nntpxml, using the Python library lxml. For each newsgroup, a separate file was generated. The original Usenet structure was mostly preserved; only the header lines were sorted into obligatory, optional, and other fields (see Horton & Adams 1987).

4.2 Conversion from nntpxml to I5

The generated nntpxml files were then converted to the TEI format I5, which is used for DeReKo. An I5 corpus file is structured according to the three levels corpus (<idscorpus>, the root element), document (<idsdoc>), and text (<idstext>). All news articles of one calendar year were stored in a separate <idsdoc>, while each article was stored in a separate <idstext> element. Note that this corpus structure differs from previous CMC corpora, where one thread or logfile containing a set of postings usually corresponds to one corpus text (Beißwenger et al. 2012; Margaretha & Lüngen 2014; Chanier et al. 2014). News articles (and similarly e-mails) do not come grouped in a self-contained document (as, e.g., a Wikipedia page does), nor is news a synchronous type of communication like chat; hence we do not consider threads or logfiles suitable corpus units for news. Each of the three levels received its own header containing the metadata that were appropriate and could be extracted from the messages. The I5 structure was created using Python with lxml and XSLT stylesheets.
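The two conversion steps described so far can be illustrated with a minimal sketch. It is not the implementation used for the corpus: it relies only on the Python standard library (email and xml.etree.ElementTree) rather than lxml and XSLT, the element names <article>, <header>, <field> and <body> and the year attribute are hypothetical, the function names are made up for illustration, and the headers that I5 requires on each level are omitted.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from email import message_from_bytes
from email.utils import parsedate_to_datetime


def article_to_nntpxml(raw: bytes) -> ET.Element:
    """Wrap one raw news article in a small XML element (hypothetical layout)."""
    msg = message_from_bytes(raw)
    article = ET.Element("article")
    header = ET.SubElement(article, "header")
    for name, value in msg.items():
        # One <field> per header line, keeping the original field name.
        field = ET.SubElement(header, "field", name=name)
        field.text = str(value)
    # Assumption: single-part plain-text articles; multipart bodies come out empty.
    payload = msg.get_payload(decode=True) or b""
    charset = msg.get_content_charset() or "utf-8"
    body = ET.SubElement(article, "body")
    body.text = payload.decode(charset, errors="replace")
    return article


def build_i5_skeleton(raw_articles) -> ET.Element:
    """Group articles by calendar year: idscorpus > idsdoc (year) > idstext (article)."""
    by_year = defaultdict(list)
    for raw in raw_articles:
        article = article_to_nntpxml(raw)
        date_text = article.findtext("header/field[@name='Date']")
        try:
            year = parsedate_to_datetime(date_text).year
        except (TypeError, ValueError):
            continue  # skip articles whose Date field is missing or malformed
        by_year[year].append(article)

    corpus = ET.Element("idscorpus")
    for year in sorted(by_year):
        # The "year" attribute is a hypothetical stand-in for the document header.
        doc = ET.SubElement(corpus, "idsdoc", year=str(year))
        for article in by_year[year]:
            text = ET.SubElement(doc, "idstext")
            text.text = article.findtext("body")
    return corpus
```

Calling ET.tostring(build_i5_skeleton(articles), encoding="unicode") on a list of raw articles yields one <idsdoc> per calendar year, each holding one <idstext> per article, which mirrors the three-level structure described above.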
A major task was to identify TEI elements for the encoding of the metadata of the original header of each article. One issue in this area was how to represent the reply history of a message as contained in the NNTP References header line. From this header field, news readers like Mozilla Thunderbird derive the threaded view of the messages. Since neither the TEI Guidelines nor the TEI CMC SIG provides metadata elements for the reply history, we resorted to a recent proposal by the TEI Correspondence SIG (2015). This SIG develops, amongst other things, TEI elements for the encoding of correspondence-specific metadata applying to all kinds of correspondence such as letters, telegrams, diaries, e-mails and blogs. Consequently, we added the elements <correspdesc> and <correspcontext> to I5. The latter serves the encoding of information about previous and following messages in a correspondence.¹

¹ Apparently, these elements have been added to the official TEI Guidelines in the meantime, cf. TEI Consortium (2015).

Furthermore, we adopted the suggestion by Beißwenger et al. (2012) to create a list of participants in the newsgroup (<listperson>) and a timeline (<timeline>), but stored them in separate files as a step in the anonymisation of the data. The list of persons contains the address for each participant and their name if available. Also, following Beißwenger et al. (2012), the text of an article was wrapped in a <posting> element whose attributes refer to the corresponding IDs in the list of persons and the timeline, respectively.

4.3 Annotation of CMC phenomena and quality assessment of the annotations

For an annotation of CMC phenomena as introduced in Section 2, we developed several heuristics and implemented them in an XSLT 2.0 stylesheet, creating a regular expression for each phenomenon and tagging the matching strings with a suitable TEI element. The following CMC phenomena were annotated: quotations, as well as the lines introducing them, links to the World Wide Web, links to other newsgroups, salutations (openers and closers), postscripts, user signatures, and emoticons. Apart from these CMC categories, paragraphs were annotated. Table 1 shows the phenomena and the respective elements used for their annotation.

Table 1: Annotated phenomena, used elements and their source

Phenomenon | Source | Example
Link to www | TEI | <ref type="www" target="url"> URL </ref>
Link to newsgroup | TEI | <ref type="newsgroup" target="de.rec.fahrrad"> de.rec.fahrrad </ref>
Opener | TEI | "opener"> Hallo, </seg>
Closer | TEI | "closer"> Ciao, NAME </seg>
Postscript | TEI | "postscript"> P.S. dürften sich eigentlich links der durchgezogenen Linie in dieser Fahrradstraße noch Fahrräder aufhalten? </seg>
Signature | TEI | <trailer> -- Life's a road, not a destination. </trailer>
Emoticon | Beißwenger et al. (2012) | <interactionterm> <emoticon> :-( </emoticon> </interactionterm>
Quotation, with or without introductory line | TEI | <cit type="replycit"> <bibl type="introquote"> Am :33, schrieb NAME: </bibl> <quote> <p>die Zukunft ist da, seilzuglose Rennräder sind möglich geworden. </p> </quote> </cit>

We assessed the quality of the CMC annotations by conducting a small evaluation of each annotated CMC feature on 200 articles from five newsgroups which, judging by their topics, seemed reasonably diverse: de.etc.sprache.deutsch (the German language), de.rec.mampf (food/eating), de.comp.os.mswindows.misc (Windows operating system), at.gesellschaft.politik (society and politics, Austria), and de.soc.senioren (senior citizens) (cf. Schröck 2015). For this test set, correct reference annotations were created manually by an expert familiar with the TEI elements used for the annotation of the CMC categories and their TEI (or SIG) definitions. The reference set eventually contained 3,291 annotation instances (TEI elements marking up CMC phenomena), the test set 3,438. Comparing the test set with the reference, the micro-averaged precision over all eleven annotation categories was found to be 79%, and the micro-averaged recall was 82%. The categories that were identified best by the regular expressions were signature and emoticon. The categories most difficult to identify were postscript and opener. However, these two categories, and also links to newsgroups, did not occur very frequently in the test and reference sets, which were relatively small. Furthermore, the results for openers, closers and introduction lines of quotes, which often contain names, could be improved by using Named Entity Recognition. The results for all elements and the overall results are shown in Table 2.

Table 2: Number of annotation instances for each element in reference set (# ref) and test set (# test) and micro-averaged precision (P), recall (R) and F-measure (F)

Columns: I5 Tag | # ref | # test | P | R | F
Rows: <cit>, <bibl>, <quote>, <p>, <ref type="www">, signature >, <emoticon>, <ref type="newsgroup">, postscript >, closer >, opener >
Totals: Σ # ref = 3291 | Σ # test = 3438 | Ø P = .79 | Ø R = .82 | Ø F = .80

5 The Corpus

Download was carried out on 1 June 2015 and took 12 hours, using four threads on a Linux machine with an AMD Opteron 8439 SE processor with 48 cores at 2.8 GHz and 256 GB of RAM.
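Screening of articles at download time can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the corpus: each article is assumed to be available as raw bytes, the function name screen_article is hypothetical, and the order in which character sets are tried is an assumption. The sketch drops articles whose X-No-Archive header is set to yes and articles whose body cannot be decoded to Unicode.

```python
from email import message_from_bytes
from typing import Optional


def screen_article(raw: bytes) -> Optional[str]:
    """Return the article body as text, or None if the article should be dropped."""
    msg = message_from_bytes(raw)

    # Respect the poster's wish not to be archived.
    if str(msg.get("X-No-Archive", "")).strip().lower() == "yes":
        return None

    payload = msg.get_payload(decode=True)
    if payload is None:
        # Multipart article; not handled in this sketch.
        return None

    # Try the declared charset first, then UTF-8 (the fallback order is an assumption).
    for charset in (msg.get_content_charset(), "utf-8"):
        if not charset:
            continue
        try:
            return payload.decode(charset)
        except (LookupError, UnicodeDecodeError):
            continue

    # Encoding could not be determined, so the article cannot be converted.
    return None
```

Counting the None results separately for the two conditions would give the kind of discard figures reported for the corpus.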
The news server potentially contained 1,004,157 articles in 379 newsgroups in the de hierarchy; however, 62,878 articles were discarded because their X-No-Archive field was set to yes, and another 70,376 because their encoding could not be determined and hence they could not be converted to UTF-8. The conversion-to-I5 phase took 2:20 hours, and the annotation phase took 54 hours. Four newsgroups contained no messages, and for one newsgroup the annotation did not terminate. From the remaining 374 groups, 11 messages were discarded because they contained an error in the Date field. The resulting annotated full corpus in I5 format contains 374 newsgroups comprising 870,892 news articles with […] million word tokens. It takes up 7.2 GB of disk space. The corpus contains messages posted between 24/9/2013 and 1/6/2015. The biggest newsgroup (929 MB) is de.soc.politik.misc, containing 117,950 messages (16.8 million word tokens). However, the size of the corpus will be further reduced in the deduplication and cleaning step.

6 Conclusion

The news corpus described in this paper is currently being further anonymised, cleaned, and de-duplicated, mostly according to the principles described in Shaoul & Westbury (2013). The resulting version is scheduled to be included in the upcoming DeReKo release DeReKo-2015-II. However, the question of whether the corpus can be shared with the linguistic community remains to be solved. CMC texts, like all other texts, are subject to copyright, and in principle each author of an article contained in the corpus would have to give his or her consent first. On the other hand, we think that a news article is not the same as a text on the WWW, as someone who posts to a newsgroup is aware of the fact (and in fact wants) that his or her message will be distributed to many servers all over the world in the first place. We are currently seeking legal advice in this matter in cooperation with the CLARIN-D curation project Chatkorpus2CLARIN. Until further notice, the news corpus will be accessible from the premises of the IDS Mannheim only. We intend to update the corpus with new news articles regularly. We are also aiming at downloading from a news server with a longer retention time, though as far as we can see, longer retention times are only offered by commercial news servers.

References

Beißwenger, Michael (2012): Forschungsnotiz: Das Wissenschaftliche Netzwerk "Empirische Erforschung internetbasierter Kommunikation" (Empirikom). In: Zeitschrift für germanistische Linguistik 40 (3).
Beißwenger, Michael (2013): Das Dortmunder Chat-Korpus. In: Zeitschrift für germanistische Linguistik 41 (1).
Beißwenger, Michael; Storrer, Angelika (2008): Corpora of Computer-Mediated Communication. In: Lüdeling, Anke; Kytö, Merja (eds.): Corpus Linguistics. An International Handbook. Vol. 1, Berlin: de Gruyter.
Beißwenger, Michael; Ermakova, Maria; Geyken, Alexander; Lemnitzer, Lothar; Storrer, Angelika (2012): A TEI Schema for the Representation of Computer-mediated Communication. In: Journal of the Text Encoding Initiative [Online] 3.
Beißwenger, Michael; Lemnitzer, Lothar (2013): Aufbau eines Referenzkorpus zur deutschsprachigen internetbasierten Kommunikation als Zusatzkomponente für die Korpora im Projekt Digitales Wörterbuch der deutschen Sprache (DWDS). In: Journal for Language Technology and Computational Linguistics (JLCL) 28 (2), Special issue on Webkorpora in Computerlinguistik und Sprachforschung.
Chanier, Thierry; Poudat, Céline; Sagot, Benoît; Antoniadis, Georges; Wigham, Ciara R.; Hriba, Linda; Longhi, Julien; Seddah, Djamé (2014): The CoMeRe corpus for French: structuring and annotating heterogeneous CMC genres. In: Journal of Language Technology and Computational Linguistics (JLCL) 29 (2).
Feldweg, Helmut; Kibinger, Ralf; Thielen, Christine (1995): Zum Sprachgebrauch in deutschen Newsgruppen. In: Schmitz, U. (ed.): Neue Medien. Osnabrücker Beiträge zur Sprachtheorie 50, Oldenburg: Red. OBST.
Gausling, Timo (2005): Der Newsgroup-Beitrag - eine kommunikative Gattung? In: Studentische Arbeitspapiere zu Sprache und Interaktion (SASI) 4. Series Arbeitspapiere des Centrum Sprache und Interaktion der Westfälischen Wilhelms-Universität.
Horbach, Andrea; Thater, Stefan; Steffen, Diana; Fischer, Peter M.; Witt, Andreas; Pinkal, Manfred (2015): Internet Corpora: A Challenge for Linguistic Processing. In: Datenbank-Spektrum 15 (1).
Horton, M.; Adams, R. (1987): RFC-1036 Standard for Interchange of USENET Messages.
Lüngen, Harald; Sperberg-McQueen, C. M. (2012): A TEI P5 Document Grammar for the IDS Text Model. In: Journal of the Text Encoding Initiative [Online] 3.
Mahoney, Matt (2000): Usenet as a text corpus. Florida Tech, CS Dept., available online at: mmahoney/dissertation/corpus.html
Margaretha, Eliza; Lüngen, Harald (2014): Building Linguistic Corpora from Wikipedia Articles and Discussions. In: Journal of Language Technology and Computational Linguistics (JLCL) 29 (2).
Schröck, Jasmin (2015): Erstellung eines deutschsprachigen Usenet-Newsgroup-Korpus und Annotation von Phänomenen internetbasierter Kommunikation. BA thesis, University of Heidelberg.
Shaoul, Cyrus; Westbury, Chris (2013): A reduced redundancy USENET corpus. Edmonton, AB: University of Alberta, available online at: westburylab/downloads/usenetcorpus.download.html
TEI Consortium (2015): TEI P5: Guidelines for Electronic Text Encoding and Interchange.
TEI Correspondence SIG (2015). Information and examples available online at: […] and […]SIG/
VPOP3 Your email post office Getting Started Guide VPOP3 Getting Started Guide, version 2.1 1 Copyright Statement This manual is proprietary information of Paul Smith Computer Services and is not to be
ONTOLOGY-BASED MULTIMEDIA AUTHORING AND INTERFACING TOOLS 3 rd Hellenic Conference on Artificial Intelligence, Samos, Greece, 5-8 May 2004
ONTOLOGY-BASED MULTIMEDIA AUTHORING AND INTERFACING TOOLS 3 rd Hellenic Conference on Artificial Intelligence, Samos, Greece, 5-8 May 2004 By Aristomenis Macris (e-mail: [email protected]), University of
A Concept for an Electronic Magazine
TERENA-NORDUnet Networking Conference (TNNC) 1999 1 A Concept for an Electronic Magazine Alexander von Berg Helmut Pralle University of Hanover, Institute for Computer Networks and Distributed Systems
Stefan Engelberg (IDS Mannheim), Workshop Corpora in Lexical Research, Bucharest, Nov. 2008 [Folie 1]
Content 1. Empirical linguistics 2. Text corpora and corpus linguistics 3. Concordances 4. Application I: The German progressive 5. Part-of-speech tagging 6. Fequency analysis 7. Application II: Compounds
Internet Protocol Address
SFWR 4C03: Computer Networks & Computer Security Jan 17-21, 2005 Lecturer: Kartik Krishnan Lecture 7-9 Internet Protocol Address Addressing is a critical component of the internet abstraction. To give
Designing and Implementing a Server Infrastructure MOC 20413
Designing and Implementing a Server Infrastructure MOC 20413 In dieser 5-tägigen Schulung erhalten Sie die Kenntnisse, welche benötigt werden, um eine physische und logische Windows Server 2012 Active
TZWorks Windows Event Log Viewer (evtx_view) Users Guide
TZWorks Windows Event Log Viewer (evtx_view) Users Guide Abstract evtx_view is a standalone, GUI tool used to extract and parse Event Logs and display their internals. The tool allows one to export all
Visendo Fax Server Integration With SharePoint Server
Energy reduction Visendo Fax Server Integration With SharePoint Server 5 reasons why to integrate Fax Servers in Content Management Servers Whitepaper Achieve more with less Increase Productivity Shortening
PCVITA Express Migrator for SharePoint (File System) 2011. Table of Contents
Table of Contents Chapter-1 ---------------------------------------------------------------------------- Page No (2) What is PCVITA Express Migrator for SharePoint (File System)? Migration Supported The
XML: extensible Markup Language. Anabel Fraga
XML: extensible Markup Language Anabel Fraga Table of Contents Historic Introduction XML vs. HTML XML Characteristics HTML Document XML Document XML General Rules Well Formed and Valid Documents Elements
WESTERNACHER OUTLOOK E-MAIL-MANAGER OPERATING MANUAL
TABLE OF CONTENTS 1 Summary 3 2 Software requirements 3 3 Installing the Outlook E-Mail Manager Client 3 3.1 Requirements 3 3.1.1 Installation for trial customers for cloud-based testing 3 3.1.2 Installing
Standards, Tools and Web 2.0
Standards, Tools and Web 2.0 Web Programming Uta Priss ZELL, Ostfalia University 2013 Web Programming Standards and Tools Slide 1/31 Outline Guidelines and Tests Logfile analysis W3C Standards Tools Web
Motivation. Korpus-Abfrage: Werkzeuge und Sprachen. Overview. Languages of Corpus Query. SARA Query Possibilities 1
Korpus-Abfrage: Werkzeuge und Sprachen Gastreferat zur Vorlesung Korpuslinguistik mit und für Computerlinguistik Charlotte Merz 3. Dezember 2002 Motivation Lizentiatsarbeit: A Corpus Query Tool for Automatically
Email Spam Detection Using Customized SimHash Function
International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email
Turnitin Blackboard 9.0 Integration Instructor User Manual
Turnitin Blackboard 9.0 Integration Instructor User Manual Version: 2.1.3 Updated December 16, 2011 Copyright 1998 2011 iparadigms, LLC. All rights reserved. Turnitin Blackboard Learn Integration Manual:
CLOUD COMPUTING CONCEPTS FOR ACADEMIC COLLABORATION
Bulgarian Journal of Science and Education Policy (BJSEP), Volume 7, Number 1, 2013 CLOUD COMPUTING CONCEPTS FOR ACADEMIC COLLABORATION Khayrazad Kari JABBOUR Lebanese University, LEBANON Abstract. The
Digital Commons Journal Guide: How to Manage, Peer Review, and Publish Submissions to Your Journal
bepress Digital Commons Digital Commons Reference Material and User Guides 6-2016 Digital Commons Journal Guide: How to Manage, Peer Review, and Publish Submissions to Your Journal bepress Follow this
11 ways to migrate Lotus Notes applications to SharePoint and Office 365
11 ways to migrate Lotus Notes applications to SharePoint and Office 365 Written By Steve Walch, Senior Product Manager, Dell, Inc. Abstract Migrating your Lotus Notes applications to Microsoft SharePoint
ACR Triad Site Server Click Once Software System
ACR Triad Site Server Click Once Software System Version 2.5 20 October 2008 User s Guide American College of Radiology 2007 All rights reserved. CONTENTS INTRODUCTION...3 ABOUT TRIAD...3 DEFINITIONS...4
Initial Setup of Mozilla Thunderbird with IMAP for Windows 7
Initial Setup of Mozilla Thunderbird Concept This document describes the procedures for setting up the Mozilla Thunderbird email client to download messages from Google Mail using Internet Message Access
