LANGUAGE CODING IN INFORMATION TECHNOLOGIES

1 LANGUAGE CODING IN INFORMATION TECHNOLOGIES. TKE 2014: Language Codes at the Crossroads. Peter Constable, Microsoft Corporation / Unicode Consortium

2 Does industry need or care about ISO 639-3?

3 The main question that I have is whether language identification should be a task for ISO, the International Organization for Standardization. ISO is basically an organization [f]or industry, not for science. The reason why ISO got involved in language name issues in the first place is of course the economic significance of translation and localization, which is far greater than the relevance of distant stars for businesses. But does this mean that someone needs ISO's industry standard to identify little-known languages of small communities that are never or hardly used in writing and that are often in danger of extinction? (Martin Haspelmath, Diversity Linguistics Comment, December 2013)

4 Business has an interest in the stable identification of economically significant languages, for example for translation and computer localization, and this is why the ISO 639-1 and 639-2 standards were established in the first place. However, those standards are adequate for the needs of industry; business has no significant interest in the many small, unwritten and often endangered languages with no measurable economic impact. (Kwamikagami, Wikipedia contributor, April 2014)

5 Information technologies rely heavily on ISO 639, including ISO 639-3. IETF BCP 47 is a key industry technology using ISO 639. Many of linguists' needs for language identification can be accommodated by the same BCP 47 mechanisms used in the IT industry.

6 AGENDA: use of language identifiers in information technologies; history: development of ISO 639-3 and industry adoption via BCP 47; which languages are significant?; overview of IETF BCP 47 (language tags); utility of industry mechanisms for linguists.

7 USE OF LANGUAGE IDENTIFIERS IN INFORMATION TECHNOLOGIES

8 USE OF LANGUAGE IDENTIFIERS: tagging content (text, audio, video) to declare the language of the content; tagging of software resources for language-specific processing; matching user language preferences with content; matching content with language-specific processes.

9 USE OF LANGUAGE IDENTIFIERS Examples: display of content in my preferred language (web pages, videos, captions, etc.); display of application user interfaces in my preferred language; activating input methods for different languages; spell checking; text-to-speech; (many others).
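
As an illustration of "matching user language preferences with content", here is a minimal sketch in the spirit of the lookup scheme of RFC 4647 (the matching companion to the tag syntax in BCP 47): each preferred tag is progressively truncated until it equals one of the available tags. The function name `lookup`, the `available` list, and the `pt-BR-x-demo` preference are assumptions made for this example, not part of the slides, and the sketch ignores wildcard ranges.

```python
def lookup(preferences, available, default=None):
    """Return the best available tag for an ordered list of preferred tags."""
    # Language tags compare case-insensitively, so index by lowercase form.
    by_lower = {tag.lower(): tag for tag in available}
    for language_range in preferences:
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            # Drop the last subtag; if that leaves a one-character singleton
            # at the end (such as the "x" of a private-use part), drop it too.
            subtags.pop()
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


# Hypothetical content inventory and user preference list.
available = ["en", "pt-BR", "az-Cyrl", "haw"]
best = lookup(["pt-BR-x-demo", "en"], available)
```

With this inventory, the first preference falls back from `pt-BR-x-demo` to `pt-BR` before `en` is ever considered.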

10 DEVELOPMENT OF ISO AND INDUSTRY ADOPTION VIA BCP 47

11 LANDSCAPE CIRCA 2000: language documentation / applied linguistics. Research, literature development in 1000s of languages; large language corpora; Simons and Bird (2000), forming of OLAC.

12 LANDSCAPE CIRCA 2000: industry. Limited locale identifier mechanisms. Windows: numbers, 512 maximum. Mac: numbers, 150 defined. Internet: RFC 1766, based on ISO 639 and ISO 3166, e.g., en-US. XML: using RFC 1766. ISO 639-1: 180 languages. ISO 639-2: 350 languages.

13 LANDSCAPE CIRCA 2000: changing industry landscape. Unicode Consortium mission: "This Corporation's specific purpose shall be to enable people around the world to use computers in any language". Unicode 3.0, finally becoming mainstream in software: Office 97, Windows 2000, Mac OS X, XeTeX, XML, .NET, Java, C++, ECMAScript, Pango. Rapidly growing interest in expanding language support. Major vendors: "We don't want to be a bottleneck for language communities!"

14 LANDSCAPE CIRCA 2000 We need a comprehensive language coding standard! Industry support for development of ISO 639-3

15 DEVELOPMENT SINCE 2000. ISO 639-3: start of work; 2007: published. BCP 47: 2001: IETF RFC 3066, incorporation of ISO 639-2; 2006: IETF RFC 4646, enhancements to compatibility, stability, structure; 2009: IETF RFC 5646, incorporation of ISO 639-3. Widespread adoption of BCP 47, ISO 639-3 across technologies, Web, and OS platforms.

16 CURRENT INDUSTRY USE OF ISO 639-3. Unicode CLDR 25: used in Android, Mac OS, iOS, Windows, Debian Linux, Apache, ...; data for 369 languages in ISO 639-3 not in ISO 639-1 / ISO 639-2; exemplar data for 600+ more; general support for all of ISO 639-3. Windows 8: explicit use of 115 IDs in ISO 639-3 not in ISO 639-1 / ISO 639-2; general support for all of ISO 639-3.

17 WHICH LANGUAGES ARE SIGNIFICANT?

18 WHICH LANGUAGES ARE SIGNIFICANT FOR INDUSTRY? Extended Graded Intergenerational Disruption Scale See:

19 [Figure: EGIDS scale, levels 0 to 10, including 6a, 6b, 8a, 8b]

20 Institutional: mass media / publishing; libraries; education; commerce, marketing; product localization, translation.

21 Developing to waning: limited-to-no institutional support, mass media, etc. Use of ICTs: end-user content (Web, SMS, ...); some product localization; significant enhancement to language stabilization, vitality.

22 Dying to extinct: use of ICTs: language documentation; XML.

23 WHICH LANGUAGES ARE SIGNIFICANT FOR INDUSTRY? Some will be more used and better supported by industry than others, but all need and get some level of industry support. Industry is creating technologies relevant to all languages!

24 BCP 47 OVERVIEW

25 BCP 47: IETF Best Current Practice specification. Reference: History: 1995: RFC 1766; 2001: RFC 3066; 2006: RFC 4646 + RFC 4647; 2009: RFC 5646 + RFC 4647. Designed to accommodate language variations: language, writing system, orthography, dialect, ... Extended concepts that have language as a core component.

26 HANDLING VARIATIONS
Start with IDs for discrete languages from ISO 639-1, ISO 639-3. Using BCP 47, add qualifiers to language tags as needed.
Examples:
pt-BR = Portuguese as used in Brazil
az-Cyrl = Azerbaijani written in Cyrillic script
ca-valencia = Valencian
de-1996 = German using 1996 orthographic conventions
en-Latn-fonipa-scouse = Scouse dialect of English in IPA transcription

27 KEY COMPONENTS OF BCP 47: tag syntax; subtag registry (maintained by IANA); mechanism to register variant subtags; mechanism to register extensions.

28 BCP 47 SYNTAX
Language-Tag = langtag / privateuse
langtag      = language            ; ISO 639
               ("-" script)?       ; ISO 15924
               ("-" region)?       ; ISO 3166 or UN M.49
               ("-" variant)*      ; registered
               ("-" extension)*    ; registered RFC
               ("-" privateuse)?
extension    = singleton ("-" alphanum{2,8})+
privateuse   = "x" ("-" alphanum{1,8})+

29 BCP 47 SYNTAX Examples:
haw                                  language
pt-BR                                language + ISO region
es-419                               language + UN M.49 region
az-Cyrl                              language + script
ca-valencia                          language + variant
pww-Latn-fonipa                      language + script + variant
x-foobar                             private use
fil-x-foobar                         language + private use
und-Hebr-t-und-latn-m0-ungegn-1977   language + script + t extension
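
The syntax and examples on slides 28 and 29 can be checked mechanically. The following is a minimal sketch, assuming the simplified grammar shown on the slide: it deliberately leaves out extended-language subtags and grandfathered tags, and it tests well-formedness only, not whether subtags are actually registered. The names `LANGTAG` and `parse_tag` are introduced here for illustration.

```python
import re

# Simplified rendering of the slide's grammar (assumption: no extlang,
# no grandfathered tags). Matching is case-insensitive, as tags are.
LANGTAG = re.compile(
    r"""^
    (?:
        (?P<language>[a-z]{2,8})
        (?:-(?P<script>[a-z]{4}))?
        (?:-(?P<region>[a-z]{2}|[0-9]{3}))?
        (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)
        (?P<extensions>(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*)
        (?:-(?P<privateuse>x(?:-[a-z0-9]{1,8})+))?
      |
        (?P<privateonly>x(?:-[a-z0-9]{1,8})+)
    )$""",
    re.VERBOSE | re.IGNORECASE,
)


def parse_tag(tag):
    """Split a language tag into its named components."""
    match = LANGTAG.match(tag)
    if match is None:
        raise ValueError(f"not well-formed under this simplified grammar: {tag!r}")
    if match.group("privateonly"):
        return {"privateuse": match.group("privateonly")}
    parts = {}
    for name in ("language", "script", "region", "privateuse"):
        if match.group(name):
            parts[name] = match.group(name)
    if match.group("variants"):
        parts["variants"] = match.group("variants").strip("-").split("-")
    if match.group("extensions"):
        parts["extensions"] = match.group("extensions").lstrip("-")
    return parts


# The examples from the slide, parsed one by one.
for example in ["haw", "pt-BR", "es-419", "az-Cyrl", "ca-valencia",
                "pww-Latn-fonipa", "x-foobar", "fil-x-foobar",
                "und-Hebr-t-und-latn-m0-ungegn-1977"]:
    print(example, parse_tag(example))
```

A production system would more likely rely on an existing internationalization library and the IANA registry data than on a hand-written pattern like this one.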

30 VARIANT SUBTAGS Registration requests can be submitted by anyone Reviewed for best practice Added to IANA Language Subtag Registry Process: see 64 variant subtags registered to date

31 VARIANT SUBTAGS Examples:
Variant subtag   Meaning
aluku            Aluku dialect of the "Busi Nenge Tongo" English-based Creole continuum in Eastern Suriname and Western French Guiana
balanka          The Balanka dialect of Anii
itihasa          Epic Sanskrit
vallader         Vallader idiom of Romansh
1959acad         "Academic" ("governmental") variant of Belarusian as codified in 1959
1606nict         Late Middle French (to 1606 as in Jean Nicot, "Thresor de la langue francoyse", 1606)
baku1926         Unified Turkic Latin Alphabet (principles codified at the 1926 Turkological Conference in Baku)

32 VARIANT SUBTAGS Registration form example:
LANGUAGE SUBTAG REGISTRATION FORM
1. Name of requester: Tomaž Erjavec
2. E-mail address of requester: tomaz.erjavec&ijs.si
3. Record Requested:
   Type: variant
   Subtag: metelko
   Description: Slovene in Metelko alphabet
   Prefix: sl
   Comments: The subtag represents the alphabet codified by Franc Serafin Metelko and used from 1825 to 1833.
4. Intended meaning of the subtag: The subtag marks texts written in Slovene using the historical Metelko alphabet, which is distinguished from the contemporary norm by borrowing (and modifying) letters from Cyrillic.
5. Reference to published description of the language (book or article): Stabej, Marko. Franc Serafin Metelko in Metelčica. In (Janez Cvirn, ed.) Slovenska Kronika XIX. stoletja. (2001). Print.
6. Any other relevant information: The tag "sl-metelko" is relevant as a possible value of attribute to be used by language technology applications for transcribing and modernising such texts, e.g. for text search in cultural heritage digital libraries. E.g. the National and University Library of Slovenia has plans to digitise about 5,000 pages of books written in the Metelko alphabet.
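
The "Record Requested" part of the form is a small set of "Field: value" lines, and its Prefix field says which tags the variant is intended to follow. A minimal sketch of how software might read such a record and check a tag against the prefix is given below; `parse_record` and `variant_matches_prefix` are hypothetical helper names, and the check treats the prefix as a recommendation to report on, not as a hard validity rule.

```python
RECORD = """Type: variant
Subtag: metelko
Description: Slovene in Metelko alphabet
Prefix: sl"""


def parse_record(text):
    """Read 'Field: value' lines into a dict of lists (fields may repeat)."""
    record = {}
    for line in text.splitlines():
        name, _, value = line.partition(":")
        record.setdefault(name.strip(), []).append(value.strip())
    return record


def variant_matches_prefix(tag, record):
    """True if the record's variant occurs in the tag after one of its prefixes."""
    subtags = tag.lower().split("-")
    variant = record["Subtag"][0].lower()
    if variant not in subtags[1:]:
        return False
    # Everything before the variant must equal, or extend, a registered prefix.
    leading = "-".join(subtags[:subtags.index(variant, 1)])
    return any(
        leading == prefix.lower() or leading.startswith(prefix.lower() + "-")
        for prefix in record.get("Prefix", [])
    )


record = parse_record(RECORD)
print(variant_matches_prefix("sl-metelko", record))
print(variant_matches_prefix("de-metelko", record))
```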

33 EXTENSIONS
For concepts that go beyond language but have language as a core component. Created by IETF process. Details of an extension can be owned by other authorities.
Existing extensions:
t   Transformed content   RFC 6497   Maintaining authority: Unicode Consortium
u   Unicode locale        RFC 6067   Maintaining authority: Unicode Consortium

34 BCP 47 EXTENSIONS Example: t extension (transformed content). Extension defined in RFC 6497; Unicode specification UTS #35. Example tag: und-Hebr-t-und-latn-m0-ungegn-1977. Syntax: language + script + t extension. Meaning: content in Hebrew script transformed from Latin script according to a UNGEGN 1977 transliteration specification.

35 UTILITY OF INDUSTRY MECHANISMS FOR LINGUISTS

36 LINGUISTICS / LANGUAGE DOCUMENTATION. Goals: language development (literacy, lexicography, content development); scholarly documentation. Objects of scholarly investigation: languages, dialects; linguistic varieties / variety networks, languoids.

37 HANDLING LINGUISTS' NEEDS: use existing BCP 47 mechanisms. Used in xml:lang. Used in Dublin Core Metadata Element Set, Version 1.1. Existing process to register variant subtags. Existing subtags registered for different kinds of variants: dialect variants; pronunciation variants; historic variants; orthographic / spelling variants.
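
To show what "used in xml:lang" means in practice, here is a minimal sketch using Python's standard library XML parser. The `<texts>` and `<text>` elements are invented for this example; the tag `sl-metelko` is the one from the registration form on slide 32. The sketch relies on the fact that xml:lang is inherited by child elements unless they override it.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical corpus fragment: one modern text, one in the Metelko alphabet.
SAMPLE = """<texts xml:lang="sl">
  <text id="modern">...</text>
  <text id="historic" xml:lang="sl-metelko">...</text>
</texts>"""


def effective_languages(element, inherited=None):
    """Yield (element, language tag) pairs, applying xml:lang inheritance."""
    lang = element.get(XML_LANG, inherited)
    yield element, lang
    for child in element:
        yield from effective_languages(child, lang)


root = ET.fromstring(SAMPLE)
langs = {el.get("id"): lang
         for el, lang in effective_languages(root)
         if el.tag == "text"}
print(langs)
```

A search tool for digitised heritage texts could use such a mapping to route only the `sl-metelko` passages through a transcription or modernisation step.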

38 HANDLING LINGUISTS' NEEDS
Linguists could create a new BCP 47 extension. Extension could use glottocode or other domain-specific vocabulary.
Example (hypothetical): pww-l-gc-maes1238-sc-l2fsi2-sd-disarthr
ISO 639-3: Northern Pwo Karen
Extension key-value pairs:
glottocode: Mae Sarieng variant
speaker competence: L2 speaker, estimated FSI level 2
speech defect: symptoms of dysarthria
Process: see
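
The hypothetical extension above could be read by software along the following lines. This is only a sketch: the `l` singleton and the keys `gc`, `sc`, `sd` are the slide's own hypothetical vocabulary, and the convention assumed here (two-character keys followed by longer value subtags) is borrowed from the way the existing t and u extensions structure their fields. The function name `extension_fields` is introduced for illustration, and the search for the singleton is naive (it does not special-case private-use parts).

```python
def extension_fields(tag, singleton):
    """Collect key-value pairs from one extension of a language tag."""
    subtags = tag.lower().split("-")
    try:
        # Skip position 0, which is always the primary language subtag.
        start = subtags.index(singleton, 1)
    except ValueError:
        return {}
    fields = {}
    key = None
    for subtag in subtags[start + 1:]:
        if len(subtag) == 1:
            break  # next singleton: another extension or private use
        if len(subtag) == 2:
            key = subtag  # two-character subtags are treated as keys
            fields[key] = []
        elif key is not None:
            fields[key].append(subtag)
    return {k: "-".join(v) for k, v in fields.items()}


hypothetical = extension_fields("pww-l-gc-maes1238-sc-l2fsi2-sd-disarthr", "l")
transform = extension_fields("und-Hebr-t-und-latn-m0-ungegn-1977", "t")
print(hypothetical)
print(transform)
```

The same helper picks out the `m0` mechanism field of the t extension example from slide 34, which suggests that a new extension modelled on the existing ones could reuse existing tooling.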

39 HANDLING LINGUISTS' NEEDS: what if ISO 639, BCP 47 aren't enough? May need properties that don't belong in language tags (e.g., speaker's social network): create appropriate data schemas. May need properties pertaining directly to language variation, but BCP 47 variants / extensions deemed not a good fit (e.g., tentative analysis, doesn't align to existing ISO categories): create other metadata vocabularies and data schemas. Trade-off: not supported in general industry specifications (e.g., xml:lang).

40 SUMMATION

41 SUMMATION: Information technologies rely heavily on ISO 639, including ISO 639-3, most often via BCP 47. Many needs of linguists can be accommodated by existing BCP 47 mechanisms. Additional needs of linguists might be accommodated by creation of a new BCP 47 extension. Linguists should use ISO 639 and BCP 47 whenever appropriate!


More information

Library Technology Reports

Library Technology Reports Open Source Library Automation: Overview and Perspective A chapter from Library Technology Reports Expert Guides to Library Systems and Services by Marshall Breeding ALA TechSource purchases fund advocacy,

More information

WHAT S NEW IN RADAROPUS 1.39 AND 1.40 (PROGRAM UPDATES)

WHAT S NEW IN RADAROPUS 1.39 AND 1.40 (PROGRAM UPDATES) WHAT S NEW IN RADAROPUS 1.39-1.41 The successor of RadarOpus 1.38 comes in different steps due to technical reasons. Versions 1.39 (DVD release) is only released for Chinese customers and installs improvements

More information

READING SPECIALIST STANDARDS

READING SPECIALIST STANDARDS READING SPECIALIST STANDARDS Standard I. Standard II. Standard III. Standard IV. Components of Reading: The Reading Specialist applies knowledge of the interrelated components of reading across all developmental

More information

ETD 22.05.2003. The application of Persistent Identifiers as one approach to ensure long-term referencing of Online-Theses

ETD 22.05.2003. The application of Persistent Identifiers as one approach to ensure long-term referencing of Online-Theses ETD The application of Persistent Identifiers as one approach to ensure long-term referencing of Online-Theses Die Deutsche Bibliothek Table of Contents 1. Deficits of current addressing and identification

More information

Product Internationalization of a Document Management System

Product Internationalization of a Document Management System Case Study Product Internationalization of a ì THE CUSTOMER A US-based provider of proprietary Legal s and Archiving solutions, with a customizable document management framework. The customer s DMS was

More information

OJS @ Queen s Open Journal System (OJS) Business Case

OJS @ Queen s Open Journal System (OJS) Business Case The centrality of the library in the academy enables it to act as a primary catalyst for change in the scholarly communication domain. Libraries understand the culture of scholarship and are strategically

More information

Fulfilling World Language Requirements through Alternate Means

Fulfilling World Language Requirements through Alternate Means Fulfilling World Language Requirements through Alternate Means OUSD Board Policy 6146.1 allows students to meet graduation requirements through demonstration of proficiency. Both University of California

More information

REQUEST FOR PROPOSAL ACQUISITION & IMPLEMENTATION OF CENTRALIZED LOG MANAGEMENT SYSTEM

REQUEST FOR PROPOSAL ACQUISITION & IMPLEMENTATION OF CENTRALIZED LOG MANAGEMENT SYSTEM REQUEST FOR PROPOSAL ACQUISITION & IMPLEMENTATION OF CENTRALIZED LOG MANAGEMENT SYSTEM Proposal Release Date: AUGUST 20 th 2008 Proposal Due Date: SEPTEMBER 16 th 2008 TABLE OF CONTENTS 1 - INTRODUCTION...

More information

COMPANIES REGISTRY. Third Party Software Interface Specification. (Part 1 Overview)

COMPANIES REGISTRY. Third Party Software Interface Specification. (Part 1 Overview) COMPANIES REGISTRY Third Party Software Interface Specification () of Integrated Companies Registry Information System Version 1.3 March 2014 The Government of the Hong Kong Special Administrative Region

More information

1 Building a metadata schema where to start 1

1 Building a metadata schema where to start 1 1 Building a metadata schema where to start 1 1.1 Introduction Purpose Metadata has been defined as data describing the context, content and structure of records and their management through time 2. It

More information

Modern foreign languages

Modern foreign languages Modern foreign languages Programme of study for key stage 3 and attainment targets (This is an extract from The National Curriculum 2007) Crown copyright 2007 Qualifications and Curriculum Authority 2007

More information

Microsoft & Open Source Software

Microsoft & Open Source Software Microsoft & Introduction The seemingly never-ending conflict between open source software (OSS) and fixed source (proprietary) software continues to evolve in nuanced, complex directions, some predicted

More information

Higher Education Georgia State University (GSU) Arts & Sciences, Department of Modern & Classical Languages

Higher Education Georgia State University (GSU) Arts & Sciences, Department of Modern & Classical Languages 1 Higher Education Georgia State University (GSU) Arts & Sciences, Department of Modern & Classical Languages Address: 33 Gilmer St. SE Unit 8, Atlanta, GA 30303-3088 Contact: Oscar H. Moreno Lecturer

More information

Beginner s Android Development Tutorial!

Beginner s Android Development Tutorial! Beginner s Android Development Tutorial! Georgia Tech Research Network Operations Center (RNOC)! cic.gatech.edu Questions? Get in touch! piazza.com/gatech/spring2015/cic [email protected]

More information

Using Dublin Core for DISCOVER: a New Zealand visual art and music resource for schools

Using Dublin Core for DISCOVER: a New Zealand visual art and music resource for schools Proc. Int. Conf. on Dublin Core and Metadata for e-communities 2002: 251-255 Firenze University Press Using Dublin Core for DISCOVER: a New Zealand visual art and music resource for schools Karen Rollitt,

More information

Chapter 3: XML Namespaces

Chapter 3: XML Namespaces 3. XML Namespaces 3-1 Chapter 3: XML Namespaces References: Tim Bray, Dave Hollander, Andrew Layman: Namespaces in XML. W3C Recommendation, World Wide Web Consortium, Jan 14, 1999. [http://www.w3.org/tr/1999/rec-xml-names-19990114],

More information

3PlayMedia. Closed Captioning, Transcription, and Subtitling

3PlayMedia. Closed Captioning, Transcription, and Subtitling Closed Captioning, Transcription, and Subtitling 1 Introduction This guide shows you the basics of how to quickly create high quality transcripts, closed captions, translations, and interactive transcripts

More information

Closed Captioning Resources & Best Practices

Closed Captioning Resources & Best Practices Closed Captioning Resources & Best Practices 1. Media needs to be Closed Captioned Faculty and staff are responsible for captioning their own media. Captioning is necessary because one manner of accommodating

More information

Panel Decision. B12 of the.eu Dispute Resolution Rules (ADR Rules) Case No.: 01459 Time of Filing: 2006-05-22 11:44:40 Administrative Contact:

Panel Decision. B12 of the.eu Dispute Resolution Rules (ADR Rules) Case No.: 01459 Time of Filing: 2006-05-22 11:44:40 Administrative Contact: ADR Center for.eu attached to the Arbitration Court attached to the Economic Chamber of the Czech Republic and Agricultural Chamber of the Czech Republic (Czech Arbitration Court) Panel Decision B12 of

More information

PHONETIC TOOL FOR THE TUNISIAN ARABIC

PHONETIC TOOL FOR THE TUNISIAN ARABIC PHONETIC TOOL FOR THE TUNISIAN ARABIC Abir Masmoudi 1,2, Yannick Estève 1, Mariem Ellouze Khmekhem 2, Fethi Bougares 1, Lamia Hadrich Belguith 2 (1) LIUM, University of Maine, France (2) ANLP Research

More information