Proposal to Encode the Mark's Chapter Glyph in the Unicode Standard


PONOMAR PROJECT
Aleksandr Andreev, Yuri Shardt, Nikita Simmons

1. Introduction

The symbols of the Russian Orthodox Typikon have already been proposed for inclusion in the Unicode Standard (see Shardt & Andreev, 2009 [n3772]). However, there remains one final glyph whose inclusion is necessary to properly typeset mediaeval and modern Slavonic liturgical texts within the framework of the Unicode Standard. This glyph is often referred to as Mark's Chapter Glyph.

In Orthodox service books, Mark's Chapters are comments on difficult sections in the Typikon of the Lavra of St. Sabbas that were first compiled in the tenth century. These comments are attributed to a certain Monk Mark of the Lavra, possibly Bishop Mark of Hydruntum (Mansvetov, 1885, p. 219ff). They received a final revision when the Sabbaite Typikon was adopted by the Russian Orthodox Church in the fourteenth century. In their Russian version, they are indicated by a marginal glyph consisting of a stylized М (Cyrillic Capital Em) and, often, other elements of the name Mark (in Cyrillic: Марко).

In the Russian Orthodox tradition, a total of three different forms can be found. The first form, which will be referred to as Type I, is shown in Figure 1 and Figure 2. This form dates from before the liturgical and orthographic reforms of Patriarch Nikon and is often seen in the Oko Tserkovnoye (Typikon) and the Lenten Triodion from this era. At present, Type I forms are occasionally still used by Old Believers, that is, those Orthodox who reject the reforms of Patriarch Nikon. Type I forms consistently show various combinations of Cyrillic М, р, and к. The last two letters can be located above or below the Cyrillic М. Often, this glyph is in red type.

The second form, which will be referred to as Type II, is the Mark's Chapter Glyph that was standardised by the reforms of Patriarch Nikon. A common representation of the Type II form is shown in Figure 3. The symbol now consists solely of М and р and can occasionally be found in red.

Finally, there exist various variant forms, which will be referred to as Type III. One example, shown in Figure 4, encloses the complete name in a box.

Figure 1: Mark's Chapter Glyphs in the 1640 Oko Tserkovnoye published in Moscow.
Figure 2: Examples of Mark's Chapter Glyphs in the 1650 Lenten Triodion published in Moscow.
Figure 3: Mark's Chapter Glyph in the 1986 Typikon published by the Moscow Patriarchate.
Figure 4: Mark's Chapter Glyph in the 1893 Menaion published by the Kievan Lavra of the Caves.

From the above figures, it is clear that the form of the Mark's Chapter Glyph varies substantially between texts and time periods. While all glyphs contain the Cyrillic capital letter Em, the other letters do not have fixed forms or positions. Although some of the Type I variants could be considered as containing combining (superscripted) Cyrillic letters ka and er, other Type I variants contain both superscripted and subscripted Cyrillic letters ka and er. Type II variants, on the other hand, mostly contain a Cyrillic capital letter Em with a combining Cyrillic letter er. Thus, there are many different forms of the Mark's Chapter Glyph. It should be noted that while the form of Mark's Chapter Glyph varies substantially across texts, its function remains identical in all of the sources: to denote explanations given by Monk Mark on difficult sections of the Typikon.

2. Mark's Chapter Glyph in Existing Slavonic Standards

The Ponomar Project (http://www.ponomar.net/) is pioneering the rendering, storage, and display of Slavonic-language liturgical texts in Unicode. Previous methods for encoding Church Slavonic include Unified Church Slavonic (UCS), which reuses the Windows-1251 codepage but assigns different characters to its values, and the Hyperinvariant Presentation (HIP) format, which is a markup language that allows the required Slavonic characters to be entered using the Windows-1251 codepage. In UCS-8, which is the most recent version of UCS, there is no unified approach to encoding the Mark's Chapter Glyph. This could be attributed to an oversight on the part of the authors of this standard. In the HIP format, the Mark's Chapter Glyph is encoded as the unique command sequence <М\р>, which distinguishes it clearly from the command М\р for a Cyrillic capital em with a superscripted Cyrillic er. In order to achieve backwards compatibility with this format, it would be advisable to include Mark's Chapter Glyph in Unicode.

3. Existing Characters in Unicode

Similar characters have already been encoded within the Unicode Standard. However, their use is not a viable alternative. Perhaps the closest analogue is the Coptic Symbol Mi Ro (U+2CE5). However, the use of this symbol for the Mark's Chapter Glyph is not appropriate given the very different typographic and linguistic usages of the two characters¹.

4. Justification for Inclusion of Mark's Chapter Glyph

One approach to displaying the Mark's Chapter Glyph, especially in its Type II variant, would be to use the Unicode sequence U+041C U+2DEC (Cyrillic capital em and combining Cyrillic er). However, this approach is problematic, as it would conflict with the ubiquitous abbreviation имⷬкъ ("say name here"), which uses an м without converting it into the Mark's Chapter Glyph form.
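The conflict described above can be made concrete with a short sketch. It relies only on Python's standard unicodedata module. The spelling assumed for the capitalised abbreviation (И, М, U+2DEC, К, Ъ) and the helper names describe and find_sequence are illustrative assumptions, not something the proposal specifies.

```python
import unicodedata

# Composite sequence discussed above: capital Em followed by combining Er.
COMPOSITE = "\u041C\u2DEC"

# ASSUMPTION: spelling of the capitalised "say name here" abbreviation as
# И, М, combining Er (U+2DEC), К, Ъ. Used here for illustration only.
ABBREVIATION_UPPER = "\u0418\u041C\u2DEC\u041A\u042A"


def describe(text):
    """Return a 'U+XXXX NAME' string for every code point in text."""
    return ["U+%04X %s" % (ord(ch), unicodedata.name(ch)) for ch in text]


def find_sequence(haystack, needle=COMPOSITE):
    """Hypothetical helper: index of the composite sequence, or -1."""
    return haystack.find(needle)


if __name__ == "__main__":
    for line in describe(COMPOSITE):
        print(line)
    # A plain-text search for the composite sequence also matches inside the
    # abbreviation, so the two uses cannot be told apart by code points alone.
    print(find_sequence(ABBREVIATION_UPPER))
```

Under these assumptions, a search for the composite sequence reports a match inside the abbreviation as well, which is the ambiguity a dedicated code point would avoid.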
It should be noted that both capital and lowercase versions of this abbreviation can be found. Finally, the м of "say name here" and the Mark's Chapter Glyph have completely different functions and appearances. This implies that Mark's Chapter Glyph cannot be effectively rendered using a substitution table (such as GSUB in OpenType). This fact is recognised by the HIP standard, which assigns different sequences of characters to represent the two entities. This strongly suggests that a separate codepoint should be created for Mark's Chapter Glyph.

In addition, as the above figures show, the proposed sequence describes only the Type II variants of this glyph. It does not reflect the forms found in older texts, especially the myriad Type I forms. In order to properly encode these forms using composite glyphs, subscript combining Cyrillic characters would have to be introduced into the Unicode Standard. Finally, the glyph has a unique function, and its inclusion in Unicode would greatly facilitate the storage, search, and editing of Slavonic-language liturgical texts.

¹ Not to mention that in academic contexts, one may also wish to include both Coptic and Slavonic texts.

5. Summary

In summary, the inclusion of the Mark's Chapter Glyph in the Miscellaneous Symbols block is proposed, following the previously proposed Typikon symbols. The proposed codepoint, representation, and name of the glyph are given in Table 1.

Table 1: Proposed Position and Representation of the Mark's Chapter Glyph

Proposed Codepoint    Representation    Proposed Name
U+1F545               [glyph image]     Typikon Symbol Mark's Chapter

6. Bibliography

Mansvetov, I. (1885). Церковный Устав (Типик) и его образование и судьба в Греческой и Русской Церкви. [The Church Ustav (Typikon) and its formation and usage in the Greek and Russian churches]. Moscow, Russian Empire.

Shardt, Y., & Andreev, A. (2009). Proposal to Encode the Typikon Symbols in Unicode (L2/09-310). Proposal.
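As a side note to Table 1, the proposed code point U+1F545 lies outside the Basic Multilingual Plane, so software that stores text as UTF-16 would represent it as a surrogate pair. The sketch below uses only Python built-ins to show the byte-level forms; the function name encoding_forms is hypothetical and chosen for illustration.

```python
PROPOSED = 0x1F545  # code point proposed in Table 1


def encoding_forms(code_point):
    """Hypothetical helper: plane number plus UTF-8 and UTF-16 forms."""
    ch = chr(code_point)
    utf16 = ch.encode("utf-16-be")
    # Split the big-endian UTF-16 bytes into 16-bit code units.
    units = [int.from_bytes(utf16[i:i + 2], "big")
             for i in range(0, len(utf16), 2)]
    return {
        "plane": code_point >> 16,  # plane 0 is the BMP
        "utf8": " ".join("%02X" % b for b in ch.encode("utf-8")),
        "utf16_units": ["%04X" % u for u in units],
    }


if __name__ == "__main__":
    print(encoding_forms(PROPOSED))
```

A plane value other than 0 means the character needs four bytes in UTF-8 and two code units in UTF-16, which matters for tools that assume one 16-bit unit per character.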

ISO/IEC JTC 1/SC 2/WG 2
PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646¹

Please fill all the sections A, B and C below. Please read Principles and Procedures Document (P & P) from http://www.dkuug.dk/jtc1/sc2/wg2/docs/principles.html for guidelines and details before filling this form. Please ensure you are using the latest Form from http://www.dkuug.dk/jtc1/sc2/wg2/docs/summaryform.html. See also http://www.dkuug.dk/jtc1/sc2/wg2/docs/roadmaps.html for latest Roadmaps.

A. Administrative

1. Title: Proposal to Encode the Mark's Chapter Glyph in the Unicode Standard
2. Requester's name: Aleksandr Andreev, Yuri Shardt, Nikita Simmons
3. Requester type (Member body/liaison/individual contribution): Individual Contribution
4. Submission date: October 3, 2010
5. Requester's reference (if applicable):
6. Choose one of the following:
   This is a complete proposal:
   (or) More information will be provided later:

B. Technical - General

1. Choose one of the following:
   a. This proposal is for a new script (set of characters):
      Proposed name of script:
   b. The proposal is for addition of character(s) to an existing block:
      Name of the existing block: U+1F54x
2. Number of characters in proposal: 1
3. Proposed category (select one from below - see section 2.2 of P&P document):
   A-Contemporary
   B.1-Specialized (small collection): Yes
   B.2-Specialized (large collection)
   C-Major extinct
   D-Attested extinct
   E-Minor extinct
   F-Archaic Hieroglyphic or Ideographic
   G-Obscure or questionable usage symbols
4. Is a repertoire including character names provided?
   a. If YES, are the names in accordance with the character naming guidelines in Annex L of P&P document?
   b. Are the character shapes attached in a legible form suitable for review?
5. Who will provide the appropriate computerized font (ordered preference: True Type, or PostScript format) for publishing the standard? Yuri Shardt
   If available now, identify source(s) for the font (include address, e-mail, ftp-site, etc.) and indicate the tools used: Hirmos Ponomar v.6 (contact Yuri Shardt at yuri.shardt@ualberta.ca for the font)
6. References:
   a. Are references (to other character sets, dictionaries, descriptive texts etc.) provided?
   b. Are published examples of use (such as samples from newspapers, magazines, or other sources) of proposed characters attached?
7. Special encoding issues: Does the proposal address other aspects of character data processing (if applicable) such as input, presentation, sorting, searching, indexing, transliteration etc. (if yes please enclose information)?
8. Additional Information: Submitters are invited to provide any additional information about Properties of the proposed Character(s) or Script that will assist in correct understanding of and correct linguistic processing of the proposed character(s) or script. Examples of such properties are: Casing information, Numeric information, Currency information, Display behaviour information such as line breaks, widths etc., Combining behaviour, Spacing behaviour, Directional behaviour, Default Collation behaviour, relevance in Mark Up contexts, Compatibility equivalence and other Unicode normalization related information. See the Unicode standard at http://www.unicode.org for such information on other scripts. Also see http://www.unicode.org/public/unidata/ucd.html and associated Unicode Technical Reports for information needed for consideration by the Unicode Technical Committee for inclusion in the Unicode Standard.

¹ Form number: N3152-F (Original 1994-10-14; Revised 1995-01, 1995-04, 1996-04, 1996-08, 1999-03, 2001-05, 2001-09, 2003-11, 2005-01, 2005-09, 2005-10, 2007-03, 2008-05)

C. Technical - Justification

1. Has this proposal for addition of character(s) been submitted before?
   If YES, explain:
2. Has contact been made to members of the user community (for example: National Body, user groups of the script or characters, other experts, etc.)?
   If YES, with whom? academics; publishers of service books
   If YES, available relevant documents:
3. Information on the user community for the proposed characters (for example: size, demographics, information technology use, or publishing use) is included? >300 million
   Reference:
4. The context of use for the proposed characters (type of use; common or rare): common
   Reference:
5. Are the proposed characters in current use by the user community?
   If YES, where? Reference: In typesetting religious service books
6. After giving due considerations to the principles in the P&P document must the proposed characters be entirely in the BMP?
   If YES, is a rationale provided?
7. Should the proposed characters be kept together in a contiguous range (rather than being scattered)?
8. Can any of the proposed characters be considered a presentation form of an existing character or character sequence?
   If YES, is a rationale for its inclusion provided?
9. Can any of the proposed characters be encoded using a composed character sequence of either existing characters or other proposed characters?
   If YES, is a rationale for its inclusion provided?
10. Can any of the proposed character(s) be considered to be similar (in appearance or function) to an existing character?
   If YES, is a rationale for its inclusion provided?
11. Does the proposal include use of combining characters and/or use of composite sequences?
   If YES, is a rationale for such use provided?
   If YES, is a list of composite sequences and their corresponding glyph images (graphic symbols) provided?
12. Does the proposal contain characters with any special properties such as control function or similar semantics?
   If YES, describe in detail (include attachment if necessary):
13. Does the proposal contain any Ideographic compatibility character(s)?
   If YES, is the equivalent corresponding unified ideographic character(s) identified?