Rendering/Layout Engine for Complex script. Pema Geyleg pgeyleg@dit.gov.bt




Overview
- What is a layout engine / rendering?
- What is complex text?
- Types of rendering engine
- How does it work?
- How does it support the display of Dzongkha text?

What is Layout Engine / Rendering?
A layout (rendering) engine determines how a piece of software displays different scripts. It identifies the script the user is writing in and displays the text correctly for that script. Latin script is the least complex to display, especially when used to write English, so a layout engine is mainly needed to display complex scripts correctly.

What Is Complex Text?
- Unicode: not just a bigger character set
- Bidirectionality: mixed directions on a line
- Shaping: character shapes depend on context
- Ligatures: mandatory special forms with no Unicode equivalent
- Positioning: vertical and horizontal adjustments
- Reordering: character positions depend on context
- Split characters: some characters appear in more than one position

Bidirectional Text
- Visual order differs from storage order.
- Arabic and Hebrew read right to left, but numbers still read left to right.
[Figure: the same text shown in memory order and in reading order]
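As a small illustration of why numbers behave differently from letters, the sketch below prints the Unicode bidirectional class of a few characters using Python's standard `unicodedata` module. The sample characters and the helper name `bidi_classes` are chosen here for illustration only; a real engine feeds these classes into the full bidirectional algorithm.

```python
import unicodedata

# Sample characters: Latin letter, Hebrew letter, Arabic letter,
# European digit, Arabic-Indic digit.
samples = ["a", "\u05D0", "\u0627", "1", "\u0663"]

def bidi_classes(text):
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]

for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.bidirectional(ch)}")
```

Letters carry a strong direction (L, R or AL), while digits carry the weaker number classes EN and AN, which is what lets digit runs keep their left-to-right order inside right-to-left text.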

Character Shaping
Arabic character shapes change to connect to adjacent characters.
[Figure: contextual forms of the letter noon]
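The four contextual shapes of noon can be seen in the Unicode compatibility block for Arabic presentation forms. The sketch below is only an illustration: modern engines pick these shapes from font glyphs rather than from these code points, and the helper name `contextual_forms` is an assumption made here.

```python
import unicodedata

NOON = "\u0646"  # ARABIC LETTER NOON, the character stored in memory

# Compatibility code points, one per contextual shape of noon.
PRESENTATION_FORMS = ["\uFEE5", "\uFEE6", "\uFEE7", "\uFEE8"]

def contextual_forms(forms):
    """Map each form's decomposition tag (e.g. '<initial>') to its base letter."""
    result = {}
    for form in forms:
        tag, base = unicodedata.decomposition(form).split()
        result[tag] = chr(int(base, 16))
    return result

forms = contextual_forms(PRESENTATION_FORMS)
for tag, base in forms.items():
    print(tag, unicodedata.name(base))
```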

Ligatures
Arabic and Devanagari represent some character sequences with ligatures.
[Figure: lam + alef drawn as the lam-alef ligature; KA + VIRAMA + SSA drawn as a single form]
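The two ligature cases differ in how Unicode treats them, which a short sketch can show. Lam-alef happens to have a compatibility code point that decomposes back to its two letters, while the Devanagari conjunct exists only as a three-character sequence that the font must turn into one glyph. The helper name `ligature_components` is hypothetical.

```python
import unicodedata

LAM_ALEF = "\uFEFB"  # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM

def ligature_components(ligature):
    """Return the characters that a compatibility ligature stands for."""
    return unicodedata.normalize("NFKD", ligature)

components = ligature_components(LAM_ALEF)
print([unicodedata.name(ch) for ch in components])

# KA + VIRAMA + SSA: stored as three characters, with no single code point
# for the conjunct, so normalization leaves the sequence unchanged.
kssa = "\u0915\u094D\u0937"
print([unicodedata.name(ch) for ch in kssa])
```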

Character Positioning
Thai (and other scripts) require characters to be repositioned.
[Figure: KO KAI with SARA UEE and MAI THO stacked above it]
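A quick way to see which characters in such a cluster need positioning is to look at their general category: the base letter is an ordinary letter, while the vowel and tone mark are non-spacing marks that the engine must place relative to it. This is a minimal sketch using the three characters named on the slide; `describe` is a name invented here.

```python
import unicodedata

# KO KAI (base), SARA UEE (vowel above), MAI THO (tone mark above the vowel)
cluster = "\u0E01\u0E37\u0E49"

def describe(text):
    """Return (name, general category) for each character in text."""
    return [(unicodedata.name(ch), unicodedata.category(ch)) for ch in text]

for name, category in describe(cluster):
    print(category, name)
```

Category `Lo` marks the base letter and `Mn` marks the non-spacing marks, which have no advance width of their own.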

Reordering
Some Hindi characters reorder based on context.
[Figure: logical order versus visual order]
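One well-known case is the Devanagari vowel sign I, which is stored after its consonant but drawn before it. The sketch below is a deliberately simplified model, assuming the vowel moves in front of the single preceding character; a real engine moves it in front of the whole consonant cluster. The names `PRE_BASE_VOWELS` and `visual_order` are illustrative.

```python
# Vowel signs that are drawn to the left of the consonant they follow in memory.
PRE_BASE_VOWELS = {"\u093F"}  # DEVANAGARI VOWEL SIGN I

def visual_order(logical):
    """Move each pre-base vowel in front of the character that precedes it.

    Simplification: conjuncts (consonant + virama + consonant) are ignored.
    """
    out = []
    for ch in logical:
        if ch in PRE_BASE_VOWELS and out:
            out.insert(len(out) - 1, ch)
        else:
            out.append(ch)
    return "".join(out)

logical = "\u0915\u093F"  # KA followed by VOWEL SIGN I
print([f"U+{ord(ch):04X}" for ch in visual_order(logical)])
```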

Split Characters
Thai and many Indic languages display a single character in multiple positions.
[Figure: logical characters, visual glyphs and the displayed result]
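For some split vowels, Unicode itself records the two visual pieces as a canonical decomposition. The sketch below uses the Tamil vowel sign O as an example chosen here (it is not the example on the slide); its left part is drawn before the consonant and its right part after it.

```python
import unicodedata

TAMIL_VOWEL_SIGN_O = "\u0BCA"

def split_parts(ch):
    """Return the canonical decomposition of a split vowel sign."""
    return unicodedata.normalize("NFD", ch)

parts = split_parts(TAMIL_VOWEL_SIGN_O)
print([unicodedata.name(p) for p in parts])
```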

Types of rendering / layout engine
- Uniscribe: the rendering engine used by Microsoft software.
- Pango: "pan" in Greek means "all" and "go" in Japanese means "language". It is an open-source framework for the layout and rendering of internationalized text. GNOME applications use it for rendering.
- ICU layout engine: ICU stands for International Components for Unicode. It is maintained by IBM, and this rendering engine is used in the OpenOffice application.

Prerequisites
- The script should be supported by the software.
- The script should be covered by the Unicode and ISO 10646 standards.
- A working font for the script should exist. OpenType fonts are preferred.
- A keyboard driver for the script should be developed.

Overview of how a layout engine works
- The font for a particular script contains rules.
- The rules fall into two main categories: GSUB (glyph substitution) and GPOS (glyph positioning).
- Features such as ccmp (glyph composition and decomposition) and blws (below-base substitutions) fall under GSUB.
- Features such as blwm (below-base mark positioning), abvm (above-base mark positioning) and kern (kerning) fall under GPOS.
- Fonts may contain language tags for the languages they support.
- All combinations of characters used by a particular language are accessed through rules or lookups defined in the font.
- The rendering engine has to identify the script, select the fonts, apply the correct rules from the fonts and display the result.

How a layout engine works
1. User input is stored in a buffer in memory.
2. Identify the script by looking at the Unicode values in the buffer.
3. Determine the bidirectional levels for the text.
4. Update the language tag using this information.
5. Determine a language engine from the updated language tag and the script.
6. Determine a set of possible fonts from the updated language tag and the font properties for each character. These fonts are sorted according to how well they match the language tag and font properties.
7. Apply the rules defined in the font to the Unicode values stored in the buffer.
8. Perform character, word and line boundary analysis.
9. The output of this process is usually produced per line and is then fed into the renderer.
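The script identification step can be sketched as a lookup of each code point against block ranges, followed by grouping neighbours into runs. The range table below is a small hand-picked subset used only for illustration, and `script_of` and `script_runs` are hypothetical names; real engines use the full Unicode script property and also resolve punctuation and spaces into the surrounding run.

```python
from itertools import groupby

# A few Unicode block ranges, enough for the scripts mentioned in these slides.
SCRIPT_RANGES = [
    (0x0590, 0x05FF, "Hebrew"),
    (0x0600, 0x06FF, "Arabic"),
    (0x0900, 0x097F, "Devanagari"),
    (0x0E00, 0x0E7F, "Thai"),
    (0x0F00, 0x0FFF, "Tibetan"),  # block used for Dzongkha
]

def script_of(ch):
    """Return the script name for a character, or 'Other' if not in the table."""
    cp = ord(ch)
    for low, high, name in SCRIPT_RANGES:
        if low <= cp <= high:
            return name
    return "Other"

def script_runs(text):
    """Split text into (script, substring) runs of consecutive same-script characters."""
    return [(script, "".join(chars)) for script, chars in groupby(text, key=script_of)]

for script, run in script_runs("ab\u0F40\u0F0B\u0627"):
    print(script, [f"U+{ord(ch):04X}" for ch in run])
```

Each run can then be handed to the language engine and font selection steps independently.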

LayoutEngine Class Hierarchy in ICU
- LayoutEngine
- GXLayoutEngine
- OpenTypeLayoutEngine
- ThaiLayoutEngine
- ArabicOpenTypeLayoutEngine
- IndicOpenTypeLayoutEngine
- UnicodeArabicOpenTypeLayoutEngine
- DzongkaOpenTypeLayoutEngine

How does it support Dzongkha text?
- Encoding model for Dzongkha script
- OpenType features for Dzongkha fonts

Encoding Model for Dzongkha script
- Regular and combining consonants.
- Consonants and vowels combine vertically into conjuncts.
- Whether neighboring characters stack vertically or are written left to right is not always determined by contextual or grammatical rules, so an explicit stacking model is used.
- In the UCS, two complete sets of consonants are encoded as separate characters: headline consonant characters [U+0F40-U+0F6A] and combining consonant characters [U+0F90-U+0FBC].
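The two consonant sets can be told apart purely by code point range, as this minimal sketch shows. The ranges are the ones given above; the function name `consonant_kind` and the sample stack (RA with GA and YA below it) are chosen here for illustration.

```python
import unicodedata

HEADLINE = range(0x0F40, 0x0F6B)   # U+0F40..U+0F6A, headline consonants
COMBINING = range(0x0F90, 0x0FBD)  # U+0F90..U+0FBC, combining consonants

def consonant_kind(ch):
    """Classify a character as a headline consonant, combining consonant or other."""
    cp = ord(ch)
    if cp in HEADLINE:
        return "headline"
    if cp in COMBINING:
        return "combining"
    return "other"

stack = "\u0F62\u0F92\u0FB1"  # RA + combining GA + combining YA
for ch in stack:
    print(f"U+{ord(ch):04X}", consonant_kind(ch), unicodedata.name(ch))
```

Because stacking is explicit in the encoding, the engine never has to guess whether GA sits below RA or beside it: the choice of code point says so.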

Character Order
Conjunct stacks are encoded in the order in which the parts are written: the consonant in the topmost (headline) position comes first, followed by the characters for any combining consonants and then by the character(s) for any vowel(s).
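This ordering rule can be written as a regular expression over code point ranges. The sketch below is a simplified check, not a full cluster validator: the vowel range used (U+0F71-U+0F7D plus U+0F80) is an assumption made here, and other marks that may follow a stack are ignored. `is_well_ordered` is a hypothetical name.

```python
import re

# headline consonant, then any combining consonants, then any vowel signs
STACK = re.compile(r"[\u0F40-\u0F6A][\u0F90-\u0FBC]*[\u0F71-\u0F7D\u0F80]*")

def is_well_ordered(stack):
    """Return True if the whole string follows headline, combining, vowel order."""
    return STACK.fullmatch(stack) is not None

print(is_well_ordered("\u0F62\u0F92\u0FB1\u0F74"))  # RA, GA, YA, vowel sign U
print(is_well_ordered("\u0F62\u0F74\u0F92"))        # vowel before a combining consonant
```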

Syllables & Encoding
- The basic unit of meaning (morpheme) in Dzongkha is the tsheg bar, usually referred to as a syllable.
- Each syllable contains a root letter (ming zhi) and may additionally have any or all of the following parts: prefix, head letter, sub-fixed letter, vowel sign, suffix and post-suffix.
- Syllables are normally delimited by a tsheg or another punctuation character.
- There are no inter-word spaces in Dzongkha.
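Since syllables are delimited by punctuation rather than spaces, a first-pass segmenter can simply split on those marks. The sketch below assumes only three delimiters (tsheg U+0F0B, non-breaking tsheg U+0F0C and shad U+0F0D); other punctuation exists and is left out for brevity. The sample text and the name `syllables` are illustrative.

```python
import re

# tsheg, non-breaking tsheg and shad
DELIMITERS = re.compile("[\u0F0B\u0F0C\u0F0D]+")

def syllables(text):
    """Split text into tsheg bar units, dropping the delimiters themselves."""
    return [part for part in DELIMITERS.split(text) if part]

# Four syllables separated by tsheg and closed by a shad.
text = (
    "\u0F56\u0F40\u0FB2\u0F0B"
    "\u0F64\u0F72\u0F66\u0F0B"
    "\u0F56\u0F51\u0F7A\u0F0B"
    "\u0F63\u0F7A\u0F42\u0F66\u0F0D"
)
for unit in syllables(text):
    print([f"U+{ord(ch):04X}" for ch in unit])
```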

Special Characters
U+0F0C NON BREAKING TSHEG: a line may normally be broken after a tsheg. When a tsheg occurs after the letter nga and before a shad, it is desirable to suppress this behavior, and the non-breaking tsheg is used instead.
U+0F6A FIXED FORM RA: overrides the normal contextual shaping of RA.

U+0FBA, U+0FBB, U+0FBC FIXED FORM SUB-JOINED WA, YA & RA: WA, YA and RA occurring mid-stack are often written in their full form rather than the usual reduced form. These characters encode that full form explicitly.

U+0FC6 DZONGKHA SYMBOL PADMA GDAN: an unusual combining symbol character that may combine with letters or with other symbols.

OpenType Features for Dzongkha Fonts
An OpenType shaping engine for Dzongkha processes text in stages:
1. Analyze the syllables.
2. Identify the correct clusters of characters.
3. Shape (substitute) glyphs using GSUB features and lookups in the font.
4. Position glyphs using GPOS features and lookups in the font.

- A Dzongkha syllable reaches the shaping engine as a string of UCS characters in sequence. These characters are not necessarily ordered within the sequence.
- The shaping engine first identifies the first consonant.
- It then identifies the correct stacks.
- The shaping engine applies contextual shaping, or glyph substitution (GSUB), features to the glyph string.
- It then applies OpenType positioning (GPOS) features to position the glyphs.

Shaping Features
- Glyph composition and decomposition: apply lookups under the 'ccmp' feature.
- Conjuncts: apply lookups under the 'blws' feature to create conjuncts or ligatures.
- Below-base marks: apply additional lookups under 'blws' to get any additional below-base combining consonants, any below-base vowel marks and other below-base marks.
- Above-base marks: apply lookups under the 'abvs' feature to get any above-base vowel conjuncts, above-base vowel modifiers and above-base marks.
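The way features are applied in a fixed order, each lookup running over the whole glyph string before the next one starts, can be modelled with a toy substitution table. Everything in the sketch below is hypothetical: the glyph names, the contents of `TOY_FONT` and the functions `apply_lookup` and `shape` are invented for illustration and do not come from any real Dzongkha font or from the OpenType tables of a real engine.

```python
# Toy "font": features in application order, each holding a list of lookups.
# A lookup maps a sequence of glyph names to its replacement glyphs.
TOY_FONT = [
    ("ccmp", []),
    ("blws", [
        {("ra", "ga.sub"): ["ra_ga"]},
        {("ra_ga", "ya.sub"): ["ra_ga_ya"]},
    ]),
    ("abvs", [
        {("ra_ga_ya", "e.above"): ["ra_ga_ya_e"]},
    ]),
]

def apply_lookup(glyphs, lookup):
    """Run one lookup left to right over the glyph list, replacing matches."""
    out = []
    i = 0
    while i < len(glyphs):
        for sequence, replacement in lookup.items():
            if tuple(glyphs[i:i + len(sequence)]) == sequence:
                out.extend(replacement)
                i += len(sequence)
                break
        else:
            # No rule matched at this position: keep the glyph unchanged.
            out.append(glyphs[i])
            i += 1
    return out

def shape(glyphs):
    """Apply every lookup of every feature, in the order the font lists them."""
    for _feature, lookups in TOY_FONT:
        for lookup in lookups:
            glyphs = apply_lookup(glyphs, lookup)
    return glyphs

print(shape(["ra", "ga.sub", "ya.sub", "e.above"]))
```

The order matters: the second 'blws' lookup only matches because the first one has already produced the intermediate `ra_ga` glyph.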

References
- Pango: www.pango.org
- Uniscribe: http://www.microsoft.com/typography/developers/uniscribe/default.htm
- ICU: http://oss.software.ibm.com/icu
- OpenType Specifications: http://www.microsoft.com//typography/tt/tt.htm
- TrueType Font File Specification: http://fonts.apple.com/ttrefman/rm06/chap6.html