Indic Layout Requirements in E-publishing: Hindi as Initial Language


Contents

1. Introduction
2. Storage requirements
   2.1. Support of UNICODE and CLDR
   2.2. Fonts
      2.2.1. Open Type fonts
      2.2.2. SVG Fonts
      2.2.3. WOFF
3. Text layout requirements
   3.1. Justification of page
   3.2. Paginating single column text
      3.2.1. Widows
      3.2.2. Orphans
   3.3. Adjustments of running heads & page numbers
4. Styling requirements
   4.1. Drop caps
   4.2. Line breaking
      4.2.1. Hyphenation
      4.2.2. Guiding principles of Line breaking for Indian languages
   4.3. Text segmentation
   4.4. Underlining
   4.5. Embedded fonts
5. ABNF Valid segmentation-proposed solution for layout issues in Indian languages
   5.1. Various Use cases of ABNF based Indic Syllable definition for Hindi language as example
6. Future W3C standards for E-Publishing
   6.1. HTML 5
   6.2. CSS Mobile
   6.3. CSS Speech
7. References

1. Introduction

This document describes requirements for the pagination and layout of books in the Hindi language, based on the tradition of print book design and composition. It sets out the minimal text layout requirements that an e-publishing content format must meet for Indian languages. It covers the major issues of e-content in Indian languages, such as storage, rendering problems, vertical writing, margin areas, page numbers, running heads and line breaking, together with the CSS requirements for Indian languages, so that a standardized text layout format can be created.

2. Storage requirements

2.1. Support of UNICODE and CLDR

UNICODE is the universal character encoding standard used for representing text for information processing. Unicode encodes the individual characters used in the written languages of the world, and the standard provides information about the characters and their use. It assigns each character a unique code point, written as a hexadecimal value, and a name. The Unicode codespace runs from U+0000 to U+10FFFF (1,114,112 code points). Only the first 65,536 of these fit in a single 16-bit unit, so text is stored in an encoding form such as UTF-8 or UTF-16 rather than as fixed 16-bit values.

The Common Locale Data Repository (CLDR) is the largest standard repository of locale data in the world. It is maintained by the Unicode Consortium. It provides locale data in an XML format for use in computer applications and facilitates the sharing of locale-related information among applications regardless of their domains. Its goal is to provide basic linguistic information for diverse locales in an open, interoperable form. This data is usable for localizing applications. Some examples of the information that CLDR gathers for languages and territories are:

- Date formats
- Time zones
- Number formats
- Currency and its formats
- Measurement systems
- Collation (sort order) specification: sorting, searching and matching
- Translations of names for languages, territories, scripts, time zones and currencies
- Script and exemplar characters used by a language
- Calendaring rules, formats and important dates
- Specification of selected but universal cultural terminologies

Reference URL: http://cldr.unicode.org/

2.2. Fonts

2.2.1. Open Type fonts

Open Type fonts map Unicode code points to their glyphs on the display interface. They are directly based on Unicode. Open Type provides a series of enhancements to the TrueType format, the most significant of which allows PostScript font data to nest inside a TrueType software wrapper. Open Type allows type designers and font foundries to create larger character sets within fonts. Type 1 fonts, and TrueType fonts built around 8-bit encodings, were limited to 256 characters. If a typeface designer wanted to create an extended ligature set, small caps, swash and alternate characters, or characters to support multiple languages, these had to be put into another font. The large character set capabilities of Open Type give type designers much more latitude in typeface design, resulting in better graphic communication.

2.2.2. SVG Fonts

The purpose of SVG fonts is to allow for delivery of glyph outlines in display-only environments. SVG fonts that accompany Web pages must be supported only in browsing and viewing situations. Graphics editing applications or file translation tools must not attempt to convert SVG fonts into system fonts.

Reference URL: http://www.w3.org/tr/svg/fonts.html
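The storage model behind sections 2.1 and 2.2.1 (text is stored as Unicode code points, and the font maps them to glyphs later) can be illustrated with a minimal Python sketch. The sample word and the helper name `describe` are assumptions chosen for illustration; only the standard library is used.

```python
import unicodedata

def describe(text):
    """Return (code point, character name) pairs for each character of text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# Sample word (an assumption for illustration): the Hindi word for "Hindi",
# stored as five code points: HA, VOWEL SIGN I, ANUSVARA, DA, VOWEL SIGN II.
word = "\u0939\u093f\u0902\u0926\u0940"

for code_point, name in describe(word):
    print(code_point, name)

# The same five code points take a different number of bytes per encoding form.
print(len(word.encode("utf-8")))      # 3 bytes per Devanagari code point
print(len(word.encode("utf-16-le")))  # 2 bytes per Devanagari code point

# A code point beyond U+FFFF (here one from the Brahmi block) does not fit in
# a single 16-bit unit; UTF-16 stores it as a surrogate pair (4 bytes).
print(len(chr(0x11005).encode("utf-16-le")))
```

The reader sees two syllables on screen, yet the stored text has five code points. Closing that gap is the job of the font and of the segmentation rules discussed later in the document.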

2.2.3. WOFF

This format was designed to provide lightweight, easy-to-implement compression of font data, suitable for use in conjunction with the @font-face CSS declaration. Any TrueType/Open Type/Open Font Format file can be losslessly converted to WOFF for Web use (subject to licensing of the font data). Once decoded by a user agent, the WOFF font will display identically to the original desktop font from which it was created. The WOFF format also allows additional metadata to be attached to the file; font designers can use this to include licensing or other information beyond that present in the original font. Such metadata does not affect the rendering of the font in any way, but may be displayed to the user on request.

Reference URL: http://www.w3.org/tr/woff/

3. Text layout requirements

3.1. Paginating single column text

When the content is text only, widows and orphans are the two most common problems in digital publishing. Widows and orphans are words or short lines at the end or beginning of a paragraph that are left to sit alone at the top or bottom of a column, separated from the rest of the paragraph.

3.1.1. Widows

A widow is a line that ends a paragraph and is left alone at the beginning of a new page, while the bulk of the paragraph is located on the previous page. Widow lines can be confusing because they disrupt the paragraph. They often contain only a few words, so they appear to be short and ill-formed sentences. Depending on how the line is formed, readers may have to backtrack once they have finished the paragraph to make sure that they have understood it correctly.

[Figure: Widow problem]

3.1.2. Orphans

An orphan is a single line at the bottom of a page that begins a paragraph which continues on the next page. The term can also refer to the minimum number of lines of a paragraph that must remain at the bottom of a page before a page break.

[Figure: Orphan problem]

3.2. Adjustments of running heads & page numbers

The positioning of all running heads and page numbers within the same book should be consistent. The following ways might be used for positioning running heads and page numbers in a horizontal writing system:

4. Styling requirements

4.1. Drop caps

The ::first-letter pseudo-element represents the first letter of the first line of a block, if it is not preceded by any other content (such as images or inline tables) on its line. It allows that first letter to be styled individually, without markup. It may be used for "initial caps" and "drop caps", which are common typographical effects in text in Latin script.

4.2. Line breaking

When inline-level content is laid out into lines, it is broken across line boxes. Such a break is called a line break. In most writing systems, in the absence of hyphenation, a line break occurs only at word boundaries. Many writing systems use spaces or punctuation to explicitly separate words, and line break opportunities can be identified by these characters. Line breaking, also known as word wrapping, is the process of breaking a section of text into lines so that it fits in the available width of a page, window or other display area.

4.2.1. Hyphenation

Hyphens are used when a word remains incomplete at the end of a line, or when specifying a range. There are different cases of hyphenation; some of them are given below.

Case 1: Hyphens are commonly used in copulative compound words in Hindi. Hindi also has prefixes and suffixes which are joined to words with a hyphen. Examples: नर-नारी, लाभ-हानि, माता-पिता, ऊँच-नीच

[Fig 2: Hyphenated words at the end of the line]

Case 2: A single word can be broken at the end of the line, at the Indic syllable level, using a hyphen.

[Fig 3: Word breaks using hyphens]

In digital publishing, hyphenation can be used in the different cases described in this section. Publishers can use these cases to increase the readability of text in Hindi.

[Figure: Words not hyphenated and wrapped properly]
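Case 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word is already split into Indic syllables, the available room is counted in syllable slots rather than measured glyph widths, and the function name `split_at_syllable` is hypothetical.

```python
def split_at_syllable(syllables, room):
    """Split a word (given as a list of syllables) across two lines.

    `room` is the number of slots left on the current line; the hyphen
    itself takes one slot. Returns (text for this line, text for next line).
    """
    if len(syllables) <= room:
        # The whole word fits: no hyphenation needed.
        return "".join(syllables), ""
    if room < 2 or len(syllables) < 2:
        # No room for at least one syllable plus the hyphen: move the word down.
        return "", "".join(syllables)
    keep = min(room - 1, len(syllables) - 1)  # leave at least one syllable for the next line
    return "".join(syllables[:keep]) + "-", "".join(syllables[keep:])

# Six syllables: bha, ra, ta, naa, tya, ma
word = ["\u092d", "\u0930", "\u0924", "\u0928\u093e", "\u091f\u094d\u092f", "\u092e"]
head, tail = split_at_syllable(word, 4)
print(head, tail)
```

Because the split index always falls between two list items, a conjunct or a consonant with its matra is never torn apart, which is the point of breaking at the syllable level.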

4.2.2. Guiding principles of Line breaking for Indian languages

In Indic writing systems it is preferred that lines break at word boundaries. Where this is not possible, the following principles may be adhered to.

Rule 1: A new line cannot begin with the following symbols or punctuation marks. They should be retained with the associated text:

- Closing brackets
- Devanagari danda / purnaviram
- Commas
- Visarga
- Decimal symbols
- Semicolons
- Repeated punctuation marks, such as a semicolon with closing brackets, a semicolon with single or double quotes, closing brackets with commas or semicolons, etc.
- Mathematical operators

Rule 2: The definition of the Indic syllable may be used to break the line, and a hyphen should be placed at the breaking point so that the word can be read intuitively.

Rule 3: Hyphenated words can be broken at the hyphen, e.g. नर-नारी should be treated as नर- on the first line and नारी on the next line.

Rule 4: An expression with mathematical symbols should be treated as a single unit, so that at the end of a line the expression does not break at the operator.

Rule 5: Breaking should not be allowed inside numerical values such as currency values, years, etc., e.g. in 100.00 or 10,000, nor in 12:59.
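Rules 1 and 5 above lend themselves to a simple check. The following Python sketch is an approximation under stated assumptions: the character set `NO_LINE_START` is an illustrative selection (closing brackets, danda, comma, semicolon, decimal point, visarga and a few mathematical operators), not an exhaustive list, and `break_allowed` is a hypothetical helper name.

```python
import re

# Rule 1 (illustrative subset): characters that must not start a line.
NO_LINE_START = set(
    ")]}"              # closing brackets
    ",;."              # comma, semicolon, decimal point / full stop
    "\u0964"           # Devanagari danda (purnaviram)
    "\u0903"           # Devanagari visarga
    "+=<>\u00d7\u00f7" # a few mathematical operators
)

# Rule 5: a numeric token such as 100.00, 10,000 or 12:59.
# \d also matches Devanagari digits in Python's re module.
NUMERIC = re.compile(r"\d[\d.,:]*\d")

def break_allowed(text, i):
    """Return True if a new line may start at index i of text."""
    if not 0 < i < len(text):
        return False
    if text[i] in NO_LINE_START:
        return False  # Rule 1: keep the mark with the preceding text
    # Rule 5: no break strictly inside a numeric token
    return not any(m.start() < i < m.end() for m in NUMERIC.finditer(text))

print(break_allowed("x 100.00 y", 2))  # before the number
print(break_allowed("x 100.00 y", 5))  # at the decimal point
```

A real line breaker would combine such checks with word-boundary detection and the syllable rule (Rule 2); this sketch only filters candidate break positions.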

4.3. Text segmentation

A string of Unicode-encoded text often needs to be broken up into text elements programmatically. Common examples of text elements include what users think of as characters, words, lines (more precisely, where line breaks are allowed), and sentences. The precise determination of text elements may vary according to orthographic conventions for a given script or language. The goal of matching user perceptions cannot always be met exactly, because the text alone does not always contain enough information to decide boundaries unambiguously. For example, the period (U+002E FULL STOP) is used ambiguously, sometimes for end-of-sentence purposes, sometimes for abbreviations, and sometimes for numbers. In most cases, however, programmatic text boundaries can match user perceptions quite closely, although sometimes the best that can be done is not to surprise the user.

Word boundaries are used in a number of different contexts. The most familiar ones are selection (double-click mouse selection, or move to next word with control-arrow keys) and Whole Word Search for search and replace. They are also used in database queries, to determine whether elements are within a certain number of words of one another. Some special sentence boundaries also need to be handled, such as the double poorna virama, possibly with numbers (as in Sanskrit text, shlokas etc.).

Grapheme cluster boundaries are important for collation, regular expressions, UI interactions (such as mouse selection, arrow key movement, backspacing), segmentation for vertical text, identification of boundaries for first-letter styling, and counting character positions within text [UAX29]. For other characters, text segmentation should be done at the Indic syllable level.

4.4. Underlining

There are examples in Indian languages in which matras are not readable because the characters are underlined. When such pages are viewed on the internet, the information is not clearly readable: if text in an Indian language is hyperlinked, some modifiers (matras) are overlapped by the underline, which makes it difficult to read the information correctly.

[Figure: Matras are not readable properly]

There is a need to define a mechanism in CSS 3.0 for underlining non-Latin scripts.

4.5. Embedded fonts

The epub3 specification supports OTF and WOFF fonts, but not TTF fonts. WOFF font conversion is not hard, and standard TTF fonts can easily be transformed to WOFF for use in an epub3 package. WOFF fonts are unlikely to work on any current epub2 readers. CSS introduces the following defining terms:

- Font-face: a specific font with a defined set of characters and distinctive weight, style, stretch or variant properties.
- Font-family: an associated set of font-faces, each with separate weight, style, stretch and variant properties, defined into a single family group.

First, add the font to your book files in the normal way, then add an @font-face statement at the beginning of your CSS, something like this:

    @font-face {
        font-family: "Prophecy Script";  /* quoted because the name contains a space */
        font-style: normal;
        font-weight: normal;
        /* WOFF file, since epub3 does not support TTF */
        src: url("fonts/prophecy_script.woff") format("woff");
    }

[Figure: Embedded font]

5. ABNF based definition of Indic syllable

An ABNF Valid Segmentation based definition of the Indic syllable is provided here for correct and standardized representation of Indian language layout. It addresses various issues mentioned in the preceding sections. The definition will be useful for obtaining a uniform display of Indic layout in browsers, applications, digital publishing, etc. The linguistic definition of the Indic syllable has been mapped to ABNF (Augmented Backus-Naur Form) for the purposes of text segmentation, line breaking, drop letters, and letter spacing in horizontal and vertical text. The definition is elaborated below, taking Hindi as an example.

The definition is a combination of 3 rules:

Rule 1: V[m]
Rule 2: {CH}C[v][m]
Rule 3: CH (this rule is applicable only at the end of the word)

where:

- V (upper case) is an independent vowel
- m is a modifier (anusvara / visarga / chandrabindu)
- C is a consonant, which may or may not include a single nukta
- v (lower case) is any dependent vowel or vowel sign (mātrā)
- H is halant / virama
- | is a rule separator
- [ ] means the enclosed item is optional
- { } means the enclosed item or items occur zero or more times

5.1. Various use cases of ABNF based Indic syllable definition for Hindi language as example

Rule 1: V[m]

Sl. No. | Examples | Definition
1. | अ, ई, उ | V (vowel) is a syllable
2. | अ, उ, आ | V + modifier is a syllable

Rule 2: {CH}C[v][m]

Sl. No. | Examples | Definition
1. | र, क, ज, ल, म | Consonant is a syllable
2. | प,क ख,च त, ज जज जव, त कक ऱ, त क न | Zero or more consonant + virama sequences followed by a consonant is a syllable
3. | तत, त क त, त क नत, त क नयत, फ क | Zero or more consonant (+ nukta) + virama sequences followed by a consonant is a syllable
4. | त त, त क नय त, फ कज, क य | Zero or more consonant (+ nukta) + virama sequences followed by a consonant (+ nukta) followed by a vowel sign is a syllable
5. | त, त, र, त, फ कज | Zero or more consonant (+ nukta) + virama sequences followed by a consonant (+ nukta) followed by a modifier is a syllable
6. | त क नय त: त क नयय, त क नयय, फ कज,हह | Zero or more consonant (+ nukta) + virama sequences followed by a consonant (+ nukta) followed by a vowel sign and a modifier is a syllable
7. | स,स ज जज,ख व | Zero or more consonant + halant sequences followed by a consonant followed by a vowel sign is a syllable
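The rules can be sketched as a regular expression over Devanagari code points, with Rule 3 (CH at the end of a word) approximated as "consonant + halant not followed by another consonant". This is a minimal Python sketch, not a normative implementation: the code point ranges chosen for V, C, v and m are assumptions covering the core Devanagari block, and the function name `syllables` is hypothetical.

```python
import re

V = "[\u0904-\u0914]"                      # independent vowels
C_SET = "[\u0915-\u0939\u0958-\u095f]"     # consonants
C = C_SET + "\u093c?"                      # consonant with optional nukta
VS = "[\u093e-\u094c]"                     # dependent vowel signs (matras)
M = "[\u0901-\u0903]"                      # chandrabindu, anusvara, visarga
H = "\u094d"                               # halant / virama

SYLLABLE = re.compile(
    f"{V}{M}?"                        # Rule 1: V[m]
    f"|{C}{H}(?!{C_SET})"             # Rule 3: CH, only when no consonant follows
    f"|(?:{C}{H})*{C}{VS}?{M}?"       # Rule 2: {{CH}}C[v][m]
)

def syllables(word):
    """Split a Devanagari word into syllables according to the three rules."""
    return SYLLABLE.findall(word)

# "swagatam": sa, halant, va, aa-matra, ga, ta, ma, halant
print(syllables("\u0938\u094d\u0935\u093e\u0917\u0924\u092e\u094d"))
# "bharatanatyam": bha, ra, ta, na, aa-matra, tta, halant, ya, ma
print(syllables("\u092d\u0930\u0924\u0928\u093e\u091f\u094d\u092f\u092e"))
```

Rule 3 is tried before Rule 2 so that a word-final consonant + halant is kept together; otherwise Rule 2 would match the bare consonant and leave the halant stranded. Characters outside the listed ranges are simply skipped by `findall`, so a production segmenter would need to handle them explicitly.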

Rule 3: CH

त्, व्, म्, भ् etc. are syllables in Hindi only at the end of the word.

Examples of combinations of the rules:

1. स्वागतम् (CHCv + C + C + CH) has the following syllables:

   स्वा | ग | त | म्
   CHCv | C | C | CH

2. भरतनाट्यम (C + C + C + Cv + CHC + C) has the following syllables:

   भ | र | त | ना | ट्य | म
   C | C | C | Cv | CHC | C

6. Future W3C standards for e-publishing

6.1. HTML5

HTML5 is the emerging internet standard that is slowly seeing wider adoption within the publishing industry. It has a number of multimedia features that allow music and video to be embedded without third-party plugins such as Silverlight or Adobe Flash. The core features of HTML5 are:

- Semantic markup: Most semantic markup will be controlled by tools. Semantic markup means ensuring that all concepts and chunks of information are grouped or tagged correctly. For example, a definition and a key word are noted as such, or an article is grouped.
- Responsive designs: Building responsive designs depends heavily on having correct semantic markup. Reading takes place on a wide range of devices, from a small smartphone to a large computer monitor. A design that works well on one screen size may be unreadable on another. If the content is almost completely text, the reader app can be trusted to display the text properly. For more highly designed content, the publisher will want control over the look and feel.
- Cross-platform, consistent experience: With Flash unavailable on Apple devices, HTML5 is able to provide a consistent visual experience across channels: computer, tablet and mobile.
- Cleaner, neater code: Code developed in HTML5 is cleaner and easier to maintain.
- Easier video/audio integration: There is no longer any need for third-party players (like Flash or Java). HTML5 can include video and audio elements directly in the code.
- Local storage capabilities: Think of it as cookies combined with a client-side database, with the ability to store data more easily than before. It has been adopted by all major browsers.
- Mobile: Probably most important, HTML5 is a stronger, simpler way of developing, which makes it ideal for mobile devices. Quicker load times and better functionality ensure a better user experience.

6.2. CSS Mobile

The following CSS Mobile properties must be supported for Indian languages in order to render e-content properly on mobile devices:

- vertical-align: This property affects the vertical positioning, inside a line box, of the boxes generated by an inline-level element.
- text-decoration: This property describes decorations that are added to the text of an element using the element's color. When specified on or propagated to an inline element, it affects all the boxes generated by that element, and is further propagated to any in-flow block-level boxes that split the inline. Value: none | [ underline || overline || line-through || blink ]
- letter-spacing: This property specifies spacing behavior between text characters.
- text-indent: This property specifies the indentation of the first line of text in a block container. More precisely, it specifies the indentation of the first box that flows into the block's first line box. The box is indented with respect to the left (or right, for right-to-left layout) edge of the line box. User agents must render this indentation as blank space.

Reference URL: http://www.w3.org/tr/css-mobile/

6.3. CSS Speech

The CSS Speech module provides properties that enable authors to declaratively control presentational aspects of the aural dimension (e.g. TTS voice, pitch, rate and volume levels). These style sheet properties can be used together with visual properties (mixed media), or as a complete aural alternative to a visual presentation. Typical examples include in-car use of an e-book reader, industrial and medical documentation systems, home entertainment, helping users to learn reading, or supporting users who have reading difficulties (print disabilities).

Properties:

- voice-volume: Allows authors to control the amplitude of the audio waveform generated by the speech synthesizer. It is also used to adjust the relative volume level of audio cues within the audio "box" model.
- voice-balance: Controls the spatial distribution of audio output across a lateral sound stage: one extremity is on the left, the other on the right-hand side, relative to the listener's position.
- speak: Determines whether or not to render text aurally.
- speak-as: Determines in what manner text is rendered aurally, based on a basic predefined list of possible values.
- Pause properties: The pause-before and pause-after properties specify a prosodic boundary (silence with a specific duration) that occurs before (or after) the speech synthesis rendition of the selected element or, if any cue-before (or cue-after) is specified, before (or after) the cue within the audio "box" model.
- Rest properties: The rest-before and rest-after properties specify a prosodic boundary (silence with a specific duration) that occurs before (or after) the speech synthesis rendition of an element within the audio "box" model.
- Cue properties: The cue-before and cue-after properties specify auditory icons (i.e. pre-recorded or pre-generated sound clips) to be played before (or after) the selected element within the audio "box" model.
- Voice characteristic properties:
  a. voice-family
  b. voice-rate
  c. voice-pitch
  d. voice-range
  e. voice-stress
  f. voice-duration