The Encoding and Processing of Indian Languages - an Alternative Approach

Md Maruf HASAN
Computational Linguistic Laboratory, Nara Institute of Science and Technology, Takayama, Ikoma, Nara, Japan
mmhasan@computer.org

Abstract

CJK (Chinese, Japanese and Korean) language systems use multi-byte coding, while existing encodings of Indian languages are based on a single-byte coding scheme. Due to the special characteristics of Indian languages, it is advantageous to consider multi-byte encoding for computer processing of Indian languages. In this paper, we discuss a model of encoding and processing Indian languages in a multi-byte framework.

Keywords: Character Coding, Multilingual Text Processing, Indian Language Processing.

Introduction

Chinese, Japanese and Korean (CJK) languages are ideographic and consist of a huge number of ideographic characters. CJK languages have been processed quite efficiently on every computing platform for several years (Ken Lunde, 1999). Due to the large character sets of these languages, double-byte encoding is used in many implemented systems. Some CJK implementations are based on the UNICODE recommendation (UNICODE, 1996).

Indian languages, on the other hand, are considered alphabetic languages with a small set of characters to encode. A single-byte coding scheme of fewer than a hundred characters has been adopted to place these characters in the Extended ASCII area and process them like European languages. The UNICODE Consortium also recommended encoding this small number of characters in a compact coding space. In comparison to the European languages, Indian languages have certain special characteristics which can be better handled by a multi-byte coding scheme.

English text is a straightforward one-dimensional array of characters (letters). Other European languages (e.g., French) include a few accented characters. Single-byte encoding with proper rendering can efficiently process these European languages. Chinese, Japanese and Korean also have simple one-dimensional arrangements of their ideographic characters in their written forms.

Although Indian languages have alphabet sets of fewer than a hundred characters including consonants and vowels, this small set of characters (unlike in European languages) generates a few thousand complicated and irregular ligatures. The original glyphs of the vowels and consonants, even with complicated rendering techniques, cannot always maintain the typical face of the ligature. Moreover, dictionary sorting of Indian languages is unusual: consonants and vowels have different priorities in sorting.

Using single-byte encoding of the small set of characters (the alphabet) of an Indian language is comparable to encoding the small number of radicals of the Chinese language and generating all the several thousand Chinese characters (ideographs) by rendering. For Korean, encoding the Jamo letters and then generating thousands of Hangul syllable blocks by rendering would be a similar example. However, neither Chinese nor Korean is encoded and processed in this way.

Because of the special characteristics of Indian languages mentioned above, we can take advantage of multi-byte encoding and encode all the consonants and vowels (comparable to Korean Jamo letters and Chinese radicals) along with the ligatures (comparable to Korean Hangul syllable blocks and Chinese characters), rather than encoding only the vowels and the consonants. Indian languages can be processed efficiently in this way. This paper addresses such a multi-byte encoding model for Indian languages.

1 Overview of CJK Text Processing

Computer processing of Chinese, Japanese and Korean texts is technically quite similar. In this section, we use Chinese as an example to explain the processing method in detail (Zhao et al., 1990).

Coding. A multi-byte code is allocated for every Chinese ideographic character. This code is called the Internal Code. Several other codes are defined for different purposes, e.g., the Interchange Code for data communication and the QuWei Code for quickly locating a character (Zhao et al., 1990).

Font. For each Chinese ideographic character, a 16x16 bitmapped font (mainly for display) is created and saved in a binary file. Higher-resolution fonts (e.g., 24x24 and 48x48 bitmapped fonts) are also created and saved in binary files, mostly for printing. Current systems also use TrueType fonts intensively (Lin et al., 1994).

Input. Entering a huge number of characters from a manageable keyboard is only possible through a conversion process, also known as Front End Processing (FEP). For example, Microsoft IME (Input Method Editor, a frequently used FEP) includes a number of methods for inputting CJK characters efficiently. A table of input keys and the target characters' Internal Codes is used to map the input keys to the target characters (ideographs). Almost all input methods are one-to-many mappings. A prompt-line display and selection mechanism is used to select the appropriate character. The input rate is highly optimized by techniques that minimize the number of keystrokes required to locate a character, a word or a phrase.

Output. Since output is a simple one-to-one mapping, output of CJK texts is similar to that of other languages. A font management engine locates the unique font entry for a given internal code and sends it to the appropriate output device.
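The input and output mappings described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation of any particular FEP: the table contents, the pinyin input keys and the helper names (internal_code, candidates, select, render) are hypothetical, and Python's GB 2312 codec merely stands in for an Internal Code.

```python
# Sketch of FEP-style input (one-to-many) and font lookup (one-to-one).
# All table contents and function names are illustrative assumptions.

def internal_code(ch: str) -> int:
    """Double-byte internal code of a character, taken here from the GB 2312 codec."""
    return int.from_bytes(ch.encode("gb2312"), "big")

# Input table: one input key sequence maps to several candidate internal codes.
INPUT_TABLE = {
    "ma": [internal_code(c) for c in "妈马吗"],
    "zhong": [internal_code(c) for c in "中种"],
}

# Font table: each internal code maps to exactly one glyph.
# A real system would store a 16x16 bitmap; a text placeholder is used here.
FONT_TABLE = {
    code: f"<16x16 bitmap for {code:04X}>"
    for codes in INPUT_TABLE.values()
    for code in codes
}

def candidates(keys: str) -> list[int]:
    """Candidates that a prompt line would display for the typed keys."""
    return INPUT_TABLE.get(keys, [])

def select(keys: str, choice: int) -> int:
    """Internal code of the candidate the user picked from the prompt line."""
    return candidates(keys)[choice]

def render(code: int) -> str:
    """One-to-one output mapping from internal code to font entry."""
    return FONT_TABLE[code]

if __name__ == "__main__":
    code = select("ma", 1)  # the user picks the second candidate
    print(f"{code:04X}", render(code))
```

The selection step exists only because the input mapping is one-to-many; the output step needs no selection because each internal code has a single font entry.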
2 Characteristics of Indian Languages

Indian languages are derived from Sanskrit (an ancient language) and are used by a population of about one billion people. Hindi (the Indian national language), Bengali (the national language of Bangladesh and the language of the Indian state of West Bengal), Nepali, Tamil, Gurmukhi, Gujarati, Oriya, Telugu, Kannada, Malayalam, etc. are examples of Sanskrit-based languages. Hindi and Nepali are written in the Devanagari script; the other languages are written in their own scripts. These languages share a number of common linguistic characteristics.

Unlike CJK languages, Indian languages have a small alphabet of roughly 50 letters, including the vowels and the consonants. However, these languages do not form words as a straightforward one-dimensional array of letters, as is the case in English and some other Western languages.

Table 1: Devanagari and Bengali Codepages (Source: Unicode)

Table 1 shows the Devanagari (left-hand side) and the Bengali (right-hand side) coding tables recommended by the UNICODE Consortium. Single-byte encodings of the symbols included in the table are currently used in existing systems. The similarities of the two languages can be seen from the table. Both languages have a small set of characters to encode. The two-dimensional character of the scripts and the complexity of rendering are also noticeable in Table 1. Moreover, characters in the same position on the left-hand side (Devanagari) and on the right-hand side (Bengali) mostly share similar pronunciations. These characteristics are also common to other Indian languages.

Figure 1: Example Text Rendering of Indian Text (Source: Unicode)

Although most Indian languages have a tiny alphabet, the glyphs of the letters may take several different forms and shapes within a word, depending on where they occur. Vowels usually change their shapes when appearing with a consonant or a ligature, and this change is sometimes irregular. Consonants and vowels, and two or more consonants with or without vowels, may combine to form ligatures, and their combined forms may look totally different. Moreover, the sequence of letters does not always appear in a straightforward order. A vowel pronounced after a consonant may appear on top of or below that consonant, and it may even appear before the consonant. In some cases, one part of a single vowel appears before the consonant and the other part appears after it. The formation of ligatures has many other irregularities. Rendering is therefore a complicated issue in Indian language processing. The complexity of rendering is easily noticeable in Figure 1, where rendering of the Devanagari script is shown.

Indian languages also have a distinctive sorting mechanism. Unlike in English, consonants and vowels have different priorities in sorting. Words are sorted by taking the consonant's order as the first consideration and the associated vowel's order as the second. In a ligature of the type [(Consonant) (Consonant) (Consonant)* (Vowel)*], the second and higher-order consonants also have a different priority level from that of their individual occurrences or their occurrences as the first consonant of a ligature. A sorting example is illustrated in Figure 2.

Figure 2: Example of Complex Sorting

Considering the special characteristics mentioned above, a mathematical model of processing Indian languages, which requires a multi-byte encoding scheme, is proposed in the next section. In this approach, all the possible glyphs (including all the ligatures, the vowels and the consonants) are encoded. An input method is also proposed to make text entry efficient.

3 Mathematical Model of Multi-byte Coding of Indian Languages

In this section, we introduce the mathematical model of Indian languages, taking Bengali as an example. Other Sanskrit-based Indian languages can be modelled in the same way.

In this model, we treat the letters (vowels and consonants) of the Bengali alphabet like radicals or Jamo in CJK languages. Rather than encoding only the letters, we propose encoding all the letters as well as all the linguistically meaningful ligatures they may form.

Linguistic analysis is necessary to find only the potential ligatures, since many possible ligatures are never used in reality. Finally, we treat these ligatures (along with the independent vowels and consonants) in the same way as CJK characters (including radicals or Jamo) are treated in CJK systems. The aim is to process Indian languages in the same fashion as characters are processed in CJK systems.

3.1 Basic Definitions

Definition 1. A consonant in Bengali is represented as $c_i$, $i = 1$ to $39$. (There are 39 commonly used consonants in Bengali.)
Consonant Set, $C = \{c_i\}$

Definition 2. An independent vowel is represented as $v_j$, $j = 1$ to $11$. (There are 11 commonly used vowels in Bengali. Dependent vowels are symbolic variations of independent vowels that usually appear with consonants or ligatures.)
Vowel Set, $V = \{v_j\}$

Definition 3. A combination of one consonant and one vowel, of one consonant and one diacritical mark, or of two or more consonants (with or without a vowel or diacritical mark) is called a ligature and is represented as $l_k$, $k = 1$ to $2{,}500$. (We analyzed the Bengali language and found that there are about 2,500 commonly used ligatures.)
Ligature Set, $L = \{l_k\}$

Definition 4. Including Bengali numerals, monetary and other symbols, etc., there are about 20 commonly used symbols in Bengali, represented as $s_l$, $l = 1$ to $20$.
Symbol Set, $S = \{s_l\}$

Definition 5. A Word Constituent Unit $u_m$ is a consonant, a vowel or a ligature: $u_m \in C \cup V \cup L$, with $m$ ranging up to about 3,000 for Indian languages.
Word Constituent Unit Set, $U = \{u_m\}$

Definition 6. Words are represented as $w_n$, where $n$ is virtually unbounded. Each word is a sequence of one or more word constituent units: $w_n = (u_m)^{+}$.
Word Set, $W = \{w_n\}$

Definition 7. We denote by $B$ the set of Bengali characters:
Bengali Character Set, $B = \{\, b_i \mid b_i \in U \text{ or } b_i \in S \,\}$

3.2 Mathematical Model of Multi-byte Code for Bengali language

Definition 8. Each element of $B$ is assigned a unique multi-byte (16-bit, in the case of double-byte coding) internal code, $i_i$.
There exists a function $\sigma$ such that $i_i = \sigma(b_i)$ and $b_i = \sigma^{-1}(i_i)$.
Internal Code Set, $I = \{\, i_i \mid |i_i| = 16 \text{ bits for double-byte coding} \,\}$

If $b_i$ appears before $b_j$ in the dictionary, then the corresponding codes satisfy $i_i < i_j$. Notice that sorting of Bengali words can now be done simply by comparing the internal codes.

3.3 Mathematical Model of Bengali Character Input

As in any CJK system, several input methods can be designed for Indian languages. We designed an input method called IAYS (Input As You Spell; spelling is unambiguous for Indian languages), in which the user is provided with a keyboard layout that includes only the vowels, consonants and symbols. To input a Bengali word constituent unit, users type the sequence of consonants and vowels as they spell the unit. In some cases, a selection option appears in the prompt line for disambiguation.

At first glance, selection keystrokes appear to be extra overhead. However, the input method's performance can be further optimized using associative rules and word-based or phrase-based input techniques. Word-based and phrase-based input methods give a very high input rate, as proven in CJK systems. This is because words and phrases are less ambiguous than single characters, so the selection key is unnecessary in most cases. Moreover, abbreviated input of words and phrases is also possible, which raises the input rate further.

A simple input method uses a table lookup mechanism, where each row of the table holds an Input Code (the spelling attributes of a word constituent unit) and the corresponding Internal Code. Note that inputting Indian languages is less ambiguous than inputting CJK languages because the mapping space is smaller. For example, Bengali input involves a mapping from about 50 keys to about 3,000 units, whereas in CJK systems the mapping is usually from about 50 keys to more than 6,000 characters. Definitions 9 and 10 describe the input process mathematically.
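As an aside on Section 3.2, the order-preserving code assignment $\sigma$ and the resulting sort-by-code property can be sketched as follows. This is a minimal sketch under assumptions, not the system described in the paper: the unit labels are placeholders rather than real Bengali units, their relative order is purely illustrative, and the base value of the code area and all function names are hypothetical.

```python
# Sketch: assign internal codes in dictionary order, then sort words by code.
# Unit labels, their order, and BASE are illustrative assumptions only.

# Placeholder word constituent units, listed in an assumed dictionary order.
UNITS_IN_DICTIONARY_ORDER = ["c1", "c1+v1", "c1+v2", "c1c2", "c2", "c2+v1"]

BASE = 0xA1A1  # hypothetical start of a user-defined double-byte code area

# sigma: unit -> internal code, increasing with dictionary position.
SIGMA = {unit: BASE + pos for pos, unit in enumerate(UNITS_IN_DICTIONARY_ORDER)}
# Inverse of sigma: internal code -> unit.
SIGMA_INV = {code: unit for unit, code in SIGMA.items()}

def encode(word: list[str]) -> list[int]:
    """Map a word (a sequence of units) to its sequence of internal codes."""
    return [SIGMA[unit] for unit in word]

def decode(codes: list[int]) -> list[str]:
    """Map a sequence of internal codes back to the units."""
    return [SIGMA_INV[code] for code in codes]

def sort_words(words: list[list[str]]) -> list[list[str]]:
    """Dictionary sort that only compares internal code sequences."""
    return sorted(words, key=encode)

if __name__ == "__main__":
    words = [["c2"], ["c1", "c2"], ["c1c2"], ["c1"]]
    for word in sort_words(words):
        print(word, [f"{code:04X}" for code in encode(word)])
```

Because the codes grow with dictionary position, no separate collation table is consulted at sort time; the cost is that the dictionary order has to be fixed when the codes are assigned.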

Definition 9. Spelling Attribute Set. The spelling attributes of a word constituent unit consist of the relevant vowels and consonants.
Spelling Attribute Set, $A = \{\, a_i \mid a_i \in C \text{ or } a_i \in V \,\}$

Definition 10. The Input Method is a one-to-many mapping $\rho$ from the spelling attribute set to the Bengali character set: $b_i = \rho(a_j)$.

3.4 Mathematical Model of Bengali Character Output

The output process is a one-to-one mapping, $\theta$, from the Internal Code Set to the Fonts Attribute Set. The output mechanism is less complicated than input because each internal code corresponds to exactly one font entry.

Definition 11. Fonts Attribute Set, $F = \{\, f_i \mid f_i \text{ is a binary (0,1) sequence of } 16 \times 16,\ 24 \times 24 \text{ bits, etc.} \,\}$

Definition 12. Output of a Bengali character is a one-to-one mapping $\theta$ from an internal code to the relevant font entry: $f_i = \theta(i_i)$.

4 Experimentation and Validation

We tested and validated this approach by adding the 3,000 glyphs of Bengali word constituent units to the user-defined space of existing CJK systems and by extending the lookup table of the Front End Processing system accordingly. Bengali text processing became possible immediately, and as efficiently as CJK text is processed in the original system. Moreover, our Bengali system immediately inherited all the other resources of the host CJK system. That is, all the available applications became usable with the Bengali language, too.

Conclusion

This is the very first implementation of an Indian language in the CJK text processing framework using multi-byte coding. Although the ligature analysis, font design, etc. are not yet efficient or error-free, the approach explained here focuses on a more computationally inclined way of processing Indian languages. English, the European languages, and the CJK languages have a long history of development on several computing platforms (Hu et al., 1989).
Processing Indian languages in the CJK framework provides immediate inheritance of the research results accumulated for CJK languages over the past years. UNICODE advocates using multi-byte code for every language. Therefore, acquiring extra code space for Indian languages, so that their character sets can be encoded in the same manner as those of the CJK languages, is technically feasible too. The mathematical model of the CJK system (Qian et al., 1992) is very similar to that of our Bengali system. Thus, it is easy to port our Bengali system to other platforms where CJK languages have already been successfully implemented. A multilingual environment is assured in this way.

Acknowledgement

I want to thank Professor Yuji Matsumoto, my current supervisor at Nara Institute of Science and Technology, Japan, for kindly reviewing and commenting on this work. Thanks are also due to Professor Mao Yu-Heng and Professor Dai Mei-Er of Tsinghua University, China, for their encouragement and advice. Among the Bangladeshi colleagues who helped in designing the font, testing the prototype and commenting, I must specially acknowledge the contribution of Mohammed Kawser and Ashraful Huq from the initial stage of this research.

References

Hu Xian-Xiang et al. 1989. Implementation of a Multilingual Computational Environment Based on X Window System. In Proceedings of Chinese Computing Conference '89, pp. 64

Ken Lunde. 1999. CJKV Information Processing: Chinese, Japanese, Korean and Vietnamese Computing. O'Reilly & Associates, Inc.

Lin Yaw-Jen et al. 1994. Conversion of METAFONT file to TRUETYPE. In Proceedings of International Conference of Chinese Computing, ICCC-94, pp

Qian Pei-De et al. 1992. CCDOS Technical Handbook Volume 1. Tsinghua University Press, Beijing, China (in Chinese).

UNICODE Consortium. 1996. The Unicode Standard 2.0. Addison Wesley. URL:

Zhao Po-Zhang et al. 1990. Chinese Information Processing Technique. Aeronautics and Aerospace Press, Beijing, China (in Chinese).

Introduction to Unicode. By: Atif Gulzar Center for Research in Urdu Language Processing

Introduction to Unicode. By: Atif Gulzar Center for Research in Urdu Language Processing Introduction to Unicode By: Atif Gulzar Center for Research in Urdu Language Processing Introduction to Unicode Unicode Why Unicode? What is Unicode? Unicode Architecture Why Unicode? Pre-Unicode Standards

More information

Keyboards for inputting Japanese language -A study based on US patents

Keyboards for inputting Japanese language -A study based on US patents Keyboards for inputting Japanese language -A study based on US patents Umakant Mishra Bangalore, India umakant@trizsite.tk http://umakant.trizsite.tk (This paper was published in April 2005 issue of TRIZsite

More information

The Unicode Standard Version 8.0 Core Specification

The Unicode Standard Version 8.0 Core Specification The Unicode Standard Version 8.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

EURESCOM - P923 (Babelweb) PIR.3.1

EURESCOM - P923 (Babelweb) PIR.3.1 Multilingual text processing difficulties Malek Boualem, Jérôme Vinesse CNET, 1. Introduction Users of more and more applications now require multilingual text processing tools, including word processors,

More information

Red Hat Enterprise Linux International Language Support Guide

Red Hat Enterprise Linux International Language Support Guide Red Hat Enterprise Linux International Language Support Guide Red Hat Enterprise Linux International Language Support Guide Copyright This book is about international language support for Red Hat Enterprise

More information

Chapter 4: Computer Codes

Chapter 4: Computer Codes Slide 1/30 Learning Objectives In this chapter you will learn about: Computer data Computer codes: representation of data in binary Most commonly used computer codes Collating sequence 36 Slide 2/30 Data

More information

The use of binary codes to represent characters

The use of binary codes to represent characters The use of binary codes to represent characters Teacher s Notes Lesson Plan x Length 60 mins Specification Link 2.1.4/hi Character Learning objective (a) Explain the use of binary codes to represent characters

More information

Easy Bangla Typing for MS-Word!

Easy Bangla Typing for MS-Word! Easy Bangla Typing for MS-Word! W ELCOME to Ekushey 2.2c, the easiest and most powerful Bangla typing software yet produced! Prepare yourself for international standard UNICODE Bangla typing. Fully integrated

More information

Tibetan For Windows - Software Development and Future Speculations. Marvin Moser, Tibetan for Windows & Lucent Technologies, USA

Tibetan For Windows - Software Development and Future Speculations. Marvin Moser, Tibetan for Windows & Lucent Technologies, USA Tibetan For Windows - Software Development and Future Speculations Marvin Moser, Tibetan for Windows & Lucent Technologies, USA Introduction This paper presents the basic functions of the Tibetan for Windows

More information

Rendering/Layout Engine for Complex script. Pema Geyleg pgeyleg@dit.gov.bt

Rendering/Layout Engine for Complex script. Pema Geyleg pgeyleg@dit.gov.bt Rendering/Layout Engine for Complex script Pema Geyleg pgeyleg@dit.gov.bt Overview What is the Layout Engine/ Rendering? What is complex text? Types of rendering engine? How does it work? How does it support

More information

Bangla Text Input and Rendering Support for Short Message Service on Mobile Devices

Bangla Text Input and Rendering Support for Short Message Service on Mobile Devices Bangla Text Input and Rendering Support for Short Message Service on Mobile Devices Tofazzal Rownok, Md. Zahurul Islam and Mumit Khan Department of Computer Science and Engineering, BRAC University, Dhaka,

More information

Multi-lingual Label Printing with Unicode

Multi-lingual Label Printing with Unicode Multi-lingual Label Printing with Unicode White Paper Version 20100716 2009 SATO CORPORATION. All rights reserved. http://www.satoworldwide.com softwaresupport@satogbs.com 2009 SATO Corporation. All rights

More information

DRH specification framework

DRH specification framework DRH specification framework 2007-03-15 EDM - NIED Takeshi KAWAMOTO, Hiroaki NEGISHI, Mitsuaki SASAKI 1 DRH Basic Development before Sep. 2007 Server architectures Search architectures Multilanguage Architectures

More information

Encoding script-specific writing rules based on the Unicode character set

Encoding script-specific writing rules based on the Unicode character set Encoding script-specific writing rules based on the Unicode character set Malek Boualem, Mark Leisher, Bill Ogden Computing Research Laboratory (CRL), New Mexico State University, Box 30001, Dept 3CRL,

More information

Internationalization & Localization

Internationalization & Localization Internationalization & Localization Of OpenOffice.org - The Indian Perspective Comprehensive Office Suite for Multilingual Indic Computing Bhupesh Koli, Shikha G Pillai

More information

Keywords : complexity, dictionary, compression, frequency, retrieval, occurrence, coded file. GJCST-C Classification : E.3

Keywords : complexity, dictionary, compression, frequency, retrieval, occurrence, coded file. GJCST-C Classification : E.3 Global Journal of Computer Science and Technology Software & Data Engineering Volume 13 Issue 4 Version 1.0 Year 2013 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals

More information

Kazuraki : Under The Hood

Kazuraki : Under The Hood Kazuraki : Under The Hood Dr. Ken Lunde Senior Computer Scientist Adobe Systems Incorporated Why Develop Kazuraki? To build excitement and awareness about OpenType Japanese fonts Kazuraki is the first

More information

HP Business Notebook Password Localization Guidelines V1.0

HP Business Notebook Password Localization Guidelines V1.0 HP Business Notebook Password Localization Guidelines V1.0 November 2009 Table of Contents: 1. Introduction..2 2. Supported Platforms...2 3. Overview of Design...3 4. Supported Keyboard Layouts in Preboot

More information

Designing Global Applications: Requirements and Challenges

Designing Global Applications: Requirements and Challenges Designing Global Applications: Requirements and Challenges Sourav Mazumder Abstract This paper explores various business drivers for globalization and examines the nature of globalization requirements

More information

National Language (Tamil) Support in Oracle An Oracle White paper / November 2004

National Language (Tamil) Support in Oracle An Oracle White paper / November 2004 National Language (Tamil) Support in Oracle An Oracle White paper / November 2004 Vasundhara V* & Nagarajan M & * vasundhara.venkatasubramanian@oracle.com; & Nagarajan.muthukrishnan@oracle.com) Oracle

More information

Guidelines for Writing System Support

Guidelines for Writing System Support 2003-10-31 Page 1 of 80 Victor Gaultney (Editor), SIL Non-Roman Script Initiative (NRSI) 2003-10-31 Table of Contents Section 1 Components of a Writing System Implementation... 4 1.1 Writing system implementations...

More information

When older typesetting methods gave

When older typesetting methods gave Typographic Terms When older typesetting methods gave way to electronic publishing, certain traditional terms got carried along. Today we use a mix of old and new terminology to describe typography. Alignment

More information

SETTING UP A MULTILINGUAL INFORMATION REPOSITORY : A CASE STUDY WITH EPRINTS.ORG SOFTWARE

SETTING UP A MULTILINGUAL INFORMATION REPOSITORY : A CASE STUDY WITH EPRINTS.ORG SOFTWARE 595 SETTING UP A MULTILINGUAL INFORMATION REPOSITORY : A CASE STUDY WITH EPRINTS.ORG SOFTWARE Nagaraj N Vaidya Francis Jayakanth Abstract Today 80 % of the content on the Web is in English, which is spoken

More information

TEXT TO SPEECH SYSTEM FOR KONKANI ( GOAN ) LANGUAGE

TEXT TO SPEECH SYSTEM FOR KONKANI ( GOAN ) LANGUAGE TEXT TO SPEECH SYSTEM FOR KONKANI ( GOAN ) LANGUAGE Sangam P. Borkar M.E. (Electronics)Dissertation Guided by Prof. S. P. Patil Head of Electronics Department Rajarambapu Institute of Technology Sakharale,

More information

The Unicode Standard Version 8.0 Core Specification

The Unicode Standard Version 8.0 Core Specification The Unicode Standard Version 8.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

Bangla Localization of OpenOffice.org. Asif Iqbal Sarkar Research Programmer BRAC University Bangladesh

Bangla Localization of OpenOffice.org. Asif Iqbal Sarkar Research Programmer BRAC University Bangladesh Bangla Localization of OpenOffice.org Asif Iqbal Sarkar Research Programmer BRAC University Bangladesh Localization L10n is the process of adapting the text and applications of a product or service to

More information

encoding compression encryption

encoding compression encryption encoding compression encryption ASCII utf-8 utf-16 zip mpeg jpeg AES RSA diffie-hellman Expressing characters... ASCII and Unicode, conventions of how characters are expressed in bits. ASCII (7 bits) -

More information

Very often my clients ask me, Don I need Chinese translation. If I ask which Chinese? They just say Just Chinese. If I explain them there re more

Very often my clients ask me, Don I need Chinese translation. If I ask which Chinese? They just say Just Chinese. If I explain them there re more Hello, fellow colleagues in Translation industry. And, Thank you very much for nice introduction. Vanessa. When you hear the topic Asian Languages and Markets, each of you probably had some questions or

More information

Encoding Text with a Small Alphabet

Encoding Text with a Small Alphabet Chapter 2 Encoding Text with a Small Alphabet Given the nature of the Internet, we can break the process of understanding how information is transmitted into two components. First, we have to figure out

More information

International Language Character Code

International Language Character Code , pp.161-166 http://dx.doi.org/10.14257/astl.2015.81.33 International Language Character Code with DNA Molecules Wei Wang, Zhengxu Zhao, Qian Xu School of Information Science and Technology, Shijiazhuang

More information

Counting in base 10, 2 and 16

Counting in base 10, 2 and 16 Counting in base 10, 2 and 16 1. Binary Numbers A super-important fact: (Nearly all) Computers store all information in the form of binary numbers. Numbers, characters, images, music files --- all of these

More information

Preservation Handbook

Preservation Handbook Preservation Handbook Plain text Author Version 2 Date 17.08.05 Change History Martin Wynne and Stuart Yeates Written by MW 2004. Revised by SY May 2005. Revised by MW August 2005. Page 1 of 7 File: presplaintext_d2.doc

More information

Japanese Character Printers EPL2 Programming Manual Addendum

Japanese Character Printers EPL2 Programming Manual Addendum Japanese Character Printers EPL2 Programming Manual Addendum This addendum contains information unique to Zebra Technologies Japanese character bar code printers. The Japanese configuration printers support

More information

.ASIA CJK (Chinese Japanese Korean) IDN Policies

.ASIA CJK (Chinese Japanese Korean) IDN Policies Date: Status: Version: 1.1.ASIA IDN Policies 04-May-2011 COMPLETE Archive URL: References: http://dot.asia/policies/dotasia-cjk-idn-policies-complete--2011-05-04.pdf.asia ZH / JA / KO IDN Language Tables

More information

OPTIMIZING CONTENT FOR TRANSLATION ACROLINX AND VISTATEC

OPTIMIZING CONTENT FOR TRANSLATION ACROLINX AND VISTATEC OPTIMIZING CONTENT FOR TRANSLATION ACROLINX AND VISTATEC We ll look at these questions. Why does translation cost so much? Why is it hard to keep content consistent? Why is it hard for an organization

More information

The Virtual Tibetan Classroom

The Virtual Tibetan Classroom The Virtual Tibetan Classroom by William Magee, DDBC Thanks to a Generous Grant from the Taiwan National Science Council and the Hopkins MultimediaTibetan Research Archive Project http://haa.ddbc.edu.tw

More information

Radicals of Chinese Characters

Radicals of Chinese Characters Radicals of Chinese Characters In order to function in a CJK environment, one must first become comfortable with the concept of radicals ( 部 首 Ch. bùshǒu, J. bushu, K. bŭsu). What is a radical? Simply

More information

