The Encoding and Processing of Indian Languages - an Alternative Approach
Md Maruf HASAN
Computational Linguistic Laboratory, Nara Institute of Science and Technology, Takayama, Ikoma, Nara, Japan
mmhasan@computer.org

Abstract

CJK (Chinese, Japanese and Korean) language systems use multi-byte coding, while existing encodings of Indian languages are based on single-byte coding schemes. Because of the special characteristics of Indian languages, it is advantageous to consider multi-byte encoding for their computer processing. In this paper, we discuss a model for encoding and processing Indian languages in a multi-byte framework.

Keywords: Character Coding, Multilingual Text Processing, Indian Language Processing.

Introduction

Chinese, Japanese and Korean (CJK) languages are ideographic and consist of a huge number of ideographic characters. Computers on every platform have been processing CJK languages quite efficiently for several years (Ken Lunde, 1999). Because of the large character sets of these languages, double-byte encoding is used in many implemented systems. Some CJK implementations are based on the UNICODE recommendation (UNICODE, 1996).

Indian languages, on the other hand, are considered alphabetic languages with a small set of characters to encode. A single-byte coding scheme of fewer than a hundred characters has been adopted, which places these characters in the Extended ASCII area and processes them like European languages. The UNICODE Consortium also recommended encoding this small number of characters in a compact coding space.

Compared with European languages, Indian languages have certain special characteristics that can be handled better with a multi-byte coding scheme. English text is a straightforward one-dimensional array of characters (letters). Other European languages (e.g., French) include a few accented characters. Single-byte encoding with proper rendering can process these European languages efficiently. Chinese, Japanese and Korean also have simple one-dimensional arrangements of their ideographic characters in their written forms.

Although Indian languages have alphabets of fewer than a hundred characters, including consonants and vowels, this small set (unlike the alphabets of European languages) generates a few thousand complicated and irregular ligatures. The original glyphs of the vowels and consonants, even with complicated rendering techniques, cannot always maintain the typical face of a ligature. Moreover, dictionary sorting of Indian languages is unusual: consonants and vowels have different priorities in sorting.

Using single-byte encoding for the small alphabet of an Indian language is comparable to encoding the small number of radicals of Chinese and generating all of the several thousand Chinese characters (ideographs) by rendering. For Korean, encoding the Jamo letters and then generating thousands of Hangul characters by rendering would be a similar example. However, neither Chinese nor Korean is encoded and processed this way.

Because of the special characteristics of Indian languages mentioned above, we can take advantage of multi-byte encoding and encode all the consonants and vowels (comparable to Korean Jamo letters and Chinese radicals) along with the ligatures (comparable to Korean Hangul and Chinese characters), rather than encoding only the vowels and the consonants. Indian languages can be processed efficiently in this way. This paper presents such a multi-byte encoding model for Indian languages.
1 Overview of CJK Text Processing

Computer processing of Chinese, Japanese and Korean texts is technically quite similar. In this section, we use Chinese as an example to explain the processing method in detail (Zhao et al., 1990).

Coding
A multi-byte code is allocated to every Chinese ideographic character. This code is called the Internal Code. Several other codes are defined for different purposes, e.g., the Interchange Code for data communication and the QuWei Code for quickly locating a character (Zhao et al., 1990).

Font
For each Chinese ideographic character, a 16x16 bitmapped font (mainly for display) is created and saved as a binary file. Higher-resolution fonts (e.g., 24x24 and 48x48 bitmapped fonts) are also created and saved as binary files, mostly for printing. Current systems also use TrueType fonts extensively (Lin et al., 1994).

Input
Entering a huge number of characters from a manageable keyboard is only possible through a conversion process, also known as Front End Processing (FEP). For example, Microsoft IME (Input Method Editor, a frequently used FEP) includes a number of methods for inputting CJK characters efficiently. A table of Input Keys and the Internal Codes of the target characters is used to map the Input Keys to the target characters (ideographs). Almost all input methods are one-to-many mappings. A prompt-line display and selection mechanism is used to select the appropriate character. The input rate is optimized by techniques that minimize the number of keystrokes required to locate a character, a word or a phrase.

Output
Since output is a simple one-to-one mapping, output of CJK texts is similar to that of other languages. A font management engine locates the unique font for a given internal code and sends it to the appropriate output device.
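The input and output mappings described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the table contents, the function names (candidates, select, render), and the use of Unicode code points in place of a real system's internal codes. It is not the implementation of any particular FEP.

```python
# Hypothetical FEP input table: input keys -> candidate internal codes.
# Unicode code points stand in for a real system's internal codes.
INPUT_TABLE = {
    "ma": [ord(ch) for ch in "妈麻马骂"],
    "shu": [ord(ch) for ch in "书树"],
}

# Hypothetical font table: exactly one glyph record per internal code.
FONT_TABLE = {
    code: f"glyph-{code:04X}"
    for codes in INPUT_TABLE.values()
    for code in codes
}


def candidates(keys):
    """Return the internal codes offered on the prompt line for the typed keys."""
    return list(INPUT_TABLE.get(keys, []))


def select(keys, choice):
    """Resolve the one-to-many input mapping with the user's selection index."""
    return candidates(keys)[choice]


def render(code):
    """One-to-one output mapping from internal code to glyph record."""
    return FONT_TABLE[code]


if __name__ == "__main__":
    code = select("ma", 2)  # the user picks the third candidate
    print(render(code))
```

The input side needs a selection step because one key sequence maps to several characters, while the output side is a plain dictionary lookup because each internal code has exactly one font entry.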
2 Characteristics of Indian Languages

Indian languages are derived from the script of Sanskrit (an ancient language) and are used by a population of one billion people. Hindi (Indian national language), Bengali (national language of Bangladesh and the Indian state of West Bengal), Nepali, Tamil, Gurumukhi, Gujarati, Oriya, Telugu, Kannada, Malayalam, etc. are examples of Sanskrit-based languages. Hindi and Nepali are written in Devanagari script; all other languages are written in their own scripts. These languages share a number of common linguistic characteristics. Unlike CJK languages, Indian languages have a small alphabet of about 50 letters, including the vowels and the consonants. However, these languages do not form words as a straightforward one-dimensional array of letters, as is the case in English and some other Western languages.

Table 1: Devanagari and Bengali Codepages (Source: Unicode)

Table 1 shows the Devanagari (left-hand side) and the Bengali (right-hand side) coding tables recommended by the UNICODE Consortium. Single-byte encodings of the symbols in the table are currently used in existing systems. The similarities between the two languages can be seen in the table. Both languages have a small set of characters to code. Two-dimensional characteristics and complex rendering are also noticeable in Table 1. Moreover, characters in the same position on the left-hand side (Devanagari) and on the right-hand side (Bengali) mostly share similar pronunciations. These characteristics are also common to other Indian languages.

Figure 1: Example Text Rendering of Indian Text (Source: Unicode)

Although most Indian languages have a tiny alphabet, the glyphs of the letters may take several different forms and shapes within a word, depending on where they occur. Vowels usually change their shapes when they appear with a consonant or a ligature, and this change is sometimes irregular. Consonants and vowels, and two or more consonants with or without vowels, may combine to form ligatures, and their combined forms may look totally different. Moreover, the sequence of letters does not always appear in a straightforward order. A vowel pronounced after a consonant may appear above or below that consonant, and it may even appear before the consonant. In some cases, one part of a single vowel appears before the consonant and the other part appears after it. The formation of ligatures has many other irregularities. Rendering is a complicated issue for Indian language processing. The complexity of rendering is easily noticeable in Figure 1, which shows the rendering of the Devanagari script.

Indian languages also have a unique sorting mechanism. Unlike English, consonants and vowels have different priorities in sorting. Words are sorted by taking the consonant's order as the first consideration and then the associated vowel's order as the second consideration. In a ligature of the type [(Consonant) (Consonant) (Consonant)* (Vowel)*], the second and higher-order consonants also have a different priority level from that of their individual occurrences or their occurrence as the first consonant in a ligature. A sorting example is illustrated in Figure 2.

Figure 2: Example of Complex Sorting

Considering the special characteristics mentioned above, the next section proposes a mathematical model of processing Indian languages that requires a multi-byte encoding scheme. In this approach, all the possible glyphs (including all the ligatures, the vowels and the consonants) are encoded. An input method is also proposed to make text entry efficient.

3 Mathematical Model of Multi-byte Coding of Indian Languages

In this section, we introduce the mathematical model of Indian languages, taking Bengali as an example language. Other Sanskrit-based Indian languages can be modelled in the same way. In this model, we treat the letters (vowels and consonants) of the Bengali alphabet like the radicals or Jamo of CJK languages. Rather than encoding only the letters, we propose encoding all the letters as well as all the linguistically meaningful ligatures they may form.
Linguistic analysis is necessary to identify only the ligatures that actually occur, since many possible ligatures are never used in reality. Finally, we treat these ligatures (along with the independent vowels and consonants) in the same way as CJK characters (including radicals or Jamo) are treated in CJK systems. The aim is to process Indian languages in the same fashion as characters are processed in CJK systems.

3.1 Basic Definitions

Definition 1. A consonant in Bengali is represented as $c_i$, $i = 1$ to $39$. (There are 39 commonly used consonants in Bengali.)
Consonant Set, $C = \{c_i\}$

Definition 2. An independent vowel is represented as $v_j$, $j = 1$ to $11$. (There are 11 commonly used vowels in Bengali. Dependent vowels are the symbolic variations of independent vowels that usually appear with consonants or ligatures.)
Vowel Set, $V = \{v_j\}$

Definition 3. A combination of one consonant and one vowel, of one consonant and one diacritical mark, or of two or more consonants (with or without a vowel or diacritical mark) is called a ligature and is represented as $l_k$, $k = 1$ to $2{,}500$. (We analyzed the Bengali language and found about 2,500 commonly used ligatures.)
Ligature Set, $L = \{l_k\}$

Definition 4. Including Bengali numerals, monetary symbols and other symbols, there are about 20 commonly used symbols in Bengali. They are represented as $s_l$, $l = 1$ to $20$.
Symbol Set, $S = \{s_l\}$

Definition 5. A Word Constituent Unit $u_m$ is defined as follows: $u_m \in C \cup V \cup L$, where generally $m \le 3000$ for Indian languages.
Word Constituent Unit Set, $U = \{u_m\}$

Definition 6. Words are represented as $w_n$, where $n$ runs from $1$ to a virtually unbounded value ($n = 1$ to $\infty$). Each word is a sequence of one or more word constituent units: $w_n = (u_m)^{+}$.
Word Set, $W = \{w_n\}$

Definition 7. We denote by $B$ the set of Bengali characters:
Bengali Character Set, $B = \{\, b_i \mid b_i \in U \text{ or } b_i \in S \,\}$

3.2 Mathematical Model of Multi-byte Code for Bengali Language

Definition 8. Each element of $B$ is assigned a unique multi-byte internal code $i_i$ (16 bits in the case of double-byte coding). There exists a function $\sigma$ such that $i_i = \sigma(b_i)$ and $b_i = \sigma^{-1}(i_i)$.
Internal Code Set, $I = \{\, i_i \mid |i_i| = 16 \text{ bits for double-byte coding} \,\}$
If $b_i$ appears before $b_j$ in the dictionary, then the corresponding codes satisfy $i_i < i_j$.

Note that Bengali words can now be sorted simply by comparing their internal codes.

3.3 Mathematical Model of Bengali Character Input

As in any CJK system, several input methods can be designed for Indian languages. We designed an input method called IAYS (Input As You Spell; spelling is unambiguous for Indian languages), in which the user is given a keyboard layout that includes only the vowels, consonants and symbols. To input a Bengali word constituent unit, the user types the sequence of consonants and vowels as the unit is spelled. In some cases, a selection option appears in the prompt line for disambiguation.

At first glance, selection keystrokes appear to be extra overhead. However, the performance of the input method can be further optimized using associative rules and word-based or phrase-based input techniques. Word-based and phrase-based input methods give a very high input rate, as demonstrated in CJK systems. This is because words and phrases are less ambiguous than a single character, so the selection key is unnecessary in most cases. Moreover, abbreviated input of words and phrases is also possible, which further raises the input rate.

A simple input method uses a table lookup mechanism, in which each row of the table holds an Input Code (the spelling attributes of a word constituent unit) and the corresponding Internal Code. Inputting Indian languages is less ambiguous than inputting CJK languages because the mapping space is smaller. For example, Bengali input involves a mapping from about 50 keys to about 3,000 units, whereas in CJK systems the mapping is usually from about 50 keys to more than 6,000 characters. Definitions 9 and 10 explain the input process mathematically.
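Before the formal definitions of the input process, the order-preserving code assignment of Section 3.2 and the table lookup of this section can be sketched in Python. This is only a sketch under stated assumptions: the romanized unit labels, their illustrative dictionary order (not the real Bengali collation), the base code 0xE000, the IAYS table contents, and all function names are hypothetical and do not come from the paper.

```python
# Hypothetical word constituent units, written as romanized labels and listed
# in an illustrative dictionary order (NOT the real Bengali collation).
UNITS_IN_DICTIONARY_ORDER = [
    "a", "aa", "ka", "kaa", "ki", "kka", "kka_alt", "kha", "khaa",
]

# Assumed start of a user-defined double-byte code area.
BASE_CODE = 0xE000

# sigma: unit -> internal code, increasing with dictionary order.
SIGMA = {unit: BASE_CODE + i for i, unit in enumerate(UNITS_IN_DICTIONARY_ORDER)}
# Inverse of sigma: internal code -> unit.
SIGMA_INV = {code: unit for unit, code in SIGMA.items()}


def encode(word_units):
    """Encode a word (a sequence of units) as a tuple of internal codes."""
    return tuple(SIGMA[unit] for unit in word_units)


def decode(codes):
    """Map a tuple of internal codes back to the list of units."""
    return [SIGMA_INV[code] for code in codes]


# Hypothetical IAYS table: typed spelling -> candidate units (one-to-many).
IAYS_TABLE = {
    ("k",): ["ka"],
    ("k", "aa"): ["kaa"],
    ("k", "k"): ["kka", "kka_alt"],  # two candidates, so a selection is needed
}


def input_unit(spelling, choice=0):
    """Return the internal code of the candidate chosen for a typed spelling."""
    return SIGMA[IAYS_TABLE[tuple(spelling)][choice]]


def sort_words(words):
    """Sort words purely by comparing their internal code sequences."""
    return [decode(codes) for codes in sorted(encode(word) for word in words)]


if __name__ == "__main__":
    words = [["kha", "ka"], ["kaa", "ka"], ["ka", "kka"], ["a", "ka"]]
    print(sort_words(words))
```

Because the codes are assigned in dictionary order, tuple comparison of the encoded words is enough for sorting, and no script-specific collation rules are needed at sort time.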
Definition 9. Spelling Attribute Set. The spelling attributes of a word constituent unit consist of the relevant vowels and consonants.
Spelling Attribute Set, $A = \{\, a_i \mid a_i \in C \text{ or } a_i \in V \,\}$

Definition 10. The Input Method is a one-to-many mapping $\rho$ from the spelling attribute set to the Bengali character set: $b_i = \rho(a_j)$.

3.4 Mathematical Model of Bengali Character Output

The output process is a one-to-one mapping $\theta$ from the Internal Code Set to the Fonts Attribute Set. Because each internal code corresponds to exactly one font attribute, the output mechanism is less complicated than input.

Definition 11. Fonts Attribute Set, $F = \{\, f_i \mid f_i \text{ is a binary (0,1) sequence of } 16 \times 16,\ 24 \times 24 \text{ bits, etc.} \,\}$

Definition 12. Output of a Bengali character is a one-to-one mapping $\theta$, where $f_i = \theta(i_j)$, from an internal code to the relevant font.

4 Experimentation and Validation

We tested and validated this approach by adding the 3,000 glyphs of Bengali word constituent units to the user-defined space of an existing CJK system and by extending the lookup table of its Front End Processing system accordingly. Bengali text could then be processed immediately, and as efficiently as CJK text is processed in the original system. Moreover, our Bengali system immediately inherited all the other resources of the host CJK system; that is, all the available applications became usable with Bengali as well.

Conclusion

This is the very first implementation of an Indian language in the CJK text processing framework using multi-byte coding. Although the ligature analysis, font design, etc. are not yet fully efficient or error-free, the approach explained here focuses on a more computationally oriented way of processing Indian languages. English, the European languages, and the CJK languages have a long history of development on several computing platforms (Hu et al., 1989).
Processing Indian languages in a CJK framework lets them immediately inherit the research results accumulated for CJK languages over the past years. UNICODE advocates using multi-byte codes for every language. Therefore, acquiring extra code space for Indian languages, so that their character sets are encoded in the same manner as CJK languages, is technically feasible too. The mathematical model of the CJK system (Qian et al., 1992) is very similar to that of our Bengali system. It is therefore easy to port our Bengali system to other platforms where CJK languages have already been implemented successfully. A multilingual environment is assured in this way.

Acknowledgement

I want to thank Professor Yuji Matsumoto, my current supervisor at Nara Institute of Science and Technology, Japan, for kindly reviewing and commenting on this work. Thanks are also due to Professor Mao Yu-Heng and Professor Dai Mei-Er of Tsinghua University, China, for their encouragement and advice. Among the Bangladeshi fellows who helped in designing the font, testing the prototype and commenting, I must specially acknowledge the contribution of Mohammed Kawser and Ashraful Huq from the initial stage of this research.

References

Hu Xian-Xiang et al. (1989). Implementation of a Multilingual Computational Environment Based on X Window System. In Proceedings of Chinese Computing Conference '89, pp. 64
Ken Lunde (1999). CJKV Information Processing: Chinese, Japanese, Korean and Vietnamese Computing. O'Reilly & Associates, Inc.
Lin Yaw-Jen et al. (1994). Conversion of METAFONT file to TRUETYPE. In Proceedings of International Conference of Chinese Computing, ICCC-94, pp.
Qian Pei-De et al. (1992). CCDOS Technical Handbook Volume 1. Tsinghua University Press, Beijing, China (in Chinese).
UNICODE Consortium (1996). The Unicode Standard 2.0. Addison Wesley. URL:
Zhao Po-Zhang et al. (1990). Chinese Information Processing Technique. Aeronautics and Aerospace Press, Beijing, China (in Chinese).
More informationThe Indian National Bibliography: Today and tomorrow
Submitted on: June 22, 2013 The Indian National Bibliography: Today and tomorrow Shahina P. Ahas Central Reference Library, Kolkata, India E-mail : shahinaprashob@gmail.com Swapna Banerjee Department of
More informationComputer Basics: Chapters 1 & 2
Computer Basics: Chapters 1 & 2 Definition of a Computer What does IPOS stand for? Input Process Output Storage Other types of Computers Name some examples of other types of computers, other than a typical
More informationELFRING FONTS UPC BAR CODES
ELFRING FONTS UPC BAR CODES This package includes five UPC-A and five UPC-E bar code fonts in both TrueType and PostScript formats, a Windows utility, BarUPC, which helps you make bar codes, and Visual
More informationAccordIt User s Guide
Version 2 Contents: Using AccordIt AccordIt Menu Items Greek and Hebrew Clipboard Options For More Information Introduction AccordIt is a text processing utility intended for use with Accordance Bible
More informationWriting Reports BJECTIVES ONTENTS. By the end of this section you should be able to :
Writing Reports By the end of this section you should be able to : O BJECTIVES Understand the purposes of a report Plan a report Understand the structure of a report Collect information for your report
More informationTyping Devanagari on Mac OS X compiled by José C. Rodriguez, Emory College Language Center, Emory University 2009
Typing in the Devanagari script on Mac OS X can be done with either the Devanagari-QWERTY keyboard or standard Devanagari keyboard layouts. These are provided free with the Mac OS, but must be installed
More information4.3 TABLE 3 TABLE 4. 1342 five 1 125 3 25 4 5 2 1 125 75 20 2 222.
.3 Conversion Between Number Bases 169.3 Conversion Between Number Bases Although the numeration systems discussed in the opening section were all base ten, other bases have occurred historically. For
More informationPreservation Handbook
Preservation Handbook [Binary Text / Word Processor Documents] Author Rowan Wilson and Martin Wynne Version Draft V3 Date 22 / 08 / 05 Change History Revised by MW 22.8.05; 2.12.05; 7.3.06 Page 1 of 7
More informationL2/14-009 Abstract Introduction
P P T 0 1 S P P P P P P S P P P P P 0 S 1 1 S 0 0 1 P 0 S 1 T P 0 S 1 T 1 T P 0 S 1 T P 0 T P P P 0 1 S S 1 0 T P S P 1 0 T S P 0 1 P 0 S 1 T TPPT Form for PT ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY
More information2011, The McGraw-Hill Companies, Inc. Chapter 3
Chapter 3 3.1 Decimal System The radix or base of a number system determines the total number of different symbols or digits used by that system. The decimal system has a base of 10 with the digits 0 through
More informationPoints of Interference in Learning English as a Second Language
Points of Interference in Learning English as a Second Language Tone Spanish: In both English and Spanish there are four tone levels, but Spanish speaker use only the three lower pitch tones, except when
More informationGaiji: Characters, Glyphs, Both, or Neither?
Gaiji: Characters, Glyphs, Both, or Neither? A Graphics and Publishing Industry View Jim DeLaHunt Type Development Group, Adobe Systems Incorporated 1 Abstract Unicode encodes Han characters by the tens
More informationFUNCTIONAL SKILLS ENGLISH - WRITING LEVEL 2
FUNCTIONAL SKILLS ENGLISH - WRITING LEVEL 2 MARK SCHEME Instructions to marker There are 30 marks available for each of the three tasks, which should be marked separately, resulting in a total of 90 marks.
More informationIntroduction to Internationalized Domain Names (IDN)
Introduction to ized Domain Names (IDN) IP Symposium for CEE, CIS and Baltic States Moscow, Russia 16-19 September 2003 Robert Shaw ITU Internet Strategy and Policy Advisor Agenda
More informationDocument Conventions... 2 Technical Requirements... 2. Logging On... 3 Logging Off... 3. Main Menu Panel... 4 Contents Panel... 4 Document Panel...
Contents GETTING STARTED... 2 Document Conventions... 2 Technical Requirements... 2 LOGIN AND LOGOFF... 2 Logging On... 3 Logging Off... 3 USP-NF ONLINE HOME PAGE... 3 Main Menu Panel... 4 Contents Panel...
More informationInternational Journal of Advanced Research in Computer Science and Software Engineering
Volume 3, Issue 7, July 23 ISSN: 2277 28X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Greedy Algorithm:
More informationThings to remember when transcribing speech
Notes and discussion Things to remember when transcribing speech David Crystal University of Reading Until the day comes when this journal is available in an audio or video format, we shall have to rely
More informationTable Of Contents. iii
PASSOLO Handbook Table Of Contents General... 1 Content Overview... 1 Typographic Conventions... 2 First Steps... 3 First steps... 3 The Welcome dialog... 3 User login... 4 PASSOLO Projects... 5 Overview...
More informationA Programming Language for Mechanical Translation Victor H. Yngve, Massachusetts Institute of Technology, Cambridge, Massachusetts
[Mechanical Translation, vol.5, no.1, July 1958; pp. 25-41] A Programming Language for Mechanical Translation Victor H. Yngve, Massachusetts Institute of Technology, Cambridge, Massachusetts A notational
More informationUSING MICROSOFT WORD 2008(MAC) FOR APA TASKS
USING MICROSOFT WORD 2008(MAC) FOR APA TASKS MS WORD 2008(MAC), GENERAL TIPS Backspace and Delete The keyboard has two delete keys: Backspace and Delete. What s the difference? The Backspace key deletes
More informationBinary Representation. Number Systems. Base 10, Base 2, Base 16. Positional Notation. Conversion of Any Base to Decimal.
Binary Representation The basis of all digital data is binary representation. Binary - means two 1, 0 True, False Hot, Cold On, Off We must be able to handle more than just values for real world problems
More informationBinary Representation
Binary Representation The basis of all digital data is binary representation. Binary - means two 1, 0 True, False Hot, Cold On, Off We must tbe able to handle more than just values for real world problems
More informationOnline Bulletin Boards An Introduction
Online Bulletin Boards An Introduction Online Bulletin Boards are Internet-based, virtual venues where participants gather and engage in interactive, text-based discussions lead by moderators. Group Works
More informationSection 1.4 Place Value Systems of Numeration in Other Bases
Section.4 Place Value Systems of Numeration in Other Bases Other Bases The Hindu-Arabic system that is used in most of the world today is a positional value system with a base of ten. The simplest reason
More informationVoluntary Product Accessibility Template Blackboard Learn Release 9.1 April 2014 (Published April 30, 2014)
Voluntary Product Accessibility Template Blackboard Learn Release 9.1 April 2014 (Published April 30, 2014) Contents: Introduction Key Improvements VPAT Section 1194.21: Software Applications and Operating
More informationUNIVERSITY OF MYSORE B Com. ( ANNUAL ) DEGREE EXAMINATIONS - MAY / JUNE 2014 TIME TABLE
02/06/2014 11002 ENGLISH 31201 BUSINESS LEGISLATION MONDAY (Common to 99 Sch. & equivalent paper to Business Laws of 93 Sch.) 03/06/2014 31102 ENGLISH 31202 BUSINESS STATISTICS TUESDAY 04/06/2014 11013
More informationUser Guide. Printing Unicode characters from SAP to SATO GT4xxe Printers. www.satoworldwide.com. Version 061030-02
Printing Unicode characters from SAP to SATO GT4xxe Printers User Guide Version 061030-02 2006 SATO Corporation. All rights reserved. Table of Contents 1. Introduction... 3 2. Configuration at SAP environment...
More informationVisualizing Keyboard Pattern Passwords
Visualizing Keyboard Pattern Passwords Dino Schweitzer, Jeff Boleng, Colin Hughes, Louis Murphy United States Air Force Academy ABSTRACT Passwords are a fundamental security vulnerability in many systems.
More informationMLA Formatting in Microsoft Word 2010/2011
MLA Formatting in Microsoft Word 2010/2011 Learn to format a research paper in MLA style using Microsoft Word 2010 for Windows and 2011 for Mac. Program Version and Resources for Guide All the recommended
More informationTutorial Microsoft Office Excel 2003
Tutorial Microsoft Office Excel 2003 Introduction: Microsoft Excel is the most widespread program for creating spreadsheets on the market today. Spreadsheets allow you to organize information in rows and
More informationBusiness Portal for Microsoft Dynamics GP 2010. User s Guide Release 5.1
Business Portal for Microsoft Dynamics GP 2010 User s Guide Release 5.1 Copyright Copyright 2011 Microsoft. All rights reserved. Limitation of liability This document is provided as-is. Information and
More informationHKSCS-2004 Support for Windows Platform
HKSCS-2004 Support for Windows Platform Windows XP Font Pack for ISO 10646:2003 + Amendment 1 Traditional Chinese Support (HKSCS-2004) update for Windows XP and Windows Server 2003 June 2010 Version 1.0
More informationWhat is Microsoft PowerPoint?
What is Microsoft PowerPoint? Microsoft PowerPoint is a powerful presentation builder. In PowerPoint, you can create slides for a slide-show with dynamic effects that will keep any audience s attention.
More informationLuitPad: A fully Unicode compatible Assamese writing software
LuitPad: A fully Unicode compatible Assamese writing software Navanath Saharia 1,3 Kishori M Konwar 2,3 (1) Tezpur University, Tezpur, Assam, India (2) University of British Columbia, Vancouver, Canada
More informationMicrosoft Outlook Introduction
Microsoft Outlook Introduction Division of Information Technology February 2016 Contents Document Management History... 3 Introduction... 4 Getting Started... 4 Using MS Outlook... 4 What MS Outlook looks
More informationResearch on Applying Web3D Technology to College Library Instruction of Online Book Navigation System. Wang Shuo, Mu Dawei, Zhao Jinlong, Hu Xiaoli
RESEARCH ON APPLYING WEB3D TECHNOLOGY TO COLLEGE LIBRARY INSTRUCTION OF ONLINE 3D BOOK NAVIGATION SYSTEM Wang Shuo, Mu Dawei, Zhao Jinlong, Hu Xiaoli (Library of Capital Normal University, Beijing, China,
More informationThe Language Exchange, Inc. GSA Language Services Catalog
The Language Exchange, Inc. GSA Language Services Catalog Federal Supply Service Authorized Federal Supply Schedule List Schedule Title: Language Services Federal Supply Group: 738 Classes R499 Contract
More informationEfficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words
, pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan
More informationIBM Emulation Mode Printer Commands
IBM Emulation Mode Printer Commands Section 3 This section provides a detailed description of IBM emulation mode commands you can use with your printer. Control Codes Control codes are one-character printer
More informationThe Language Grid The Language Grid combines users language resources and machine translators to produce high-quality translation that is customized
The Language Grid The Language Grid combines users language resources and machine translators to produce high-quality translation that is customized to each field. The Language Grid, a software that provides
More information