Collation Weight Design for Myanmar Unicode Texts
|
|
- Alexander Cook
- 1 years ago
- Views:
Transcription
1 Collation Weight Design for Myanmar Unicode Texts Tin Htay Hlaing Nagaoka University of Technology Nagaoka, JAPAN. Yoshiki Mikami Nagaoka University of Technology Nagaoka, JAPAN. Abstract This paper proposes collation weights for sorting Myanmar Unicode texts in conformity with Unicode Collation Algorithm. Our proposal is constructed based on the sorting order given in Myanmar Spelling Book which is also known as Myanmar Orthography. Myanmar language has complex linguistic features and needs preprocessing of texts before collation. Thus, we examine syllable structure, standard collation orders of Myanmar characters and then some preprocessing steps in detail. Furthermore, we introduced mathematical definitions such as Complete Order Set (COSET) and Product of Order to describe Myanmar sorting order in a formal way. This formal description gives clear and unambiguous understanding of Myanmar lexicographic order to foreigners as well as natives. 1 Aim of Research Myanmar Language is the official language of Myanmar spoken by 32 million people as first language and as second language by some ethnic minorities of Myanmar. However, awareness of Myanmar lexicographic sorting or collation order is lacking among the public. In Myanmar, people rarely look up a word in dictionary using Myanmar lexicographic order because most of the dictionaries published in Myanmar are English-to-Myanmar dictionaries. For example, Engineering dictionary, Medical dictionary and Computer dictionary use English lexicographic order. Likewise, in telephone directory and yellow pages in Web, the Myanmar names are Romanized and sorted according to the conventional Latin alphabet order. Moreover, Microsoft`s window operating system and office applications for desktop publishing are commonly used in Myanmar. But, until now, Microsoft applications such as Word and Excel simply sorts Myanmar character based on binary code values and it does not meet Myanmar traditional sorting requirements. Only a few professional linguists compile dictionaries but collation rules used by them are not understood easily by others. Lack of understanding of collation order in turn makes importance of collation unaware. Consequently, these factors mentioned above act as a barrier to the implementations of Myanmar Language in database managements system and other ICT applications. Therefore, we design collation weights for Myanmar Unicode texts and employed these collation weights in implementation of lexicographic sorting algorithm for Myanmar texts. Our algorithm is tested to sort the words using Myanmar Spelling Book order or Myanmar Orthography (Myanmar Language Commission, 2006) and proved its proper working. Also, we propose a formal description method for collation order using concepts of ordered set so that Myanmar sorting rules are well understood by both foreigners and natives. 2 Formal Description of Sorting Order Standard sorting rules are often defined by national level language committee or are given in arrangement of words in the dictionaries. These rules are, however, complex, and difficult to understand well for those who are foreign to that language. Therefore, we introduced a formal description method for collation orders of syllabic scripts by employing a concept of Complete Order Set, COSET. 2.1 Definitions Definition 1 Complete Order Set In Set Theory, it is defined that a complete order is a binary relation (here denoted by infix ) on a set X. The relation is transitive, antisymmetric and total. A set equipped with a complete order is called a complete ordered set. If X is ordered 1 Conference on Human Language Technology for Development, Alexandria, Egypt, 2-5 May 2011.
2 under, then the following statements hold for all a, b and c in X: (Transitivity) if x y and y z then x z (Antisymmetry) if x y and y x then x = y (Totality) x y or y x ( Moll and et al, 1988) Definition 2 Product of Order Let X and Y be COSETs and x i X, y i Y. Then, the product of order (X, ) (Y, ) is defined as x 1 y 1<x 2 y 2 x 1 =x 2 and y 1 <y2 or x 1 y 1<x 2 y 2 x 1 x 2 As seen from above definition, product of order is not commutative. Definition 3 Lexicographic Order Let (X, ) be a COSET, and X* be the set of all finite-length strings over X, then a lexicographic order on X*, (X*, LEX ) is defined: For all u, v X*, u = u 1 u n and v = v 1 v m, where m, n 0 and u j, v j X, we set u LEX v if and only if: (1) u i = v i for i = 1,..., m and m n; or (2) there is j 1 such that u i = v i for i = 1,...,j 1 and u j v j. Suppose M be the maximum length of strings in X*, and ø be the null element of the alphabet which always comes before letter, then fixed length representation of a string u is given by u = u 1 u 2..u n ø ø where ø occurs M-n times. Then above defined lexicographic order over X M can be written by using product of order ( Mikami and et al, 2010). (X M, LEX ) = (X, ).. (X, ) = (X, ) M 2.2 Multilevel Sorting Sorting that conforms to linguistic requirements of a particular language cannot be done by a simple comparison of code points and a single level of comparison. Multilevel sorting has become necessary to get the results that satisfy with users` expectations. Let`s think about English lexicographic sorting. The primary level comparison is made based on case-insensitive comparison of alphabet `a` which will be treated as the same order with `A`. Then the secondary level comparison is made based on the case of each letter. In order to describe above lexicographic order, we introduced two COSETs, (L, ) and (C, ) where (L, ) is a standard alphabet letter order a b.. z and (C, ) is a case order s c (means small letter comes before capital letter). If a null element is ø is added to both, two CO- SETs finally become: (L, ) = ({ø,a,..,z}, ø a z) (C, ) = ({ø,s,c}, ø s c) Then case-sensitive lexicographic order (X M, LEX ) on length M string of x X can be formally described as (X M, LEX ) = (L, ) M (C, ) M 3 Myanmar Character Set In this section, the lexicographic order of Myanmar characters is described using complete order set notation. Myanmar texts flow from left to right and no space between words or syllables. The categories of Myanmar characters defined in Unicode and in our proposal are shown in Table 1. Basically, Myanmar script has 33 consonants, 8 vowels, 2 diacritics, 11 medials,, 10 digits and 2 punctuation marks. Every character group is non-ignorable in sorting process except punctuation marks and some special characters. We describe Myanmar characters using set notation as Myanmar Character Set= {C, V, I, D, M, N, F,,,,, }. Unicode Category with number of elements Proposed Category with abbreviation Consonants (33) Consonants( C ) Dependent (8) and Vowels (V, I ) Independent vowels (8) Various Signs (5) Consonant Conjuncts (4) Digits (10) Punctuations (2) Diacritics(D) and Medials (M) Numerals (N) Punctuations + Consonants (F) Table 1. Categories of Myanmar Characters. 3.1 Consonants Myanmar consonant has default vowel sound and itself works as a syllable. The set of consonants is C={က, ခ, ဂ, ဃ, င, စ, ဆ, ဇ, ဈ, ဉ, ည,ဋ,ဌ, 2
3 ဍ, ဎ, ဏ, တ, ထ,ဒ,ဓ,န,ပ,ဖ, ဗ, ဘ,မ,ယ,ရ, လ, ၀, သ, ဟ, ဠ, အ}becomes a COSET if normal consonant order is given. The order defined on C is written as (C,<). 3.2 Vowels The set of Myanmar dependent vowel characters in Unicode contains 8 elements {,,,,,,, }. But some of them are used only in combined form such as အ + အ or အ အ or အ အ and some characters which should be contracted before sorting such as အ + အ. So, the vowel order is defined only on modified vowel set V` composed of 12 elements {အ, အ, အ, အ, အ, အ, အ, အ, အ, အ, အ, အ }which it is written as (V`,<). The vowel order is also not defined on independent vowel set I={အ, ဣ, ဤ, ဥ, ဦ, ဧ, ဩ, ဪ} because one letter ဦ needs to be normalized to arrange it in a complete order. Thus, vowel order is defined only on modified independent vowel set (I`, <). (V`,<) and (I`,<) are isomorphic. 3.3 Medials There are four basic medials characters M={,,, }which combine and produce total 11 medials. The set of those 11 medials have an order (M`, <). 3.4 Finals or Ending Consonants The set of finals F={က,ခ,ဂ, ဃ,င,စ,ဆ,ဇ,ဈ,ည, ဋ,ဌ,ဍ, ဎ,ဏ,တ,ထ,ဒ,ဓ,န,ပ,ဖ,ဗ,ဘ,မ,ယ,ရ, လ,,သ,ဟ,ဠ }having an order (F,<) is isomorphic to (C,<). 3.5 Diacritics Diacritics alter the vowel sounds of accompanying consonants and they are used to indicate tone level. There are 2 diacritical marks {, } in Myanmar script and their order is (D,<). 3.6 Digits Myanmar Language has 10 numerals N={ ၀, ၁, ၂, ၃, ၄, ၅, ၆, ၇, ၈, ၉}with the order (N,<). 3.7 Myanmar Sign When there is a consonant at the end of a syllable, it carries a visible mark call ( ) to indicate that the inherent vowel is killed. It usually comes with consonants but sometimes comes with independent vowels. 4 Myanmar Syllable Structure in BNF Most European languages have orders defined on letters, but those languages which use syllabic scripts, including Myanmar, have an order defined on a set of syllables. In Myanmar Language, syllable components such consonants(c), independent vowel(i) and digits(n) are standalone components while dependent vowels(v), medials(m), diacritics(d) and finals(f) are not. Among them, consonants can also act as nucleus of syllable so that other characters can attach to it in different combinations. The structure of Myanmar Syllable can be illustrated using BNF notation as S:= C{M}{V}[F][D] I[F] N where { } means zero or more occurrences and [ ] means zero or one occurrence. 5 Comparison of Multilevel sorting in English and Myanmar In English, the sorting process does on word level. After completing primary level comparison for a given word, the secondary level comparison is started for this word again. For Myanmar Language, the lexicographic order is defined by the product of five generic components, namely, consonant order (C,<), medial order (M,<), final order (F,<), vowel order (V,<) and diacritics order (D,<). Therefore Myanmar syllable order (S,<) is given by formula (S,<) M = (C,<) (M,<) (F,<) (V,<) (D,<) Interesting to note here is the fact that the order (F,<) is considered before (V,<) while F comes after V in a coded string. Thus, Myanmar sorting process does on syllable-wise behavior. For instance, a given word may contain one or more syllables. Sorting is done on first syllable and if there is no difference, the process will move to next syllable. To sum up, firstly, one sort key is generated for one word in English while it is generated for each syllable in Myanmar. Secondly, multilevel sorting is done on word level in English but it is done within a syllable in Myanmar. Thirdly, in English, it needs whole word information because sorting process goes until it reaches end of word and then goes to next level. In contrast, 3
4 Myanmar sorting process goes until it reaches to last comparison level for one syllable and then it moves to next syllable. It means that Myanmar does not need whole word information if there is a difference before the final syllable. 6 Preprocessing of Texts 6.1 Syllabification Myanmar is syllable based language and thus syllabification is necessary before collation. This process needs to return not only syllable boundary but also the type of each component within a syllable. Maung and Mikami (2008) showed syllable breaking algorithm with a complete set of syllabification rules. 6.2 Reordering If Unicode encoding is followed, Myanmar characters are not stored in visual order. Myanmar vowels and consonant conjuncts are traditionally being typed in front of the consonants but we store Myanmar syllable components according to this order: <consonant> <medial> <vowel> <ending-consonant> <diacritic>. Therefore, no reordering is required for Myanmar Unicode texts. 6.3 Normalization Normalization is required if a letter or ligature is encoded in composed form as well as decomposed form. One Myanmar character has multiple representations and thus normalization is required. Decomposed Form Unicode for Decomposed forms Equivalent Composed form Unicode for Composed Form ဉ E ဉ 1026 Table 2. Normalization of a Vowel. 6.4 Contractions For some Myanmar dependent vowels, and medials, two or more characters clump together to form linguistic unit which has its own identity for collation. This group is treated similarly as a single character. These units may not be directly encoded in Unicode but are required to be created from their constituent units which are encoded. This process is called contraction (Hussain and Darrani, 2004). Glyph Unicode for Description Contraction C Vowel sign E + AA C+103A Vowel sign E+AA D + 102F Vowel sign I + UU Table 3. Vowel Contractions. Glyph Unicode for Contraction Description + 103B + 103D Consonant Sign Medial YA + WA 103C + 103D Consonant Sign Medial RA + WA + 103B + 103E Consonant Sign Medial YA + HA 103C + 103E Consonant Sign Medial RA + HA 103D + 103E Consonant Sign Medial WA + HA + 103B + 103D + Consonant Sign + 103E Medial YA+WA + HA + 103C + 103D + 103E Consonant Sign Medial YA+WA + HA Table 4. Consonant Conjuncts Contractions. Similarly, we need contractions for ending consonants or finals which is a combination of consonants and Myanmar Sign. Some of them are shown in table below. Glyph Unicode for Description Contraction က A Letter KA + ခ A Letter KHA+ ဂ A Letter GA + Table 5. Ending Consonant Contractions. 4
5 7 Myanmar Collation Myanmar Language presents some challenging scenarios for collation. Unicode Collation Algorithm (Davis and Whistler, 2010) is to be modified for Myanmar Collation. Because, in UCA, only one final sort key is generated for one word. But, one Myanmar word is divided into a sequence of syllables and sort key is generated at the syllable level. We use five levels of collation with consonants getting the primary weights and conjunct consonants having the secondary weights. At the tertiary and quaternary levels, ending consonants and vowels will be sorted respectively. Finally, the quinternary level is used to sort diacritical marks. We ignore digits intentionally as there is no word with digits in dictionary. The collation levels and their respective weight ranges are depicted in table below. Level Components Range Primary Consonants 02A1..02C2 Secondary Consonant 005A Conjuncts Tertiary Ending Consonants Quaternary Vowel B Quinternary Diacritics D Table 6. Collation Levels and Weights Range for Myanmar. 7.1 Collation Element Table for Myanmar Some of the Unicode collation elements for Myanmar Language are given in Appendix A. 8 Conclusion This paper proposes syllable-based multilevel collation for Myanmar Language. We also aimed to show how Unicode Collation Algorithm is employed for Myanmar words. We tested our algorithm to sort the words using Myanmar Orthography or Spelling Book Order and found that it works. But we have to do some more tests to handle loan syllable, kinzi, great SA and chained syllable so that we can produce more reliable evaluation. Myanmar language has some traditional writing styles such as consonant stacking eg. ဗ ဒ ဓ (Buddha), မန တလ (Mandalay, second capital of Myanmar), consonant repetition eg. (University), kinzi eg. အင ဂ (Cement), loan words eg. ဘတ (စ ) (bus). Although we write using above traditional styles, we read them in a simple way, for example, will be read as + + by inserting invisible Virama sign automatically. Therefore, syllabification process has to provide syllable boundary according to the way we read. If syllable breaking function does not work well, it may affect our result. References Mark Davis and Ken Whistler Unicode Collation Algorithm, Version Myanmar Language Commission Myanmar Orthography, Third Edition, University Press, Yangon, Myanmar. Robert M. Moll, Michael A. Arbib, and A.J. Kfoury An Introduction to Formal Language Theory, Springer-Verlag, New York, USA. Sarmad Hussain and Nadir Darrani A Study on Collation of Languages from Developing Asia, National University of Computer and Emerging Sciences, Lahore, Parkistan. Yoshiki Mikami, Shigeaki Kodama, and Wunna Ko Ko A proposal for formal description method of Collating order, Workshop on NLP for Asian Languages, Tokyo.1-8. Zin Maung Maung and Yoshiki Mikami A Rule-based Syllable Segmentation of Myanmar Texts. Proceedings of the IJCNLP Workshop on NLP for Less Privileged Languages, Hyderabad, India,
6 Appendix A. <Myanmar Collation Element Table> Glyph Unicode Collation Elements Unicode Name <- Consonants -> က A1 005A MYANMAR LETTER KA ခ A2 005A MYANMAR LETTER KHA ဂ A3 005A MYANMAR LETTER GA..... အ C2 005A MYANMAR LETTER A <- Consonant Conjunct Signs or Medials -> 103B B MYANMAR CONSONANT SIGN MEDIAL YA 103C C MYANMAR CONSONANT SIGN MEDIAL RA 103D D MYANMAR CONSONANT SIGN MEDIAL WA B103D103E MYANMAR CONSONANT SIGN MEDIAL YA+WA+HA 103C103D103E MYANMAR CONSONANT SIGN MEDIAL RA+WA+HA <-Independent Vowels -> ဢ MYANMAR LETTER I ဣ MYANMAR LETTER II ဤ MYANMAR LETTER U ဤ E MYANMAR LETTER U + MYANMAR LETTER II ဥ MYANMAR LETTER UU ဪ 102A MYANMAR LETTER AU <- Dependent Vowels -> 102B MYANMAR VOWEL SIGN TALL AA 102C MYANMAR VOWEL SIGN AA 102D MYANMAR VOWEL SIGN I C MYANMAR VOWEL SIGN E + AA C103A MYANMAR VOWEL SING E+AA A 0001 MYANMAR SIGN ANUSVARA 102D102F B 0001 MYANMAR VOWEL SING I + U <- Diacritics -> C MYANMAR SIGN DOT BELOW D MYANMAR SIGN VISARGA <- Ending Consonants or Finals -> က A MYANMAR LETTER KA + MYANMAR SIGN အ A MYANMAR LETTER A + MYANMAR SIGN 6
Photocopied photographs are NOT accepted.
EMBASSY OF THE REPUBLIC OF THE UNION OF MYANMAR 2300, " S" ST. N.W., WASHINGTON D.C. 20008-4089 Tel. (202) 332 4352, (202) 238 9332 Fax.(202) 332 4351 ( http://mewashingtondc.com, e-mail : mewdcusa@yahoo.com)
Photocopied photographs are NOT accepted.
"MEDITATION" VISA REQUIREMENT EMBASSY OF THE REPUBLIC OF THE UNION OF MYANMAR 2300, " S" ST. N.W., WASHINGTON D.C. 20008-4089 Tel. (202) 332 4352, (202) 238 9332 Fax.(202) 332 4351 ( http://mewashingtondc.com,
Rendering/Layout Engine for Complex script. Pema Geyleg pgeyleg@dit.gov.bt
Rendering/Layout Engine for Complex script Pema Geyleg pgeyleg@dit.gov.bt Overview What is the Layout Engine/ Rendering? What is complex text? Types of rendering engine? How does it work? How does it support
Chapter 4: Computer Codes
Slide 1/30 Learning Objectives In this chapter you will learn about: Computer data Computer codes: representation of data in binary Most commonly used computer codes Collating sequence 36 Slide 2/30 Data
The Unicode Standard Version 8.0 Core Specification
The Unicode Standard Version 8.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers
Keyboards for inputting Japanese language -A study based on US patents
Keyboards for inputting Japanese language -A study based on US patents Umakant Mishra Bangalore, India umakant@trizsite.tk http://umakant.trizsite.tk (This paper was published in April 2005 issue of TRIZsite
Encoding script-specific writing rules based on the Unicode character set
Encoding script-specific writing rules based on the Unicode character set Malek Boualem, Mark Leisher, Bill Ogden Computing Research Laboratory (CRL), New Mexico State University, Box 30001, Dept 3CRL,
A simple approach for building transliteration editors for Indian languages
1354 Prahallad et al. / J Zhejiang Univ SCI 2005 6A(11):1354-1361 Journal of Zhejiang University SCIENCE ISSN 1009-3095 http://www.zju.edu.cn/jzus E-mail: jzus@zju.edu.cn A simple approach for building
MPCS: Myanmar Preposition Checking System
MPCS: Myanmar Preposition Checking System Khaing Htet Win University of Computer Studies, Yangon ABSTRACT For every language, preposition checker is essential component of many of Office Automation System
A) the use of different pens for writing B) learning to write with a pen C) the techniques of writing with the hand using a writing instrument
Level A 1. Your name written in your usual handwriting is called your: A) guarantee B) signature C) handwriting 2. Penmanship is A) the use of different pens for writing B) learning to write with a pen
Tibetan For Windows - Software Development and Future Speculations. Marvin Moser, Tibetan for Windows & Lucent Technologies, USA
Tibetan For Windows - Software Development and Future Speculations Marvin Moser, Tibetan for Windows & Lucent Technologies, USA Introduction This paper presents the basic functions of the Tibetan for Windows
INTERNATIONALIZED DOMAIN NAMES
Draft Policy Document for INTERNATIONALIZED DOMAIN NAMES Language: TAMIL 1 VERSION NUMBER DATE RECORD OF CHANGES PAGES AFFECTED 1.0 19/11/09 Whole Document 1.1 22/11/20 10 1.2 05/08/20 13 M Page No 8,
Preservation Handbook
Preservation Handbook [Binary Text / Word Processor Documents] Author Rowan Wilson and Martin Wynne Version Draft V3 Date 22 / 08 / 05 Change History Revised by MW 22.8.05; 2.12.05; 7.3.06 Page 1 of 7
Approaches to Arabic Name Transliteration and Matching in the DataFlux Quality Knowledge Base
32 Approaches to Arabic Name Transliteration and Matching in the DataFlux Quality Knowledge Base Brant N. Kay Brian C. Rineer SAS Institute Inc. SAS Institute Inc. 100 SAS Campus Drive 100 SAS Campus Drive
Kindergarten Common Core State Standards: English Language Arts
Kindergarten Common Core State Standards: English Language Arts Reading: Foundational Print Concepts RF.K.1. Demonstrate understanding of the organization and basic features of print. o Follow words from
Tamil Indic Input 2 User Guide
Tamil Indic Input 2 User Guide Tamil Indic Input 2 User Guide 1 Contents WHAT IS TAMIL INDIC INPUT 2?... 2 SYSTEM REQUIREMENTS... 2 TO INSTALL TAMIL INDIC INPUT 2... 2 TO USE TAMIL INDIC INPUT 2... 3 SUPPORTED
Using International Characters in BarTender
Using International Characters in BarTender How to Read Data and Print Characters from almost every Language and Writing System in the World WHITE PAPER Contents Overview 3 BarTender's Unicode Support
Activity 1: Bits and Bytes
ICS3U (Java): Introduction to Computer Science, Grade 11, University Preparation Activity 1: Bits and Bytes The Binary Number System Computers use electrical circuits that include many transistors and
Gujarati Indic Input 3 - User Guide
Gujarati Indic Input 3 - User Guide Contents 1. WHAT IS GUJARATI INDIC INPUT 3?... 2 1.1. SYSTEM REQUIREMENTS... 2 1.2. APPLICATION REQUIREMENTS... 2 2. TO INSTALL GUJARATI INDIC INPUT 3... 2 3. TO USE
FSD Kindergarten READING
College and Career Readiness Anchor Standards for Reading Read closely to determine what the text says explicitly and to make logical inferences from it; cite specific textual evidence when writing or
Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words
, pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan
Formal Languages and Automata Theory - Regular Expressions and Finite Automata -
Formal Languages and Automata Theory - Regular Expressions and Finite Automata - Samarjit Chakraborty Computer Engineering and Networks Laboratory Swiss Federal Institute of Technology (ETH) Zürich March
Language and Literacy
Language and Literacy In the sections below is a summary of the alignment of the preschool learning foundations with (a) the infant/toddler learning and development foundations, (b) the common core state
ELFRING FONTS UPC BAR CODES
ELFRING FONTS UPC BAR CODES This package includes five UPC-A and five UPC-E bar code fonts in both TrueType and PostScript formats, a Windows utility, BarUPC, which helps you make bar codes, and Visual
Regular Expressions and Automata using Haskell
Regular Expressions and Automata using Haskell Simon Thompson Computing Laboratory University of Kent at Canterbury January 2000 Contents 1 Introduction 2 2 Regular Expressions 2 3 Matching regular expressions
A Guide to Printing the 4-State Barcode
A Guide to Printing the 4-State Barcode June 1998 Contents Revised 16 Mar 2012 Introduction 2 Australia Post Overview 2 Prerequisites to printing the barcode 3 Appending the DPID 3 Validation of Address
The Unicode Standard Version 8.0 Core Specification
The Unicode Standard Version 8.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers
1 Solving LPs: The Simplex Algorithm of George Dantzig
Solving LPs: The Simplex Algorithm of George Dantzig. Simplex Pivoting: Dictionary Format We illustrate a general solution procedure, called the simplex algorithm, by implementing it on a very simple example.
Gujarati Indic Input 2 - User Guide
Gujarati Indic Input 2 - User Guide Gujarati Indic Input 2 - User Guide 2 Contents WHAT IS GUJARATI INDIC INPUT 2?... 3 SYSTEM REQUIREMENTS... 3 TO INSTALL GUJARATI INDIC INPUT 2... 3 TO USE GUJARATI INDIC
Internationalizing the Domain Name System. Šimon Hochla, Anisa Azis, Fara Nabilla
Internationalizing the Domain Name System Šimon Hochla, Anisa Azis, Fara Nabilla Internationalize Internet Master in Innovation and Research in Informatics problematic of using non-ascii characters ease
Reading 13 : Finite State Automata and Regular Expressions
CS/Math 24: Introduction to Discrete Mathematics Fall 25 Reading 3 : Finite State Automata and Regular Expressions Instructors: Beck Hasti, Gautam Prakriya In this reading we study a mathematical model
TEXT SEARCH OPTIONS. Text Search
Text Search There are two main options in Text Search: Simple (search for one string/word or a phrase) and Proximity (search for one or more words in proximity). TEXT OPTIONS Simple or Proximity There
ISO/IEC JTC1 SC2/WG2 N4399
ISO/IEC JTC1 SC2/WG2 N4399 Universal Multiple-Octet Coded Character Set International Organization for Standardization Organisation Internationale de rmalisation Международная организация по стандартизации
Strand: Reading Literature Topics Standard I can statements Vocabulary Key Ideas and Details
Strand: Reading Literature Key Ideas and Craft and Structure Integration of Knowledge and Ideas RL.K.1. With prompting and support, ask and answer questions about key details in a text RL.K.2. With prompting
Linear Algebra: Vectors
A Linear Algebra: Vectors A Appendix A: LINEAR ALGEBRA: VECTORS TABLE OF CONTENTS Page A Motivation A 3 A2 Vectors A 3 A2 Notational Conventions A 4 A22 Visualization A 5 A23 Special Vectors A 5 A3 Vector
TEXT TO SPEECH SYSTEM FOR KONKANI ( GOAN ) LANGUAGE
TEXT TO SPEECH SYSTEM FOR KONKANI ( GOAN ) LANGUAGE Sangam P. Borkar M.E. (Electronics)Dissertation Guided by Prof. S. P. Patil Head of Electronics Department Rajarambapu Institute of Technology Sakharale,
Iteration, Induction, and Recursion
CHAPTER 2 Iteration, Induction, and Recursion The power of computers comes from their ability to execute the same task, or different versions of the same task, repeatedly. In computing, the theme of iteration
Proposal for encoding the combining diacritic arabic wasla.
Proposal for encoding the combining diacritic arabic wasla Miikka-Markus Alhonen May 28, 2003 A. Administrative 1. Title Proposal for encoding the combining diacritic arabic wasla. 2. Requester s name
LuitPad: A fully Unicode compatible Assamese writing software
LuitPad: A fully Unicode compatible Assamese writing software Navanath Saharia 1,3 Kishori M Konwar 2,3 (1) Tezpur University, Tezpur, Assam, India (2) University of British Columbia, Vancouver, Canada
Cardinality. The set of all finite strings over the alphabet of lowercase letters is countable. The set of real numbers R is an uncountable set.
Section 2.5 Cardinality (another) Definition: The cardinality of a set A is equal to the cardinality of a set B, denoted A = B, if and only if there is a bijection from A to B. If there is an injection
Pseudo code Tutorial and Exercises Teacher s Version
Pseudo code Tutorial and Exercises Teacher s Version Pseudo-code is an informal way to express the design of a computer program or an algorithm in 1.45. The aim is to get the idea quickly and also easy
Number Systems, Base Conversions, and Computer Data Representation
, Base Conversions, and Computer Data Representation Decimal and Binary Numbers When we write decimal (base 10) numbers, we use a positional notation system. Each digit is multiplied by an appropriate
Tamil Indic Input 3 User Guide
Tamil Indic Input 3 User Guide Contents 1. WHAT IS TAMIL INDIC INPUT 3?... 1 1.1. SYSTEM REQUIREMENTS... 1 1.2. APPLICATION REQUIREMENTS... 1 2. TO INSTALL TAMIL INDIC INPUT 3... 1 3. TO USE TAMIL INDIC
Excel 2007 Intermediate Documentation
Learning Outcomes Create complex formulas Utilize advanced (conditional) formatting Create and customize graphical displays Table of Contents Excel 2007 Intermediate Documentation The Center for Teaching,
MODULAR ARITHMETIC. a smallest member. It is equivalent to the Principle of Mathematical Induction.
MODULAR ARITHMETIC 1 Working With Integers The usual arithmetic operations of addition, subtraction and multiplication can be performed on integers, and the result is always another integer Division, on
Locating and Decoding EAN-13 Barcodes from Images Captured by Digital Cameras
Locating and Decoding EAN-13 Barcodes from Images Captured by Digital Cameras W3A.5 Douglas Chai and Florian Hock Visual Information Processing Research Group School of Engineering and Mathematics Edith
Hindi Indic Input 2 - User Guide
Hindi Indic Input 2 - User Guide Hindi Indic Input 2-User Guide 2 Contents WHAT IS HINDI INDIC INPUT 2?... 3 SYSTEM REQUIREMENTS... 3 TO INSTALL HINDI INDIC INPUT 2... 3 TO USE HINDI INDIC INPUT 2... 4
6 3 4 9 = 6 10 + 3 10 + 4 10 + 9 10
Lesson The Binary Number System. Why Binary? The number system that you are familiar with, that you use every day, is the decimal number system, also commonly referred to as the base- system. When you
Hindi Indic Input 3 - User Guide
Hindi Indic Input 3 - User Guide Contents 1. WHAT IS HINDI INDIC INPUT 3?... 2 1.1. SYSTEM REQUIREMENTS... 2 1.2. APPLICATION REQUIREMENTS... 2 2. TO INSTALL HINDI INDIC INPUT 3... 2 3. TO USE HINDI INDIC
The Sacred Letters of Tibet
The Sacred Letters of Tibet TIBET IS another arena of confluence of two ancient cultures. Tibet adopted government organization and social standards from China. But its spiritual guidance came from Buddhism.
Kannada Indic Input 2 - User Guide
Kannada Indic 2 - User Guide Kannada Indic 2 - User Guide 2 Contents WHAT IS KANNADA INDIC INPUT 2?... 3 SYSTEM REQUIREMENTS... 3 TO INSTALL KANNADA INDIC INPUT 2... 3 TO USE KANNADA INDIC INPUT 2... 4
Automata Theory. Şubat 2006 Tuğrul Yılmaz Ankara Üniversitesi
Automata Theory Automata theory is the study of abstract computing devices. A. M. Turing studied an abstract machine that had all the capabilities of today s computers. Turing s goal was to describe the
Bangla Text Input and Rendering Support for Short Message Service on Mobile Devices
Bangla Text Input and Rendering Support for Short Message Service on Mobile Devices Tofazzal Rownok, Md. Zahurul Islam and Mumit Khan Department of Computer Science and Engineering, BRAC University, Dhaka,
Request to encode South Indian CANDRABINDU-s. 1. Background
Request to encode South Indian CANDRABINDU-s Shriramana Sharma, jamadagni-at-gmail-dot-com, India 2010-Oct-11 This is a request to encode chandrabindu characters in the Telugu, Kannada and Malayalam blocks.
1(a). How many ways are there to rearrange the letters in the word COMPUTER?
CS 280 Solution Guide Homework 5 by Tze Kiat Tan 1(a). How many ways are there to rearrange the letters in the word COMPUTER? There are 8 distinct letters in the word COMPUTER. Therefore, the number of
EURESCOM - P923 (Babelweb) PIR.3.1
Multilingual text processing difficulties Malek Boualem, Jérôme Vinesse CNET, 1. Introduction Users of more and more applications now require multilingual text processing tools, including word processors,
DATA STRUCTURES USING C
DATA STRUCTURES USING C QUESTION BANK UNIT I 1. Define data. 2. Define Entity. 3. Define information. 4. Define Array. 5. Define data structure. 6. Give any two applications of data structures. 7. Give
Creating A Simple Dictionary With Definitions
Creating A Simple Dictionary With Definitions The KAS Knowledge Acquisition System allows you to create new dictionaries with definitions from scratch or append information to existing dictionaries. The
APPENDIX B. Routers route based on the network number. The router that delivers the data packet to the correct destination host uses the host ID.
APPENDIX B IP Subnetting IP Addressing Routers route based on the network number. The router that delivers the data packet to the correct destination host uses the host ID. IP Classes An IP address is
CCSS English/Language Arts Standards Reading: Foundational Skills Kindergarten
Reading: Foundational Skills Print Concepts CCSS.ELA-Literacy.RF.K.1 Demonstrate understanding of the organization and basic features of print. CCSS.ELA-Literacy.RF.K.1.A Follow words from left to right,
4. MATRICES Matrices
4. MATRICES 170 4. Matrices 4.1. Definitions. Definition 4.1.1. A matrix is a rectangular array of numbers. A matrix with m rows and n columns is said to have dimension m n and may be represented as follows:
Symbol Tables. Introduction
Symbol Tables Introduction A compiler needs to collect and use information about the names appearing in the source program. This information is entered into a data structure called a symbol table. The
ELFRING FONTS BAR CODES EAN 8, EAN 13, & ISBN / BOOKLAND
ELFRING FONTS BAR CODES EAN 8, EAN 13, & ISBN / BOOKLAND This package includes ten EAN bar code fonts in scalable TrueType and PostScript formats, a Windows utility (BarEAN) to help you make bar codes,
Q&As: Microsoft Excel 2013: Chapter 2
Q&As: Microsoft Excel 2013: Chapter 2 In Step 5, why did the date that was entered change from 4/5/10 to 4/5/2010? When Excel recognizes that you entered a date in mm/dd/yy format, it automatically formats
Regular Expressions. Abstract
Regular Expressions Sanjiv K. Bhatia Department of Mathematics & Computer Science University of Missouri St. Louis St. Louis, MO 63121 email: sanjiv@cs.umsl.edu Abstract Regular expressions provide a powerful
COMMON CORE STATE STANDARDS FOR
COMMON CORE STATE STANDARDS FOR English Language Arts & Literacy in History/Social Studies, Science, and Technical Subjects Kindergarten Introduction to the Common Core State Standards for English Language
Phonics and Word Work
Phonics and Word Work Introduction Foundational Skills This guide explores how explicit and systematic phonics and word work instruction is included in the ReadyGEN program. It looks at the resources that
Academic Standards for Reading, Writing, Speaking, and Listening
Academic Standards for Reading, Writing, Speaking, and Listening Pre-K - 3 REVISED May 18, 2010 Pennsylvania Department of Education These standards are offered as a voluntary resource for Pennsylvania
Preservation Handbook
Preservation Handbook Plain text Author Version 2 Date 17.08.05 Change History Martin Wynne and Stuart Yeates Written by MW 2004. Revised by SY May 2005. Revised by MW August 2005. Page 1 of 7 File: presplaintext_d2.doc
2. Compressing data to reduce the amount of transmitted data (e.g., to save money).
Presentation Layer The presentation layer is concerned with preserving the meaning of information sent across a network. The presentation layer may represent (encode) the data in various ways (e.g., data
Introduction to Matrix Algebra I
Appendix A Introduction to Matrix Algebra I Today we will begin the course with a discussion of matrix algebra. Why are we studying this? We will use matrix algebra to derive the linear regression model
Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay
Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding
So let us begin our quest to find the holy grail of real analysis.
1 Section 5.2 The Complete Ordered Field: Purpose of Section We present an axiomatic description of the real numbers as a complete ordered field. The axioms which describe the arithmetic of the real numbers
Massachusetts Tests for Educator Licensure
Massachusetts Tests for Educator Licensure FIELD 90: FOUNDATIONS OF READING TEST OBJECTIVES Subarea Multiple-Choice Range of Objectives Approximate Test Weighting I. Foundations of Reading Development
Exercises with solutions (1)
Exercises with solutions (). Investigate the relationship between independence and correlation. (a) Two random variables X and Y are said to be correlated if and only if their covariance C XY is not equal
Applies to Version 6 Release 5 X12.6 Application Control Structure
Applies to Version 6 Release 5 X12.6 Application Control Structure ASC X12C/2012-xx Copyright 2012, Data Interchange Standards Association on behalf of ASC X12. Format 2012 Washington Publishing Company.
Multiple archival data streams and directory uploads: Dealing with complex multi- archival digital assets paired with simplified presentation formats
Multiple archival data streams and directory uploads: Dealing with complex multi- archival digital assets paired with simplified presentation formats Description This document describes best practices
Kazuraki : Under The Hood
Kazuraki : Under The Hood Dr. Ken Lunde Senior Computer Scientist Adobe Systems Incorporated Why Develop Kazuraki? To build excitement and awareness about OpenType Japanese fonts Kazuraki is the first
Thai Pronunciation and Phonetic Symbols Prawet Jantharat Ed.D.
Thai Pronunciation and Phonetic Symbols Prawet Jantharat Ed.D. This guideline contains a number of things concerning the pronunciation of Thai. Thai writing system is a non-roman alphabet system. This
Cherokee Language Technology Program - Education Services
Cherokee Language Technology Program - Education Services Cherokee Language Fonts and Keyboard Layouts Cherokee Font (Cherokee.ttf) The Cherokee language is compatible across many different platforms and
Streaming Lossless Data Compression Algorithm (SLDC)
Standard ECMA-321 June 2001 Standardizing Information and Communication Systems Streaming Lossless Data Compression Algorithm (SLDC) Phone: +41 22 849.60.00 - Fax: +41 22 849.60.01 - URL: http://www.ecma.ch
KWIC Implemented with Pipe Filter Architectural Style
KWIC Implemented with Pipe Filter Architectural Style KWIC Implemented with Pipe Filter Architectural Style... 2 1 Pipe Filter Systems in General... 2 2 Architecture... 3 2.1 Pipes in KWIC system... 3
Decimal Numbers: Base 10 Integer Numbers & Arithmetic
Decimal Numbers: Base 10 Integer Numbers & Arithmetic Digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 Example: 3271 = (3x10 3 ) + (2x10 2 ) + (7x10 1 )+(1x10 0 ) Ward 1 Ward 2 Numbers: positional notation Number
Pythagorean Triples. Chapter 2. a 2 + b 2 = c 2
Chapter Pythagorean Triples The Pythagorean Theorem, that beloved formula of all high school geometry students, says that the sum of the squares of the sides of a right triangle equals the square of the
Indiana Department of Education
GRADE 1 READING Guiding Principle: Students read a wide range of fiction, nonfiction, classic, and contemporary works, to build an understanding of texts, of themselves, and of the cultures of the United
Database Programming with PL/SQL: Learning Objectives
Database Programming with PL/SQL: Learning Objectives This course covers PL/SQL, a procedural language extension to SQL. Through an innovative project-based approach, students learn procedural logic constructs
COMP 250 Fall 2012 lecture 2 binary representations Sept. 11, 2012
Binary numbers The reason humans represent numbers using decimal (the ten digits from 0,1,... 9) is that we have ten fingers. There is no other reason than that. There is nothing special otherwise about
A Programming Language for Mechanical Translation Victor H. Yngve, Massachusetts Institute of Technology, Cambridge, Massachusetts
[Mechanical Translation, vol.5, no.1, July 1958; pp. 25-41] A Programming Language for Mechanical Translation Victor H. Yngve, Massachusetts Institute of Technology, Cambridge, Massachusetts A notational
International Journal of Advanced Research in Computer Science and Software Engineering
Volume 3, Issue 7, July 23 ISSN: 2277 28X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Greedy Algorithm:
TU e. Advanced Algorithms: experimentation project. The problem: load balancing with bounded look-ahead. Input: integer m 2: number of machines
The problem: load balancing with bounded look-ahead Input: integer m 2: number of machines integer k 0: the look-ahead numbers t 1,..., t n : the job sizes Problem: assign jobs to machines machine to which
Usage Guide on Business Information Modeling (BIM) Spreadsheet (v1.1)
Usage Guide on Business Information Modeling (BIM) Spreadsheet (v1.1) Table of Contents 1. INTRODUCTION...3 1.1 ABOUT THIS DOCUMENT...3 1.2 WHAT IS BIM SPREADSHEET...3 1.3 MAJOR FEATURES OF BIM SPREADSHEET...3
Ekaterina Timofeeva TOWARDS LOCALIZATION OF ANGLICISMS: A data-driven analysis of anglicisms on the Best Western Italia website
2 HUMANISTINEN TIEDEKUNTA KIELTEN LAITOS Ekaterina Timofeeva TOWARDS LOCALIZATION OF ANGLICISMS: A data-driven analysis of anglicisms on the Best Western Italia website Pro gradu -tutkielma Englannin kieli
5.3 The Cross Product in R 3
53 The Cross Product in R 3 Definition 531 Let u = [u 1, u 2, u 3 ] and v = [v 1, v 2, v 3 ] Then the vector given by [u 2 v 3 u 3 v 2, u 3 v 1 u 1 v 3, u 1 v 2 u 2 v 1 ] is called the cross product (or
Year 1 reading expectations (New Curriculum) Year 1 writing expectations (New Curriculum)
Year 1 reading expectations Year 1 writing expectations Responds speedily with the correct sound to graphemes (letters or groups of letters) for all 40+ phonemes, including, where applicable, alternative
Bangla Localization of OpenOffice.org. Asif Iqbal Sarkar Research Programmer BRAC University Bangladesh
Bangla Localization of OpenOffice.org Asif Iqbal Sarkar Research Programmer BRAC University Bangladesh Localization L10n is the process of adapting the text and applications of a product or service to
A3-R4: PROGRAMMING AND PROBLEM SOLVING THROUGH C LANGUAGE. Write a program to find sum of all prime numbers between 100 and 500.
A3-R4: PROGRAMMING AND PROBLEM SOLVING THROUGH C LANGUAGE Assignment 1. Write a program to find sum of all prime numbers between 100 and 500. Assignment 2. Write a program to obtain sum of the first 10
INTEGRATING THE COMMON CORE STANDARDS INTO INTERACTIVE, ONLINE EARLY LITERACY PROGRAMS
INTEGRATING THE COMMON CORE STANDARDS INTO INTERACTIVE, ONLINE EARLY LITERACY PROGRAMS By Dr. Kay MacPhee President/Founder Ooka Island, Inc. 1 Integrating the Common Core Standards into Interactive, Online
Toad Data Modeler - Features Matrix
Toad Data Modeler - Features Matrix Functionality Commercial Trial Freeware Notes General Features Physical Model (database specific) Universal Model (generic physical model) Logical Model (support for
A Little Set Theory (Never Hurt Anybody)
A Little Set Theory (Never Hurt Anybody) Matthew Saltzman Department of Mathematical Sciences Clemson University Draft: August 21, 2013 1 Introduction The fundamental ideas of set theory and the algebra