Unicode Demystified. A Practical Programmer's Guide to the Encoding Standard. by Richard Gillam


Transcription

Unicode Demystified: A Practical Programmer's Guide to the Encoding Standard, by Richard Gillam

Copyright by Richard T. Gillam. All rights reserved. Pre-publication draft number Tuesday, January 15, 2002

To Mark, Kathleen, Laura, Helena, Doug, John F, John R, Markus, Bertrand, Alan, Eric, and the rest of the old JTCSV Unicode crew, without whom this book would not have been possible
and
To Ted and Joyce Gillam, without whom the author would not have been possible


Table of Contents

Preface
    About this book
    How this book was produced
    The author's journey
    Acknowledgements
    A personal appeal

Unicode in Essence: An Architectural Overview of the Unicode Standard

CHAPTER 1 Language, Computers, and Unicode
    What Unicode Is
    What Unicode Isn't
    The challenge of representing text in computers
    What This Book Does
    How this book is organized
    Section I: Unicode in Essence
    Section II: Unicode in Depth
    Section III: Unicode in Action

CHAPTER 2 A Brief History of Character Encoding
    Prehistory
    The telegraph and Morse code
    The teletypewriter and Baudot code
    Other teletype and telegraphy codes
    FIELDATA and ASCII
    Hollerith and EBCDIC
    Single-byte encoding systems
    Eight-bit encoding schemes and the ISO 2022 model
    ISO
    Other 8-bit encoding schemes
    Character encoding terminology
    Multiple-byte encoding systems
    East Asian coded character sets
    Character encoding schemes for East Asian coded character sets
    Other East Asian encoding systems
    ISO and Unicode
    How the Unicode standard is maintained

CHAPTER 3 Architecture: Not Just a Pile of Code Charts
    The Unicode Character-Glyph Model
    Character positioning
    The Principle of Unification
    Alternate-glyph selection
    Multiple Representations
    Flavors of Unicode
    Character Semantics
    Unicode Versions and Unicode Technical Reports
    Unicode Standard Annexes
    Unicode Technical Standards
    Unicode Technical Reports
    Draft and Proposed Draft Technical Reports
    Superseded Technical Reports
    Unicode Versions
    Unicode stability policies
    Arrangement of the encoding space
    Organization of the planes
    The Basic Multilingual Plane
    The Supplementary Planes
    Non-Character code point values
    Conforming to the standard
    General
    Producing text as output
    Interpreting text from the outside world
    Passing text through
    Drawing text on the screen or other output devices
    Comparing character strings
    Summary

CHAPTER 4 Combining character sequences and Unicode normalization
    How Unicode non-spacing marks work
    Dealing properly with combining character sequences
    Canonical decompositions
    Canonical accent ordering
    Double diacritics
    Compatibility decompositions
    Singleton decompositions
    Hangul
    Unicode normalization forms
    Grapheme clusters

CHAPTER 5 Character Properties and the Unicode Character Database
    Where to get the Unicode Character Database
    The UNIDATA directory
    UnicodeData.txt
    PropList.txt
    General character properties
    Standard character names
    Algorithmically-derived names
    Control-character names
    ISO comment
    Block and Script
    General Category
    Letters
    Marks
    Numbers
    Punctuation
    Symbols
    Separators
    Miscellaneous
    Other categories
    Properties of letters
    SpecialCasing.txt
    CaseFolding.txt
    Properties of digits, numerals, and mathematical symbols
    Layout-related properties
    Bidirectional layout
    Mirroring
    Arabic contextual shaping
    East Asian width
    Line breaking property
    Normalization-related properties
    Decomposition
    Decomposition type
    Combining class
    Composition exclusion list
    Normalization test file
    Derived normalization properties
    Grapheme-cluster-related properties
    Unihan.txt

CHAPTER 6 Unicode Storage and Serialization Formats
    A historical note
    UTF
    UTF-16 and the surrogate mechanism
    Endian-ness and the Byte Order Mark
    UTF
    CESU
    UTF-EBCDIC
    UTF
    Standard Compression Scheme for Unicode
    BOCU
    Detecting Unicode storage formats

Unicode in Depth: A Guided Tour of the Character Repertoire

CHAPTER 7 Scripts of Europe
    The Western alphabetic scripts
    The Latin alphabet
    The Latin-1 characters
    The Latin Extended A block
    The Latin Extended B block
    The Latin Extended Additional block
    The International Phonetic Alphabet
    Diacritical marks
    Isolated combining marks
    Spacing modifier letters
    The Greek alphabet
    The Greek block
    The Greek Extended block
    The Coptic alphabet
    The Cyrillic alphabet
    The Cyrillic block
    The Cyrillic Supplementary block
    The Armenian alphabet
    The Georgian alphabet

CHAPTER 8 Scripts of The Middle East
    Bidirectional Text Layout
    The Unicode Bidirectional Layout Algorithm
    Inherent directionality
    Neutrals
    Numbers
    The Left-to-Right and Right-to-Left Marks
    The Explicit Embedding Characters
    Mirroring characters
    Line and Paragraph Boundaries
    Bidirectional Text in a Text-Editing Environment
    The Hebrew Alphabet
    The Hebrew block
    The Arabic Alphabet
    The Arabic block
    Joiners and non-joiners
    The Arabic Presentation Forms B block
    The Arabic Presentation Forms A block
    The Syriac Alphabet
    The Syriac block
    The Thaana Script
    The Thaana block

CHAPTER 9 Scripts of India and Southeast Asia
    Devanagari
    The Devanagari block
    Bengali
    The Bengali block
    Gurmukhi
    The Gurmukhi block
    Gujarati
    The Gujarati block
    Oriya
    The Oriya block
    Tamil
    The Tamil block
    Telugu
    The Telugu block
    Kannada
    The Kannada block
    Malayalam
    The Malayalam block
    Sinhala
    The Sinhala block
    Thai
    The Thai block
    Lao
    The Lao block
    Khmer
    The Khmer block
    Myanmar
    The Myanmar block
    Tibetan
    The Tibetan block
    The Philippine Scripts

CHAPTER 10 Scripts of East Asia
    The Han characters
    Variant forms of Han characters
    Han characters in Unicode
    The CJK Unified Ideographs area
    The CJK Unified Ideographs Extension A area
    The CJK Unified Ideographs Extension B area
    The CJK Compatibility Ideographs block
    The CJK Compatibility Ideographs Supplement block
    The Kangxi Radicals block
    The CJK Radicals Supplement block
    Ideographic description sequences
    Bopomofo
    The Bopomofo block
    The Bopomofo Extended block
    Japanese
    The Hiragana block
    The Katakana block
    The Katakana Phonetic Extensions block
    The Kanbun block
    Korean
    The Hangul Jamo block
    The Hangul Compatibility Jamo block
    The Hangul Syllables area
    Halfwidth and fullwidth characters
    The Halfwidth and Fullwidth Forms block
    Vertical text layout
    Ruby
    The Interlinear Annotation characters
    Yi
    The Yi Syllables block
    The Yi Radicals block

CHAPTER 11 Scripts from Other Parts of the World
    Mongolian
    The Mongolian block
    Ethiopic
    The Ethiopic block
    Cherokee
    The Cherokee block
    Canadian Aboriginal Syllables
    The Unified Canadian Aboriginal Syllabics block
    Historical scripts
    Runic
    Ogham
    Old Italic
    Gothic
    Deseret

CHAPTER 12 Numbers, Punctuation, Symbols, and Specials
    Numbers

Introduction to Unicode. By: Atif Gulzar Center for Research in Urdu Language Processing

Introduction to Unicode. By: Atif Gulzar Center for Research in Urdu Language Processing Introduction to Unicode By: Atif Gulzar Center for Research in Urdu Language Processing Introduction to Unicode Unicode Why Unicode? What is Unicode? Unicode Architecture Why Unicode? Pre-Unicode Standards

More information

DRH specification framework

DRH specification framework DRH specification framework 2007-03-15 EDM - NIED Takeshi KAWAMOTO, Hiroaki NEGISHI, Mitsuaki SASAKI 1 DRH Basic Development before Sep. 2007 Server architectures Search architectures Multilanguage Architectures

More information

The Unicode Standard Version 8.0 Core Specification

The Unicode Standard Version 8.0 Core Specification The Unicode Standard Version 8.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

The Unicode Standard Version 8.0 Core Specification

The Unicode Standard Version 8.0 Core Specification The Unicode Standard Version 8.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

Red Hat Enterprise Linux International Language Support Guide

Red Hat Enterprise Linux International Language Support Guide Red Hat Enterprise Linux International Language Support Guide Red Hat Enterprise Linux International Language Support Guide Copyright This book is about international language support for Red Hat Enterprise

More information

Inventory of Romanization Tools

Inventory of Romanization Tools Inventory of Romanization Tools Standards Intellectual Management Office Library and Archives Canad Ottawa 2006 Inventory of Romanization Tools page 1 Amharic Ethiopic BGN/PCGN 1967 Arabic Arabic ISO 233:1984.Transliteration

More information

EURESCOM - P923 (Babelweb) PIR.3.1

EURESCOM - P923 (Babelweb) PIR.3.1 Multilingual text processing difficulties Malek Boualem, Jérôme Vinesse CNET, 1. Introduction Users of more and more applications now require multilingual text processing tools, including word processors,

More information

Introduction to Internationalized Domain Names (IDN)

Introduction to Internationalized Domain Names (IDN) Introduction to ized Domain Names (IDN) IP Symposium for CEE, CIS and Baltic States Moscow, Russia 16-19 September 2003 Robert Shaw ITU Internet Strategy and Policy Advisor Agenda

More information

Chapter 4: Computer Codes

Chapter 4: Computer Codes Slide 1/30 Learning Objectives In this chapter you will learn about: Computer data Computer codes: representation of data in binary Most commonly used computer codes Collating sequence 36 Slide 2/30 Data

More information

ASCII Code. Numerous codes were invented, including Émile Baudot's code (known as Baudot

ASCII Code. Numerous codes were invented, including Émile Baudot's code (known as Baudot ASCII Code Data coding Morse code was the first code used for long-distance communication. Samuel F.B. Morse invented it in 1844. This code is made up of dots and dashes (a sort of binary code). It was

More information

Rendering/Layout Engine for Complex script. Pema Geyleg pgeyleg@dit.gov.bt

Rendering/Layout Engine for Complex script. Pema Geyleg pgeyleg@dit.gov.bt Rendering/Layout Engine for Complex script Pema Geyleg pgeyleg@dit.gov.bt Overview What is the Layout Engine/ Rendering? What is complex text? Types of rendering engine? How does it work? How does it support

More information

Frequently Asked Questions on character sets and languages in MT and MX free format fields

Frequently Asked Questions on character sets and languages in MT and MX free format fields Frequently Asked Questions on character sets and languages in MT and MX free format fields Version Final 17 January 2008 Preface The Frequently Asked Questions (FAQs) on character sets and languages that

More information

Data Integrator. Encoding Reference. Pervasive Software, Inc. 12365-B Riata Trace Parkway Austin, Texas 78727 USA

Data Integrator. Encoding Reference. Pervasive Software, Inc. 12365-B Riata Trace Parkway Austin, Texas 78727 USA Data Integrator Encoding Reference Pervasive Software, Inc. 12365-B Riata Trace Parkway Austin, Texas 78727 USA Telephone: 888.296.5969 or 512.231.6000 Fax: 512.231.6010 Email: info@pervasiveintegration.com

More information

WORKING DRAFT. ISO/IEC International Standard International Standard 10646. ISO/IEC 10646 1 st Edition + Amd1

WORKING DRAFT. ISO/IEC International Standard International Standard 10646. ISO/IEC 10646 1 st Edition + Amd1 ISO/IEC JC1/SC2/WG2 N2937 ISO/IEC International Standard International Standard 10646 ISO/IEC 10646 1 st Edition + Amd1 Information technology Universal Multiple-Octet Coded Character Set (UCS) Architecture

More information

Right-to-Left Language Support in EMu

Right-to-Left Language Support in EMu EMu Documentation Right-to-Left Language Support in EMu Document Version 1.1 EMu Version 4.0 www.kesoftware.com 2010 KE Software. All rights reserved. Contents SECTION 1 Overview 1 SECTION 2 Switching

More information

Internationalizing the Domain Name System. Šimon Hochla, Anisa Azis, Fara Nabilla

Internationalizing the Domain Name System. Šimon Hochla, Anisa Azis, Fara Nabilla Internationalizing the Domain Name System Šimon Hochla, Anisa Azis, Fara Nabilla Internationalize Internet Master in Innovation and Research in Informatics problematic of using non-ascii characters ease

More information

www.cle.org.pk PROFESSOR AND HEAD DR. SARMAD HUSSAIN Al- Khwarizmi Institute of Computer Sciences University of Engineering and Technology, Lahore

www.cle.org.pk PROFESSOR AND HEAD DR. SARMAD HUSSAIN Al- Khwarizmi Institute of Computer Sciences University of Engineering and Technology, Lahore Internationalized Domain Names (IDNs) www.cle.org.pk DR. SARMAD HUSSAIN PROFESSOR AND HEAD Al- Khwarizmi Institute of Computer Sciences University of Engineering and Technology, Lahore sarmad.hussain@kics.edu.pk

More information

.ASIA CJK (Chinese Japanese Korean) IDN Policies

.ASIA CJK (Chinese Japanese Korean) IDN Policies Date: Status: Version: 1.1.ASIA IDN Policies 04-May-2011 COMPLETE Archive URL: References: http://dot.asia/policies/dotasia-cjk-idn-policies-complete--2011-05-04.pdf.asia ZH / JA / KO IDN Language Tables

More information

National Language (Tamil) Support in Oracle An Oracle White paper / November 2004

National Language (Tamil) Support in Oracle An Oracle White paper / November 2004 National Language (Tamil) Support in Oracle An Oracle White paper / November 2004 Vasundhara V* & Nagarajan M & * vasundhara.venkatasubramanian@oracle.com; & Nagarajan.muthukrishnan@oracle.com) Oracle

More information

Bangla Localization of OpenOffice.org. Asif Iqbal Sarkar Research Programmer BRAC University Bangladesh

Bangla Localization of OpenOffice.org. Asif Iqbal Sarkar Research Programmer BRAC University Bangladesh Bangla Localization of OpenOffice.org Asif Iqbal Sarkar Research Programmer BRAC University Bangladesh Localization L10n is the process of adapting the text and applications of a product or service to

More information

New International features of Internet Explorer

New International features of Internet Explorer New International features of Internet Explorer Michel Suignard Microsoft Corporation 1 Summary This document presents new implementations of international features by Microsoft Internet Explorer version

More information

IDN: Challenges and Opportunities A registry s view of the multilingual web. Rome, March 2013!

IDN: Challenges and Opportunities A registry s view of the multilingual web. Rome, March 2013! IDN: Challenges and Opportunities A registry s view of the multilingual web " Rome, March 2013! Everything is about the end user! 2! Name! Deng Fu Xiang"! Occupation! Freelance photographer" " Age! 35

More information

Preservation Handbook

Preservation Handbook Preservation Handbook Plain text Author Version 2 Date 17.08.05 Change History Martin Wynne and Stuart Yeates Written by MW 2004. Revised by SY May 2005. Revised by MW August 2005. Page 1 of 7 File: presplaintext_d2.doc

More information

PRICE LIST. ALPHA TRANSLATION AGENCY www.biuro-tlumaczen.tv info@biuro-tlumaczen.tv

PRICE LIST. ALPHA TRANSLATION AGENCY www.biuro-tlumaczen.tv info@biuro-tlumaczen.tv We encourage you to get to know the prices of the services provided by Alpha Translation Agency in the range of standard and certified written translations of common and rare languages, as well as interpretation

More information

Analyzing Unicode Text with Regular Expressions

Analyzing Unicode Text with Regular Expressions Analyzing Unicode Text with Regular Expressions Andy Heninger IBM Corporation heninger@us.ibm.com Abstract For decades now, Regular Expressions have been used in the analysis of text data, for searching

More information

Multi-lingual Label Printing with Unicode

Multi-lingual Label Printing with Unicode Multi-lingual Label Printing with Unicode White Paper Version 20100716 2009 SATO CORPORATION. All rights reserved. http://www.satoworldwide.com softwaresupport@satogbs.com 2009 SATO Corporation. All rights

More information

Designing Global Applications: Requirements and Challenges

Designing Global Applications: Requirements and Challenges Designing Global Applications: Requirements and Challenges Sourav Mazumder Abstract This paper explores various business drivers for globalization and examines the nature of globalization requirements

More information

The Unicode Standard Version 8.0 Core Specification

The Unicode Standard Version 8.0 Core Specification The Unicode Standard Version 8.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

Internationalization & Localization

Internationalization & Localization Internationalization & Localization Of OpenOffice.org - The Indian Perspective Comprehensive Office Suite for Multilingual Indic Computing Bhupesh Koli, Shikha G Pillai

More information

San José, February 16, 2001

San José, February 16, 2001 San José, February 16, 2001 Feel free to distribute this text (version 1.4) including the author s e-mail address (mailto:dmeyer@adobe.com) and to contact him for corrections and additions. Please do not

More information

Speaking your language...

Speaking your language... 1 About us: Cuttingedge Translation Services Pvt. Ltd. (Cuttingedge) has its corporate headquarters in Noida, India and an office in Glasgow, UK. Over the time we have serviced clients from various backgrounds

More information

Internationalized Domain Names -

Internationalized Domain Names - Internationalized Domain Names - Getting them to work Gihan Dias LK Domain Registry What is IDN? Originally DNS names were restricted to the characters a-z (letters), 0-9 (digits) and '-' (hyphen) (LDH)

More information

HKSCS-2004 Support for Windows Platform

HKSCS-2004 Support for Windows Platform HKSCS-2004 Support for Windows Platform Windows XP Font Pack for ISO 10646:2003 + Amendment 1 Traditional Chinese Support (HKSCS-2004) update for Windows XP and Windows Server 2003 June 2010 Version 1.0

More information

INTERNATIONALIZATION FEATURES IN THE MICROSOFT.NET DEVELOPMENT PLATFORM AND WINDOWS 2000/XP

INTERNATIONALIZATION FEATURES IN THE MICROSOFT.NET DEVELOPMENT PLATFORM AND WINDOWS 2000/XP INTERNATIONALIZATION FEATURES IN THE MICROSOFT.NET DEVELOPMENT PLATFORM AND WINDOWS 2000/XP Dr. William A. Newman, Texas A&M International University, wnewman@tamiu.edu Mr. Syed S. Ghaznavi, Texas A&M

More information

Binary Representation

Binary Representation Binary Representation The basis of all digital data is binary representation. Binary - means two 1, 0 True, False Hot, Cold On, Off We must tbe able to handle more than just values for real world problems

More information

How to represent characters?

How to represent characters? Copyright Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See http://software-carpentry.org/license.html for more information. How to represent characters?

More information

Character Code Structure and Extension Techniques

Character Code Structure and Extension Techniques Standard ECMA-35 6th Edition - December 1994 Standardizing Information and Communication Systems Character Code Structure and Extension Techniques Phone: +41 22 849.60.00 - Fax: +41 22 849.60.01 - X.400:

More information

L2/14-009 Abstract Introduction

L2/14-009 Abstract Introduction P P T 0 1 S P P P P P P S P P P P P 0 S 1 1 S 0 0 1 P 0 S 1 T P 0 S 1 T 1 T P 0 S 1 T P 0 T P P P 0 1 S S 1 0 T P S P 1 0 T S P 0 1 P 0 S 1 T TPPT Form for PT ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY

More information

Digital Imaging and Communications in Medicine (DICOM) Part 5: Data Structures and Encoding

Digital Imaging and Communications in Medicine (DICOM) Part 5: Data Structures and Encoding Digital Imaging and Communications in Medicine (DICOM) Part 5: Data Structures and Encoding Published by National Electrical Manufacturers Association 1300 N. 17th Street Rosslyn, Virginia 22209 USA Copyright

More information

Developing international webapplications. Frode Eika Sandnes Faculty of Engineering, Oslo University College. internationalisation 18 letters.

Developing international webapplications. Frode Eika Sandnes Faculty of Engineering, Oslo University College. internationalisation 18 letters. Developing international webapplications Frode Eika Sandnes Faculty of Engineering, Oslo University College internationalisation 18 letters i18n 1 Internationalisation vs localisation Internationalisation

More information

Email Content Control. Admin Guide

Email Content Control. Admin Guide Email Content Control Admin Guide Document Revision Date: May 7, 2013 Email Content Control Admin Guide i Contents Introduction... 1 About Content Control... 1 Configuration Overview for Content Control...

More information

Kazuraki : Under The Hood

Kazuraki : Under The Hood Kazuraki : Under The Hood Dr. Ken Lunde Senior Computer Scientist Adobe Systems Incorporated Why Develop Kazuraki? To build excitement and awareness about OpenType Japanese fonts Kazuraki is the first

More information

Encoding script-specific writing rules based on the Unicode character set

Encoding script-specific writing rules based on the Unicode character set Encoding script-specific writing rules based on the Unicode character set Malek Boualem, Mark Leisher, Bill Ogden Computing Research Laboratory (CRL), New Mexico State University, Box 30001, Dept 3CRL,

More information

The Unicode Standard Version 8.0 Core Specification

The Unicode Standard Version 8.0 Core Specification The Unicode Standard Version 8.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

Multilingual Ediscovery: Options, Obstacles and Opportunities Report

Multilingual Ediscovery: Options, Obstacles and Opportunities Report Multilingual Ediscovery: Options, Obstacles and Opportunities Report A guide to collecting, filtering, reviewing and producing multilingual documents in discovery. An Altegrity Company Copyright 2014 Kroll

More information

Japanese Character Printers EPL2 Programming Manual Addendum

Japanese Character Printers EPL2 Programming Manual Addendum Japanese Character Printers EPL2 Programming Manual Addendum This addendum contains information unique to Zebra Technologies Japanese character bar code printers. The Japanese configuration printers support

More information

coral SOFTWARE LOCALISATION LANGUAGE SERVICES WEBSITE TRANSLATION MEDICAL TRANSLATION MULTILINGUAL DTP TRANSCRIPTION VOICEOVER & SUBTITLING

coral SOFTWARE LOCALISATION LANGUAGE SERVICES WEBSITE TRANSLATION MEDICAL TRANSLATION MULTILINGUAL DTP TRANSCRIPTION VOICEOVER & SUBTITLING SOFTWARE LOCALISATION LANGUAGE SERVICES // TRANSCRIPTION MULTILINGUAL DTP MEDICAL TRANSLATION WEBSITE TRANSLATION VOICEOVER & SUBTITLING INTERPRETER SERVICES elearning TRANSLATION about us Coral Knowledge

More information

Internationalization of Domain Names: A history of technology development

Internationalization of Domain Names: A history of technology development Internationalization of Domain Names: A history of technology development John C Klensin and Patrik Fältström First-generation Hostnames and Character Coding Consideration of internationalization issues

More information

Unicode Security. Software Vulnerability Testing Guide. July 2009 Casaba Security, LLC www.casabasecurity.com

Unicode Security. Software Vulnerability Testing Guide. July 2009 Casaba Security, LLC www.casabasecurity.com Unicode Security Software Vulnerability Testing Guide (DRAFT DOCUMENT this document is currently a preview in DRAFT form. Please contact me with corrections or feedback.) Software Globalization provides

More information

Table 1: TSQM Version 1.4 Available Translations

Table 1: TSQM Version 1.4 Available Translations Quintiles, Inc. 1 Tables 1, 2, & 3 below list the existing and available translations for the TSQM v1.4, TSQM vii, TSQM v9. If Quintiles does not have a translation that your Company needs, the Company

More information

Encoding Text with a Small Alphabet

Encoding Text with a Small Alphabet Chapter 2 Encoding Text with a Small Alphabet Given the nature of the Internet, we can break the process of understanding how information is transmitted into two components. First, we have to figure out

More information

Four ACEs. A Survey of ASCII Compatible Encodings. International Unicode Conference 22 September 2002

Four ACEs. A Survey of ASCII Compatible Encodings. International Unicode Conference 22 September 2002 Four ACEs A Survey of ASCII Compatible Encodings International Unicode Conference 22 September 2002 by Addison P. Phillips Director, Globalization Architecture c TABLE OF CONTENTS INTRODUCTION... 3 WHAT'S

More information

LOCALIZATION PROCESS CHECKLIST

LOCALIZATION PROCESS CHECKLIST LOCALIZATION PROCESS CHECKLIST THE TRANSLATION COMPANY LOCALIZATION CHECKLIST This checklist should be completed for all new projects involving localization. A proper planning of the requirements upfront

More information

TRIDINDIA IT TRANSLATION SERVICES PRIVATE LIMITED

TRIDINDIA IT TRANSLATION SERVICES PRIVATE LIMITED TRIDINDIA IT TRANSLATION SERVICES PRIVATE LIMITED As we understand your business is mostly about words, we not only translate words, we transform business in the world of words. Established in 2002 with

More information

PHOTOSTORE 3 SERIES MANUAL TABLE OF CONTENTS

PHOTOSTORE 3 SERIES MANUAL TABLE OF CONTENTS PHOTOSTORE 3 SERIES MANUAL Manual Version 3.9.1 TABLE OF CONTENTS PHOTOSTORE 3 SERIES MANUAL TABLE OF CONTENTS INSTALLATION, SUPPORT, AND UPGRADES SECURITY USING THE STORE MANAGER HOME SETTINGS Backup

More information

Points to Note. Chinese and English characters shall be coded in ISO/IEC 10646:2011, and the set of Chinese

Points to Note. Chinese and English characters shall be coded in ISO/IEC 10646:2011, and the set of Chinese General Format, Manner and Procedure for the Submission of Electronic Information under Law by virtue of the Electronic Transactions Ordinance (Chapter 553) Points to Note (This Note aims to set out the

More information

Summary Table of Contents

Summary Table of Contents Summary Table of Contents Preface VII For whom is this book intended? What is its topical scope? Summary of its organization. Suggestions how to read it. Part I: Why We Need Long-term Digital Preservation

More information

Unraveling Unicode: A Bag of Tricks for Bug Hunting

Unraveling Unicode: A Bag of Tricks for Bug Hunting Unraveling Unicode: A Bag of Tricks for Bug Hunting Black Hat USA July 2009 Chris Weber www.lookout.net chris@casabasecurity.com Casaba Security Can you tell the difference? How about now? The Transformers

More information

ISO/IEC JTC1 SC2/WG2 N4399

ISO/IEC JTC1 SC2/WG2 N4399 ISO/IEC JTC1 SC2/WG2 N4399 Universal Multiple-Octet Coded Character Set International Organization for Standardization Organisation Internationale de rmalisation Международная организация по стандартизации

More information

Freescale Embedded GUI Converter Utility 2.0 Quick User Guide

Freescale Embedded GUI Converter Utility 2.0 Quick User Guide Freescale Semiconductor User Guide Document Number: EGUICUG Rev. 1, 08/2010 Freescale Embedded GUI Converter Utility 2.0 Quick User Guide 1 Introduction The Freescale Embedded GUI Converter Utility 2.0

More information

User Guide. Printing Unicode characters from SAP to SATO GT4xxe Printers. www.satoworldwide.com. Version 061030-02

User Guide. Printing Unicode characters from SAP to SATO GT4xxe Printers. www.satoworldwide.com. Version 061030-02 Printing Unicode characters from SAP to SATO GT4xxe Printers User Guide Version 061030-02 2006 SATO Corporation. All rights reserved. Table of Contents 1. Introduction... 3 2. Configuration at SAP environment...

More information

Report on Data from the 2004 05 MLA Guide to Doctoral Programs in English and Other Modern Languages

Report on Data from the 2004 05 MLA Guide to Doctoral Programs in English and Other Modern Languages Prepublication Release: The final version of this report will appear in the ADE Bulletin No. 140, Fall 2006. Report on Data from the 2004 05 MLA Guide to Doctoral Programs in and Other Modern Languages

More information

Centricity Enterprise Web 3.0 DICOM Conformance Memo DOC0094970

Centricity Enterprise Web 3.0 DICOM Conformance Memo DOC0094970 DOC0094970 CONTENTS 1 Introduction... 3 1.1 Scope and Purpose... 3 1.2 Intended Audience... 3 1.3 Scope and Field of Application... 3 1.4 References... 4 1.5 Definitions... 4 1.6 Symbols and Abbreviations...
