Introduction to Unicode and Writing Systems

1 Introduction to Unicode and Writing Systems
Denis Kiryaev
"If you are a programmer working in 2003 and you don't know the basics of characters, character sets, encodings, and Unicode, and I catch you, I'm going to punish you by making you peel onions for 6 months in a submarine. I swear I will."
Joel Spolsky, Joel on Software

2 Agenda
- Introduction
- Unicode as a standard
- Writing Systems, or the fun begins
  - French
  - Arabic
  - Chinese
  - Japanese
  - Korean
  - Hindi

4 ASCII

5 Windows

8 Unicode as a standard
- Characters table
- Encodings
- Rendering rules & Algorithms

9 Unicode planes
- 17 planes, 65,536 code points each
- Basic Multilingual Plane (BMP)
- Supplementary Multilingual Plane (SMP)
- Supplementary Ideographic Plane (SIP)
- Private Use Area characters
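The plane layout can be illustrated with a short sketch. It assumes only the fact that each plane spans 0x10000 (65,536) code points, so the plane number is the code point shifted right by 16 bits. The helper name `plane_of` and the `PLANE_NAMES` table are illustrative and are not part of the slides.

```python
def plane_of(code_point: int) -> int:
    """Return the plane number (0..16) that contains the code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    # Each plane holds 0x10000 code points, so the plane is the high bits.
    return code_point >> 16


# Abbreviations used on the slide; other planes are left unnamed here.
PLANE_NAMES = {0: "BMP", 1: "SMP", 2: "SIP"}

for ch in ("z", "\U0001D11E", "\U00024B62"):
    plane = plane_of(ord(ch))
    print(f"U+{ord(ch):04X} is in plane {plane} {PLANE_NAMES.get(plane, '')}")
```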

11 Encodings
- UTF-8
- UTF-16
- UTF-32
- Character set, codepage and encoding

12 UTF-8
Invalid bytes in a UTF-8 sequence: C0, C1, F5-FF

13 UTF-8 example
Character   Code point   Hexadecimal UTF-8
$           U+0024       24
¢           U+00A2       C2 A2
€           U+20AC       E2 82 AC
𤭢          U+24B62      F0 A4 AD A2
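The byte sequences in the UTF-8 example can be reproduced by hand. The following is a minimal sketch of the standard UTF-8 bit layout (1 to 4 bytes, continuation bytes of the form 10xxxxxx); the function name `utf8_encode` is illustrative, and the sketch deliberately skips validation of surrogate code points.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 (no surrogate check in this sketch)."""
    if cp < 0x80:                       # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                      # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for ch in "$\u00A2\u20AC\U00024B62":
    print(f"U+{ord(ch):04X} -> {utf8_encode(ord(ch)).hex().upper()}")
```

Comparing the result with Python's built-in `str.encode("utf-8")` is an easy way to check the sketch.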

14 UTF-16
Code point range and encoding scheme:
- U+0000 - U+D7FF and U+E000 - U+FFFF: 16-bit integer equal to the code point number
- U+10000 - U+10FFFF: surrogate pair
  Lead surrogate = 0xD800 + ((code point - 0x10000) >> 10)
  Trail surrogate = 0xDC00 + (code point & 0x3FF)
- U+D800 - U+DFFF: reserved for UTF-16 surrogates
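The surrogate-pair formulas on this slide translate directly into code. This is a minimal sketch; the function name `to_surrogates` is illustrative. Masking the offset rather than the raw code point gives the same trail surrogate, because 0x10000 has no bits set in its low 10 bits.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into (lead, trail) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above the BMP need a surrogate pair")
    offset = cp - 0x10000              # 20-bit value
    lead = 0xD800 + (offset >> 10)     # high 10 bits
    trail = 0xDC00 + (offset & 0x3FF)  # low 10 bits
    return lead, trail


lead, trail = to_surrogates(0x1D11E)   # musical symbol G clef
print(f"{lead:04X} {trail:04X}")
```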

15 UTF-16 example
Code point   Character                                                UTF-16 code units (hex)
U+007A       Latin small letter z (z)                                 007A
U+6C34       CJK unified ideograph-6C34 (water)                       6C34
U+10000      Linear B syllable B008 A (first non-BMP code point)      D800 DC00
U+1D11E      Musical symbol G clef                                    D834 DD1E
U+10FFFD     Private use character-10FFFD (last private-use code point)   DBFF DFFD

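16 Byte order mark (BOM)
"Hello" in UTF-16 without a BOM:
- Intel x86 (little-endian): 48 00 65 00 6C 00 6C 00 6F 00
- System/360 (big-endian): 00 48 00 65 00 6C 00 6C 00 6F
Byte order mark (BOM) = U+FEFF (zero-width non-breaking space); U+FFFE is reserved and is not a character
"Hello" in UTF-16 with a BOM:
- Intel x86: FF FE 48 00 65 00 6C 00 6C 00 6F 00
- System/360: FE FF 00 48 00 65 00 6C 00 6C 00 6F
UTF-16BE and UTF-16LE are IANA-approved encoding names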
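The effect of the BOM can be checked with Python's standard codecs. The sketch below assumes Python 3.8 or later (for `bytes.hex` with a separator); the helper names `encode_with_bom` and `sniff_byte_order` are illustrative. The sniffing helper looks only at UTF-16 BOMs, so it would misreport a UTF-32LE stream, whose BOM begins with the same two bytes.

```python
import codecs


def encode_with_bom(text: str, little_endian: bool) -> bytes:
    """Encode text as UTF-16 and prepend the matching byte order mark."""
    if little_endian:
        return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


def sniff_byte_order(data: bytes) -> str:
    """Guess the UTF-16 byte order from a leading BOM, if there is one."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little-endian"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big-endian"
    return "unknown"


for little in (True, False):
    data = encode_with_bom("Hello", little)
    print(sniff_byte_order(data), data.hex(" ").upper())
```

The generic `"utf-16"` codec reads the BOM and picks the byte order itself, so `data.decode("utf-16")` returns `"Hello"` for both variants.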

17 UTF-32
- The simplest encoding: fixed length
- The most redundant
- BOM is also applicable
- Not recommended in the HTML 5 standard

18 Character set, codepage and encoding
Pre-Unicode era:
- Character set = codepage = charmap = encoding
Unicode era:
- Codepage = charmap = legacy encoding
  - Defines a table encoding for a locale-specific Unicode subset
- Character set
  - Means the whole Unicode repertoire
  - For legacy encodings, means the Unicode subset encoded by that encoding
- Encoding
  - One of the Unicode encodings
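The idea that a legacy codepage encodes only a locale-specific subset of Unicode can be seen with Python's built-in codecs. This is a small sketch; `cp1251` (the Windows Cyrillic codepage) is chosen here only as an example of a legacy encoding, and the helper name `try_encode` is illustrative.

```python
from typing import Optional


def try_encode(text: str, encoding: str) -> Optional[bytes]:
    """Return the encoded bytes, or None if the encoding cannot represent the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


russian = "Привет"
chinese = "水"

print(try_encode(russian, "cp1251"))   # one byte per Cyrillic letter
print(try_encode(chinese, "cp1251"))   # None: outside the codepage's subset
print(try_encode(chinese, "utf-8"))    # a Unicode encoding covers everything
```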

19 Rendering rules & Algorithms
Unlike Russian and English, most languages in the world use complex scripts
Rendering complexities:
- Diacritics
- Ligatures
- Right-to-left and other writing directions
Sorting complexities:
- Diacritics
- Ligatures
- Ideographs

21 Writing systems - French
Père Noël

22 Precomposed and decomposed forms
- Precomposed form: P + è + r + e, N + o + ë + l
- Decomposed form: P + e + combining grave accent (U+0300) + r + e, N + o + e + combining diaeresis (U+0308) + l
- Both forms are equivalent

23 Normalization forms
- NFD: Normalization Form Canonical Decomposition
- NFC: Normalization Form Canonical Composition
- NFKD: Normalization Form Compatibility Decomposition
- NFKC: Normalization Form Compatibility Composition
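A short sketch of canonical normalization, using the standard library's `unicodedata.normalize` and the "Père Noël" example from the French slides. The two strings compare unequal as raw code point sequences and become equal once both are normalized to the same form.

```python
import unicodedata

precomposed = "P\u00E8re No\u00EBl"      # è and ë as single code points
decomposed = "Pe\u0300re Noe\u0308l"     # base letters plus combining marks

print(precomposed == decomposed)          # False: different code point sequences
print(len(precomposed), len(decomposed))  # 9 and 11 code points

# After normalizing both sides to the same form, the comparison succeeds.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)
```

Which form to normalize to is a design choice; the important part is that both sides of a comparison use the same one.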

24 Compatibility
- ﬀ (U+FB00) ≠ f + f (U+0066), but compatible
- Ⅻ (U+216B) is compatible with X + I + I (U+0058, U+0049, U+0049)
- ß (U+00DF) ≠ s + s (U+0073); the two are related through case folding, not through compatibility decomposition
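Compatibility equivalence can be explored with `unicodedata.normalize`. The sketch below contrasts canonical (NFC) and compatibility (NFKC) normalization for three characters. One behavior worth knowing, as implemented in Python's `unicodedata`: ß has no compatibility decomposition, so NFKC leaves it unchanged, and it becomes "ss" only under case folding.

```python
import unicodedata

samples = {
    "\uFB00": "Latin small ligature ff",
    "\u216B": "Roman numeral twelve",
    "\u00DF": "Latin small letter sharp s",
}

for ch, label in samples.items():
    nfc = unicodedata.normalize("NFC", ch)    # canonical: leaves all three alone
    nfkc = unicodedata.normalize("NFKC", ch)  # compatibility: folds ligature and numeral
    print(f"{label}: NFC={nfc!r} NFKC={nfkc!r} casefold={ch.casefold()!r}")
```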

25 Canonical ordering
- e + U+0301 + U+031C and e + U+031C + U+0301 are equivalent; the canonical order is e + (U+031C, CCC=220) + (U+0301, CCC=230)
- e + (U+0303, CCC=230) + (U+0300, CCC=230) ≠ e + (U+0300, CCC=230) + (U+0303, CCC=230): marks with the same combining class are not reordered
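Canonical ordering can be observed through `unicodedata.combining`, which returns a character's canonical combining class (CCC), and through NFD normalization, which sorts adjacent combining marks by that class. A minimal sketch using the code points from this slide:

```python
import unicodedata

for mark in ("\u0301", "\u031C", "\u0303", "\u0300"):
    print(f"U+{ord(mark):04X} CCC={unicodedata.combining(mark)}")

# Different classes (230 vs 220): both orders normalize to the same sequence.
above_first = "e\u0301\u031C"
below_first = "e\u031C\u0301"
print(unicodedata.normalize("NFD", above_first) ==
      unicodedata.normalize("NFD", below_first))

# Same class (230 and 230): the order is significant and is preserved.
tilde_grave = "e\u0303\u0300"
grave_tilde = "e\u0300\u0303"
print(unicodedata.normalize("NFD", tilde_grave) ==
      unicodedata.normalize("NFD", grave_tilde))
```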

29 Arabic script
- Right-to-left
- Cursive script: each letter has up to 4 contextual glyphs (initial, medial, final, and isolated)
- No upper or lower case letters
- Diacritics are used for short vowels
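The four contextual shapes are normally selected by the rendering engine, not stored in the text. Unicode does, however, encode them as compatibility characters in the Arabic Presentation Forms-B block, which makes the idea easy to inspect. The sketch below uses the letter beh (U+0628) as an arbitrary example and relies only on the standard `unicodedata` module.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, the character actually stored in text

# U+FE8F..U+FE92 are the isolated, final, initial and medial forms of beh.
for cp in range(0xFE8F, 0xFE93):
    form = chr(cp)
    tag, base = unicodedata.decomposition(form).split()
    print(f"U+{cp:04X} {unicodedata.name(form)}: {tag} -> U+{base}")
    # Compatibility normalization folds every shape back to the plain letter.
    assert unicodedata.normalize("NFKC", form) == BEH

# Bidirectional class 'AL' marks a right-to-left Arabic letter.
print(unicodedata.bidirectional(BEH))

# Short vowels are combining marks, for example fatha (U+064E).
print(unicodedata.combining("\u064E") != 0)
```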

30 Arabic short vowels

33 Chinese character types
- Pictographs (木 tree)
- Ideographs (上 up, 日 sun)
- Logical aggregates (東 east, sun rising in the trees)
- Phonetic complexes (晴 clear weather, sun + blue)

34 Writing rules
- Each character fits into a square; complex characters are scaled as necessary
- Writing direction: traditionally top to bottom, right to left; in modern use, left to right
- Traditional writing has little or no punctuation; modern writing is fully punctuated
- Latin characters and numbers have 2 forms with different code points: halfwidth (1) and fullwidth (１)
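The halfwidth and fullwidth forms can be told apart with `unicodedata.east_asian_width`, and compatibility normalization maps the fullwidth form back to the ordinary one. A small sketch using the digit 1:

```python
import unicodedata

halfwidth = "1"        # U+0031 DIGIT ONE
fullwidth = "\uFF11"   # U+FF11 FULLWIDTH DIGIT ONE

for ch in (halfwidth, fullwidth):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} "
          f"width={unicodedata.east_asian_width(ch)}")

# The two are different code points, so a plain comparison fails ...
print(halfwidth == fullwidth)
# ... but NFKC folds the fullwidth form to the ordinary digit.
print(unicodedata.normalize("NFKC", fullwidth) == halfwidth)
```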

35 Horizontal vs. Vertical

36 Traditional and Simplified Chinese
- Simplified Chinese is the standard in the PRC, Singapore, and Malaysia
- Traditional Chinese is used in Hong Kong, Taiwan, Macau, and overseas Chinese communities
- Traditional and simplified forms of the same character have different code points

37 GB18030
- PRC national standard
- Treated as a Unicode encoding, although it is not part of the Unicode standard
- A superset of the double-byte code page GB2312, which in turn is a superset of ASCII
- GB18030 support is required for software sold in China
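The superset relationship can be spot-checked with Python's built-in `gb18030` and `gb2312` codecs. This is only a sketch on a few sample characters, not a proof of the general property; the helper name `encoded_len` is illustrative.

```python
def encoded_len(text: str, encoding: str = "gb18030") -> int:
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


# ASCII is preserved byte for byte.
print("A".encode("gb18030") == b"A")

# A GB2312 character keeps its two-byte GB2312 code in GB18030.
print("中".encode("gb2312") == "中".encode("gb18030"))

# Characters outside the older code pages take four bytes, and every
# Unicode character round-trips, as with a UTF encoding.
sample = "\U00024B62"
print(encoded_len(sample))
print(sample.encode("gb18030").decode("gb18030") == sample)
```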

39 Writing systems - Japanese
Mix of 3 scripts:
- Kanji (Chinese characters)
- Hiragana (syllabary used for grammatical elements)
- Katakana (syllabary used for foreign words and names, etc.)
Traditional writing direction is the same as Chinese; modern writing is left to right
Sorting is based on kana
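The mix of three scripts in ordinary Japanese text can be made visible with a rough classifier. The sketch below relies on the prefix of each character's Unicode name, which is a convenient shortcut and not an official script property lookup; the function name `script_of` and the sample sentence are illustrative.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Rough script guess based on the Unicode character name prefix."""
    name = unicodedata.name(ch, "")
    if name.startswith("HIRAGANA"):
        return "hiragana"
    if name.startswith("KATAKANA"):
        return "katakana"
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "kanji"
    return "other"


# "Japanese text": kanji + hiragana particle + katakana loanword
for ch in "日本語のテキスト":
    print(ch, script_of(ch))
```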

41 Writing systems - Korean
2 scripts:
- Hanja: Chinese characters
- Hangul: alphabet promulgated in 1446
ㅂㄷㅈㄱㅃㄸㅉㄲㅍㅌㅊㅋㅅㅎㅆㅁㄴㅇㄹㅣㅔㅚㅐㅏㅗㅜㅓㅡㅢㅖㅒㅑㅛㅠㅕㅟㅞㅙㅘㅝ

42 Korean hangul syllables
Consonants: ㅂㄷㅈㄱㅃㄸㅉㄲㅍㅌㅊㅋㅅㅎㅆㅁㄴㅇㄹ
Vowels: ㅣㅔㅚㅐㅏㅗㅜㅓㅡㅢㅖㅒㅑㅛㅠㅕㅟㅞㅙㅘㅝ
Example: 다국어
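Precomposed Hangul syllables are laid out arithmetically in Unicode, so a syllable can be split into its letters (jamo) with integer division. The sketch below follows the standard Hangul decomposition arithmetic (19 leading consonants, 21 vowels, 28 trailing slots including "none") and checks itself against NFD. The function name is illustrative. The result uses conjoining jamo (U+1100 block), not the compatibility jamo shown on the slide.

```python
import unicodedata

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant


def decompose_syllable(syllable: str) -> str:
    """Split one precomposed Hangul syllable into conjoining jamo."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < L_COUNT * N_COUNT:
        return syllable  # not a precomposed Hangul syllable
    lead = chr(L_BASE + index // N_COUNT)
    vowel = chr(V_BASE + (index % N_COUNT) // T_COUNT)
    trail_index = index % T_COUNT
    trail = chr(T_BASE + trail_index) if trail_index else ""
    return lead + vowel + trail


word = "다국어"
jamo = "".join(decompose_syllable(s) for s in word)
print([f"U+{ord(j):04X}" for j in jamo])
print(jamo == unicodedata.normalize("NFD", word))
```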

43 Han unification
- One code point for the same ideograph in Chinese, Japanese, and Korean
- Simplified Chinese is not unified
- The font differs for each language, which gives localized rendering
- … characters assigned
- Unification has many disadvantages; alternatives that avoid unification exist (e.g. ISO/IEC 2022)

44 Ideographs sorting in Han
- Radical: an ideograph component
- 214 radicals are defined in Unicode now
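The 214 radicals are encoded as standalone characters in the Kangxi Radicals block (U+2F00 to U+2FD5). The sketch below counts them with the standard `unicodedata` module and shows that each radical character is compatibility-equivalent to an ordinary unified ideograph. It only illustrates the radical inventory; real radical-stroke sorting needs per-character radical and stroke data, which this example does not attempt.

```python
import unicodedata

KANGXI_FIRST, KANGXI_LAST = 0x2F00, 0x2FD5

radicals = [
    chr(cp)
    for cp in range(KANGXI_FIRST, KANGXI_LAST + 1)
    if unicodedata.name(chr(cp), "").startswith("KANGXI RADICAL")
]
print(len(radicals))

# Radical 75 (tree): the radical character folds to the ideograph 木.
tree_radical = radicals[75 - 1]
print(unicodedata.name(tree_radical),
      unicodedata.normalize("NFKC", tree_radical))
```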

45 Input Method Editor
4 main methods:
- Typing a Latin transliteration of the character
- Typing a local transliteration (kana in Japan, pinyin in China)
- Drawing the shape with a mouse or pen; the IME recognizes the character

47 Hindi combining characters

48 Ligatures in Hindi

49 More info
- Internationalization & Unicode conference (Santa Clara, CA)
- Wikipedia (English version only!) has good simple articles about Unicode

50 THANK YOU
Denis Kiryaev, EMC


More information

Request for Comments: 4627 Category: Informational July 2006. The application/json Media Type for JavaScript Object Notation (JSON)

Request for Comments: 4627 Category: Informational July 2006. The application/json Media Type for JavaScript Object Notation (JSON) Network Working Group D. Crockford Request for Comments: 4627 JSON.org Category: Informational July 2006 The application/json Media Type for JavaScript Object Notation (JSON) Status of This Memo This memo

More information

Software localization to China and Chinese

Software localization to China and Chinese Export HIS project Localization Tekes grant nr 70062/04 Writer Hellevi Ruonamaa (hellevi.ruonamaa@uku.fi) Document status Finalized Date 8.1.2006 Software localization to China and Chinese Introduction...

More information
