Introduction to Unicode and Writing Systems
- Erick Norris
Introduction to Unicode and Writing Systems
Denis Kiryaev

"If you are a programmer working in 2003 and you don't know the basics of characters, character sets, encodings, and Unicode, and I catch you, I'm going to punish you by making you peel onions for 6 months in a submarine. I swear I will." (Joel Spolsky, Joel on Software)
Agenda
- Introduction
- Unicode as a standard
- Writing systems, or the fun begins: French, Arabic, Chinese, Japanese, Korean, Hindi
ASCII
Windows
Unicode as a standard
- Character table
- Encodings
- Rendering rules & algorithms
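The Unicode codespace: 17 planes of 65,536 code points each (U+0000 to U+10FFFF)
- Plane 0: Basic Multilingual Plane (BMP)
- Plane 1: Supplementary Multilingual Plane (SMP)
- Plane 2: Supplementary Ideographic Plane (SIP)
- Planes 15 and 16: Private Use Area characters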
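Because every plane spans exactly 0x10000 code points, the plane of a code point is just its value shifted right by 16 bits. A minimal Python sketch; `plane_of` and `PLANE_NAMES` are illustrative names, not part of any library:

```python
# Each plane spans 0x10000 code points, so the plane number is the code point >> 16.
PLANE_NAMES = {
    0: "Basic Multilingual Plane (BMP)",
    1: "Supplementary Multilingual Plane (SMP)",
    2: "Supplementary Ideographic Plane (SIP)",
    15: "Private Use Area (plane 15)",
    16: "Private Use Area (plane 16)",
}


def plane_of(code_point: int) -> int:
    """Return the plane number (0-16) of a Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"U+{code_point:X} is outside the Unicode codespace")
    return code_point >> 16


for ch in ["z", "\u6c34", "\U0001D11E", "\U00024B62", "\U0010FFFD"]:
    plane = plane_of(ord(ch))
    print(f"U+{ord(ch):04X}: plane {plane}, {PLANE_NAMES.get(plane, 'other plane')}")
```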
Encodings
- UTF-8
- UTF-16
- UTF-32
- Character set, codepage and encoding
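UTF-8
Each code point is encoded as 1 to 4 bytes (x marks the code point bits):
  U+0000–U+007F       0xxxxxxx
  U+0080–U+07FF       110xxxxx 10xxxxxx
  U+0800–U+FFFF       1110xxxx 10xxxxxx 10xxxxxx
  U+10000–U+10FFFF    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
Invalid bytes in a UTF-8 sequence: C0, C1, F5–FF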
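Why those bytes are invalid follows from the layout: C0 and C1 could only start an overlong (non-shortest) two-byte form of an ASCII character, and lead bytes from F5 upward would only be needed for values beyond U+10FFFF. A small Python sketch that derives the list from the lead-byte rules; `utf8_sequence_length` is a hypothetical helper, and the decode call uses Python's built-in strict UTF-8 codec:

```python
def utf8_sequence_length(lead: int) -> int:
    """Length of the UTF-8 sequence a lead byte starts, or 0 if the byte cannot start one."""
    if lead <= 0x7F:
        return 1  # 0xxxxxxx
    if 0xC2 <= lead <= 0xDF:
        return 2  # 110xxxxx (C0 and C1 excluded: they would only give overlong forms)
    if 0xE0 <= lead <= 0xEF:
        return 3  # 1110xxxx
    if 0xF0 <= lead <= 0xF4:
        return 4  # 11110xxx (F5 and above would encode values beyond U+10FFFF)
    return 0      # continuation byte (80-BF) or a byte that never appears (C0, C1, F5-FF)


never_valid = [b for b in range(0xC0, 0x100) if utf8_sequence_length(b) == 0]
print(" ".join(f"{b:02X}" for b in never_valid))

# Python's strict decoder rejects the overlong two-byte form C0 AF (an encoding of "/").
try:
    b"\xc0\xaf".decode("utf-8")
except UnicodeDecodeError as err:
    print("rejected:", err.reason)
```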
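UTF-8 example
  Character  Code point  Binary code point            Binary UTF-8                          Hexadecimal UTF-8
  $          U+0024      010 0100                     00100100                              24
  ¢          U+00A2      000 1010 0010                11000010 10100010                     C2 A2
  €          U+20AC      0010 0000 1010 1100          11100010 10000010 10101100            E2 82 AC
  𤭢         U+24B62     0 0010 0100 1011 0110 0010   11110000 10100100 10101101 10100010   F0 A4 AD A2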
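The table can be reproduced by packing the code point bits into the UTF-8 byte patterns. This is a teaching sketch (`utf8_encode` is a hypothetical name); real code should simply call `str.encode("utf-8")`, which the loop uses as a cross-check:

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point using the UTF-8 bit layout (illustration only)."""
    if code_point < 0x80:
        return bytes([code_point])                       # 0xxxxxxx
    if code_point < 0x800:
        return bytes([0xC0 | (code_point >> 6),          # 110xxxxx
                      0x80 | (code_point & 0x3F)])       # 10xxxxxx
    if code_point < 0x10000:
        if 0xD800 <= code_point <= 0xDFFF:
            raise ValueError("surrogate code points cannot be encoded")
        return bytes([0xE0 | (code_point >> 12),         # 1110xxxx
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0x10FFFF:
        return bytes([0xF0 | (code_point >> 18),         # 11110xxx
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("code point out of range")


for ch in "$\u00a2\u20ac\U00024B62":
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")  # agrees with Python's built-in codec
    print(f"U+{ord(ch):04X}", encoded.hex(" ").upper())
```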
UTF-16
  Code point range     Encoding scheme
  U+0000–U+D7FF        One 16-bit code unit equal to the code point number
  U+E000–U+FFFF        One 16-bit code unit equal to the code point number
  U+D800–U+DFFF        Reserved for UTF-16 surrogates
  U+10000–U+10FFFF     Surrogate pair:
                         lead surrogate  = 0xD800 + ((code point - 0x10000) >> 10)
                         trail surrogate = 0xDC00 + (code point & 0x3FF)
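The surrogate formulas translate directly into code; the inverse simply undoes the split. A minimal Python sketch (`to_surrogates` and `from_surrogates` are illustrative names), checked against Python's `utf-16-be` codec:

```python
from typing import Tuple


def to_surrogates(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point (U+10000-U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points need a surrogate pair")
    offset = code_point - 0x10000        # a 20-bit value
    lead = 0xD800 + (offset >> 10)       # top 10 bits -> D800..DBFF
    trail = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> DC00..DFFF
    return lead, trail


def from_surrogates(lead: int, trail: int) -> int:
    """Recombine a surrogate pair into the code point it encodes."""
    if not (0xD800 <= lead <= 0xDBFF and 0xDC00 <= trail <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00)


lead, trail = to_surrogates(0x1D11E)  # MUSICAL SYMBOL G CLEF
print(f"{lead:04X} {trail:04X}")      # D834 DD1E
# Cross-check against Python's own UTF-16 codec.
assert "\U0001D11E".encode("utf-16-be") == lead.to_bytes(2, "big") + trail.to_bytes(2, "big")
```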
UTF-16 example
  Code point  Character                                                   Glyph  UTF-16 code units (hex)
  U+007A      Latin small letter z                                        z      007A
  U+6C34      CJK unified ideograph-6C34 (water)                          水     6C34
  U+10000     Linear B syllable B008 A (first non-BMP code point)         𐀀     D800 DC00
  U+1D11E     Musical symbol G clef                                       𝄞     D834 DD1E
  U+10FFFD    Private use character-10FFFD (last private-use code point)  n/a    DBFF DFFD
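Byte order mark (BOM)
"Hello" in UTF-16 without a BOM:
  Intel x86 (little-endian):   48 00 65 00 6C 00 6C 00 6F 00
  System/360 (big-endian):     00 48 00 65 00 6C 00 6C 00 6F
- Byte order mark (BOM) = U+FEFF (zero-width no-break space)
- U+FFFE is reserved: not a character, so a byte-swapped BOM is unambiguous
"Hello" in UTF-16 with a BOM:
  Intel x86:    FF FE 48 00 65 00 6C 00 6C 00 6F 00
  System/360:   FE FF 00 48 00 65 00 6C 00 6C 00 6F
- UTF-16BE and UTF-16LE are IANA-approved encoding names that fix the byte order, so no BOM is needed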
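The byte sequences above can be reproduced with Python's standard codecs, and the BOM check is a simple prefix test. A sketch; `detect_utf16_bom` is a hypothetical helper, while `codecs.BOM_UTF16_LE` and `codecs.BOM_UTF16_BE` are real constants of the `codecs` module:

```python
import codecs

text = "Hello"
le = text.encode("utf-16-le")  # little-endian, no BOM (Intel x86 order)
be = text.encode("utf-16-be")  # big-endian, no BOM (System/360 order)
print(le.hex(" ").upper())     # 48 00 65 00 6C 00 6C 00 6F 00
print(be.hex(" ").upper())     # 00 48 00 65 00 6C 00 6C 00 6F


def detect_utf16_bom(data: bytes) -> str:
    """Return the byte order announced by a leading U+FEFF, or 'unknown' if there is none."""
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "little-endian"
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "big-endian"
    return "unknown"


with_bom = codecs.BOM_UTF16_LE + le
print(detect_utf16_bom(with_bom))  # little-endian
print(with_bom.decode("utf-16"))   # the generic "utf-16" codec reads and drops the BOM: Hello
```

Note that FF FE 00 00 would instead announce UTF-32LE; this sketch does not try to distinguish the two.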
UTF-32
- The simplest: fixed length, one 32-bit code unit per code point
- The most redundant (uses the most space)
- A BOM is also applicable
- Not recommended by the HTML5 standard
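The trade-off is easy to measure: UTF-32 spends 4 bytes on every code point, but in exchange the n-th code point sits at a fixed offset. A small sketch using Python's built-in codecs; `utf32_code_point_at` is a hypothetical helper:

```python
sample = "z\u6c34\U0001D11E"  # z, 水 (U+6C34), 𝄞 (U+1D11E): 1-, 3- and 4-byte characters in UTF-8

# The -le codec variants are used so that no BOM is added to the output.
sizes = {enc: len(sample.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}
print(sizes)


def utf32_code_point_at(data: bytes, index: int) -> int:
    """Fixed length makes random access trivial: code point n occupies bytes 4n to 4n+3."""
    return int.from_bytes(data[4 * index:4 * index + 4], "little")


utf32 = sample.encode("utf-32-le")
print(hex(utf32_code_point_at(utf32, 2)))  # 0x1d11e
```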
Character set, codepage and encoding
- Pre-Unicode era: character set = codepage = charmap = encoding
- Unicode era:
  - Codepage = charmap = legacy encoding: defines a table-based encoding for a locale-specific subset of Unicode
  - Character set: means the whole Unicode repertoire; for legacy encodings, the subset of Unicode that the encoding covers
  - Encoding: one of the Unicode encodings
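The "locale-specific subset" idea can be checked directly: a legacy Windows code page covers only some of Unicode, while a Unicode encoding covers all of it. A sketch using Python's standard `cp1252` (Western European) and `cp1251` (Cyrillic) codecs; `encodable` is a hypothetical helper:

```python
def encodable(text: str, encoding: str) -> bool:
    """True if every character of text lies in the subset of Unicode the encoding covers."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


samples = ("Père Noël", "Дед Мороз", "水")
for enc in ("cp1252", "cp1251", "utf-8"):
    print(enc, [encodable(s, enc) for s in samples])
```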
Rendering rules & algorithms
Unlike Russian and English, many of the world's languages are written in complex scripts.
- Rendering complexities: diacritics, ligatures, right-to-left and other writing directions
- Sorting complexities: diacritics, ligatures, ideographs
Writing systems - French
Père Noël
Precomposed and decomposed forms
- Precomposed form: P + è + r + e, N + o + ë + l
- Decomposed form: P + e + U+0300 (combining grave accent) + r + e, N + o + e + U+0308 (combining diaeresis) + l
- Both forms are canonically equivalent
Normalization forms
- NFD: Normalization Form Canonical Decomposition
- NFC: Normalization Form Canonical Composition
- NFKD: Normalization Form Compatibility Decomposition
- NFKC: Normalization Form Compatibility Composition
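Canonically equivalent strings are still different code point sequences, so they only compare equal after both are normalized to the same form. A sketch with Python's standard `unicodedata.normalize`:

```python
import unicodedata

precomposed = "P\u00e8re No\u00ebl"  # è = U+00E8, ë = U+00EB
decomposed = "Pe\u0300re Noe\u0308l"  # e + U+0300 combining grave, e + U+0308 combining diaeresis

print(precomposed == decomposed)           # False: different code point sequences
print(len(precomposed), len(decomposed))   # 9 11
for form in ("NFC", "NFD"):
    # Normalizing both strings to the same form makes them compare equal.
    same = unicodedata.normalize(form, precomposed) == unicodedata.normalize(form, decomposed)
    print(form, same)                      # True for both forms
```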
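Compatibility
- ﬀ (U+FB00) is not f + f (U+0066 U+0066), but it is compatible with it
- Ⅻ (U+216B) is compatible with X (U+0058) + I (U+0049) + I (U+0049)
- ß (U+00DF) has no compatibility decomposition; it becomes s + s (U+0073) only through case folding (e.g. case-insensitive comparison)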
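NFC leaves compatibility characters alone, while NFKC replaces them with their plain equivalents; "ß" is untouched by both and only turns into "ss" under case folding. A sketch with Python's standard `unicodedata` and `str.casefold`:

```python
import unicodedata

for ch in ("\ufb00", "\u216b", "\u00df"):
    nfc = unicodedata.normalize("NFC", ch)
    nfkc = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: NFC={nfc!r} NFKC={nfkc!r}")

# "ss" comes from case folding, not from normalization.
print("\u00df".casefold())
```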
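Canonical ordering
Combining marks are sorted by canonical combining class (CCC); marks with the same CCC keep their order.
- e + U+0301 (CCC=230) + U+031C (CCC=220) is equivalent to e + U+031C + U+0301; both normalize to e + U+031C (CCC=220) + U+0301 (CCC=230)
- e + U+0303 (CCC=230) + U+0300 (CCC=230) is not equivalent to e + U+0300 (CCC=230) + U+0303 (CCC=230): with equal classes the order is significant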
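Python's `unicodedata.combining` returns the CCC, and decomposition (NFD) applies the canonical reordering, so both rules can be observed directly:

```python
import unicodedata

a = "e\u0301\u031c"  # e + acute (CCC 230) + left half ring below (CCC 220)
b = "e\u031c\u0301"  # the same marks in the other order
print([unicodedata.combining(c) for c in a])  # [0, 230, 220]
print(unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b))  # True: sorted by CCC

c = "e\u0303\u0300"  # tilde then grave, both CCC 230
d = "e\u0300\u0303"  # grave then tilde
print(unicodedata.normalize("NFD", c) == unicodedata.normalize("NFD", d))  # False: order kept
```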
Arabic script
- Right-to-left
- Cursive: each letter has up to 4 glyphs (initial, medial, final and isolated)
- No uppercase or lowercase letters
- Diacritics are used for short vowels
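In stored text a letter is always its single base code point, and the renderer picks the contextual glyph. Unicode also keeps legacy "presentation form" code points for each glyph, which NFKC folds back to the base letter. A sketch using Python's `unicodedata` with ARABIC LETTER BEH (U+0628) as the example:

```python
import unicodedata

beh = "\u0628"  # ARABIC LETTER BEH: the code point stored in text
forms = ["\ufe8f", "\ufe90", "\ufe91", "\ufe92"]  # legacy isolated, final, initial, medial forms

for f in forms:
    # Each presentation form is a compatibility variant of the same letter.
    print(unicodedata.name(f), "|", unicodedata.decomposition(f))
    assert unicodedata.normalize("NFKC", f) == beh

# Bidirectional class: 'AL' (Arabic letter, right-to-left) versus 'L' (left-to-right).
print(unicodedata.bidirectional(beh), unicodedata.bidirectional("a"))
```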
Arabic short vowels
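Because short vowels (harakat) are encoded as nonspacing combining marks after the consonant, a common step in search or indexing is to strip them. A minimal sketch, assuming that removing every character of general category Mn is acceptable (a simplification: it also removes other nonspacing marks); `strip_arabic_short_vowels` is a hypothetical name:

```python
import unicodedata


def strip_arabic_short_vowels(text: str) -> str:
    """Drop nonspacing marks such as fatha (U+064E), damma (U+064F) and kasra (U+0650)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


vocalized = "\u0643\u064e\u062a\u064e\u0628\u064e"  # kataba ("he wrote") with three fathas
print(strip_arabic_short_vowels(vocalized))          # كتب
```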
Chinese character types
- Pictographs (木 tree, 日 sun)
- Ideographs (上 up)
- Logical aggregates (東 east: the sun rising in the trees)
- Phonetic complexes (晴 clear weather: sun 日 + blue 青, which also supplies the pronunciation)
Writing rules
- Each character fits into a square; complex characters are scaled as necessary
- Writing direction: traditionally top to bottom, with columns running right to left; modern text runs left to right
- Traditional writing has little or no punctuation, while modern writing has plenty
- Latin letters and digits have 2 forms with different code points: halfwidth (1, U+0031) and fullwidth (１, U+FF11)
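Since fullwidth and halfwidth forms are different code points, text is usually folded to one form before comparison. NFKC maps fullwidth Latin letters and digits to their ASCII counterparts, and `unicodedata.east_asian_width` reports the width property. A sketch using only Python's standard library:

```python
import unicodedata

full = "\uff11\uff12\uff13\uff21\uff22"  # fullwidth "123AB"
half = unicodedata.normalize("NFKC", full)
print(half)  # 123AB
print(unicodedata.east_asian_width("\uff11"), unicodedata.east_asian_width("1"))  # F Na
```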
Horizontal vs. vertical
Traditional and Simplified Chinese
- Simplified Chinese is the standard in the PRC, Singapore and Malaysia
- Traditional Chinese is used in Hong Kong, Taiwan, Macau and overseas Chinese communities
- The traditional and simplified forms of the same character have different code points (e.g. 國 U+570B vs. 国 U+56FD)
GB18030
- PRC national standard
- Treated as a Unicode encoding (it can encode every Unicode code point), although it is not part of the Unicode Standard
- A superset of the double-byte code page GB2312, which in turn is a superset of ASCII
- GB18030 support is required for software sold in China
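Both properties can be observed with Python's built-in `gb18030` and `gb2312` codecs: ASCII and GB2312 characters keep their old bytes, while characters GB2312 lacks (here an emoji outside the BMP) still round-trip through GB18030:

```python
text = "A\u4e2d\U0001F600"  # ASCII "A", CJK 中, and an emoji outside the BMP

gb = text.encode("gb18030")
print(gb.hex(" ").upper())
assert gb[:1] == b"A"                                           # ASCII byte unchanged
assert "\u4e2d".encode("gb18030") == "\u4e2d".encode("gb2312")  # same bytes as GB2312
assert gb.decode("gb18030") == text                             # full round trip

try:
    text.encode("gb2312")  # GB2312 has no mapping for the emoji
except UnicodeEncodeError as err:
    print("gb2312 cannot encode:", err.object[err.start:err.end])
```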
Writing systems - Japanese
Mix of 3 scripts:
- Kanji (Chinese characters)
- Hiragana (syllabary used for grammatical elements)
- Katakana (syllabary used for foreign words, names, etc.)
Traditional writing direction is the same as Chinese; modern text runs left to right.
Sorting is based on kana.
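The three scripts live in separate Unicode blocks, so a rough per-character classification only needs range checks. A simplified sketch: `japanese_script` is a hypothetical helper that checks only the Hiragana (U+3040–U+309F), Katakana (U+30A0–U+30FF) and main CJK Unified Ideographs (U+4E00–U+9FFF) blocks, ignoring extension blocks and halfwidth katakana:

```python
def japanese_script(ch: str) -> str:
    """Classify a character by Unicode block (main blocks only)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    return "other"


sentence = "日本語のテキスト"  # "Japanese text": kanji + hiragana particle + katakana loanword
print([(ch, japanese_script(ch)) for ch in sentence])
```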
Writing systems - Korean
2 scripts:
- Hanja: Chinese characters
- Hangul: alphabet promulgated in 1446
ㅂㄷㅈㄱㅃㄸㅉㄲㅍㅌㅊㅋㅅㅎㅆㅁㄴㅇㄹㅣㅔㅚㅐㅏㅗㅜㅓㅡㅢㅖㅒㅑㅛㅠㅕㅟㅞㅙㅘㅝ
Korean hangul syllables
Consonants: ㅂㄷㅈㄱㅃㄸㅉㄲㅍㅌㅊㅋㅅㅎㅆㅁㄴㅇㄹ
Vowels: ㅣㅔㅚㅐㅏㅗㅜㅓㅡㅢㅖㅒㅑㅛㅠㅕㅟㅞㅙㅘㅝ
Letters are grouped into syllable blocks: 다국어 ("multilingual")
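Unicode encodes the 11,172 modern syllable blocks (U+AC00–U+D7A3) in an arithmetic order, so a block can be computed from the indices of its initial consonant, vowel and optional final consonant. A sketch of that composition formula in Python; `compose_hangul` is a hypothetical name, and NFD is used to split a block back into conjoining jamo:

```python
import unicodedata

S_BASE = 0xAC00  # first precomposed syllable, 가
V_COUNT = 21     # number of medial vowels
T_COUNT = 28     # number of finals, including "no final" at index 0


def compose_hangul(lead: int, vowel: int, tail: int = 0) -> str:
    """Compute a precomposed syllable from jamo indices."""
    return chr(S_BASE + (lead * V_COUNT + vowel) * T_COUNT + tail)


word = (compose_hangul(3, 0)         # ㄷ (lead 3) + ㅏ (vowel 0)
        + compose_hangul(0, 13, 1)   # ㄱ (lead 0) + ㅜ (vowel 13) + final ㄱ (tail 1)
        + compose_hangul(11, 4))     # ㅇ (lead 11) + ㅓ (vowel 4)
print(word)  # 다국어
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", word[0])])  # ['U+1103', 'U+1161']
```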
Han unification
- One code point for the same ideograph in Chinese, Japanese and Korean
- Simplified Chinese is not unified: simplified forms have their own code points
- The font differs for each of the languages, resulting in localized rendering
- … characters assigned
- Unification has many disadvantages; ISO/IEC 2022-based encodings, which switch between separate national character sets, keep the distinction instead
Sorting Han ideographs
- By radical: the ideograph component under which a character is indexed
- 214 radicals are now defined in Unicode (Kangxi Radicals, U+2F00–U+2FD5)
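Radical-based ordering sorts first by radical number and then by the number of strokes added to the radical. A toy sketch: the `RADICAL_STROKE` table below is entered by hand for a few characters from the earlier slides and is only an illustration; real software would read these values from the Unihan database (its `kRSUnicode` field) instead:

```python
# Hand-entered (radical number, residual strokes) pairs; illustrative data only.
RADICAL_STROKE = {
    "上": (1, 2),   # radical 1 一 + 2 strokes
    "日": (72, 0),  # radical 72 日
    "晴": (72, 8),  # radical 72 日 + 8 strokes
    "木": (75, 0),  # radical 75 木
    "東": (75, 4),  # radical 75 木 + 4 strokes
}

chars = ["東", "晴", "木", "上", "日"]
print(sorted(chars, key=RADICAL_STROKE.__getitem__))  # ['上', '日', '晴', '木', '東']
```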
Input Method Editor (IME)
4 main methods:
- Typing a Latin transliteration of the character
- Typing a local transliteration (kana in Japan, pinyin in China)
- Drawing the shape with a mouse or pen; the IME recognizes the character
Hindi combining characters
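In Devanagari, vowel signs (matras) are combining characters stored after the consonant, even when they are drawn to its left, so code point order and visual order differ. A sketch using Python's `unicodedata` to list the general categories involved:

```python
import unicodedata

ki = "\u0915\u093f"  # KA + VOWEL SIGN I: stored consonant first, displayed as कि with the sign on the left
for ch in ki:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))

namaste = "\u0928\u092e\u0938\u094d\u0924\u0947"  # नमस्ते
print(len(namaste), [unicodedata.category(ch) for ch in namaste])
```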
Ligatures in Hindi
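Conjunct ligatures are not separate characters: they are written as consonant + virama (U+094D) + consonant, and the font decides whether to draw a ligature. Inserting ZERO WIDTH NON-JOINER (U+200C) after the virama asks the renderer for the explicit halant form instead. A sketch of the code point sequences (the visual result depends on the font and shaping engine):

```python
import unicodedata

KA, SSA, VIRAMA, ZWNJ = "\u0915", "\u0937", "\u094d", "\u200c"

conjunct = KA + VIRAMA + SSA                 # usually rendered as the ligature क्ष
explicit_halant = KA + VIRAMA + ZWNJ + SSA   # requests a visible virama instead of the ligature

print(unicodedata.name(VIRAMA), unicodedata.combining(VIRAMA))  # DEVANAGARI SIGN VIRAMA 9
print(len(conjunct), len(explicit_halant))                      # 3 4
```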
More info
- Internationalization & Unicode Conference (Santa Clara, CA)
- Wikipedia (English version only!) has good, simple articles about Unicode
THANK YOU
Denis Kiryaev, EMC