Internationalizing the Domain Name System. Šimon Hochla, Anisa Azis, Fara Nabilla


Internationalizing the Internet
Master in Innovation and Research in Informatics. Computer Networks (MIRI), Autumn 2015.
Problem: using non-ASCII characters on the Internet. Users want the ease of use of having their local language available on the Internet.
One solution is localization: adapting the local computing environment to suit local linguistic needs.
Localization is not a compelling solution for the DNS in a multilingual environment. Why? The DNS binds all users' language symbols together, and it spans the entire network. The DNS therefore has to allow all of those language symbols within the same system, which is internationalization.

Situation of the current DNS
The DNS is the most common means of initiating a network transaction, whether it is a BitTorrent session, the Web, e-mail, or any other form of network activity.
Assumption: a DNS name is usually a sequence of English words or abbreviations written in the ASCII character set.
Can the DNS support non-Western scripts and diacritics? Some DNS implementations support no characters other than ASCII, and acute or grave accents, umlauts, and similar marks can produce unwanted results. Because DNS resolvers do not recognize non-ASCII characters, a Unicode URL has to be transformed (encoded) into the DNS LDH (letter-digit-hyphen) form.

Multilingual characters
We have managed to get non-Latin scripts into many applications, and non-Latin characters can be entered on computer keyboards.
What does internationalizing the DNS mean? In a setting where Latin characters and English are used, a communication is initiated in one locale, and its language and presentation are preserved wherever the communication is received.

Terminology
Language: a language uses characters drawn from a collection of scripts.
Script: a script is a collection of characters that are related in their use by a language.
Character: a character is a unit of a script.
Glyph: the presentation of a character within the style of a font is called a glyph.
Font: a font is a collection of glyphs, encompassing a script character set, that share a consistent presentation style.

Unicode
What is the objective of internationalizing the DNS? The DNS should support the union of all character sets while avoiding ambiguity and uncertainty in the resolution of any individual DNS name.
Solution: Unicode, the "universal character set", a universal encoding of characters in the context of all scripts and all languages.

Unicode representations
Unicode can be represented in multiple ways by using different character encoding schemes, the Unicode Transformation Formats (UTF). The most common are UTF-8 and UTF-16.
UTF-8 and UTF-16 are variable-length; UTF-32 is fixed-length.
In UTF-16, characters that do not belong to the Basic Multilingual Plane are mapped into a pair of 16-bit words.
A common criticism is that certain scripts are penalised because their code points need more bytes to represent.
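The size differences between the encodings can be checked directly. The following is a minimal Python sketch; the sample characters and the helper name encoded_sizes are illustrative choices, not part of the slides. The big-endian codec variants are used only so that no byte-order mark is counted.

```python
# Byte counts of single characters under the three UTF encodings.
# The "-be" variants avoid the byte-order mark that plain "utf-16"/"utf-32" prepend.
ENCODINGS = ("utf-8", "utf-16-be", "utf-32-be")

def encoded_sizes(ch):
    """Return {encoding name: number of bytes} for one character."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}

samples = {
    "LATIN a": "a",               # ASCII range
    "LATIN u-diaeresis": "\u00fc",  # Latin-1 range
    "CJK U+65E5": "\u65e5",       # Basic Multilingual Plane
    "U+1F600": "\U0001F600",      # outside the BMP: surrogate pair in UTF-16
}

for label, ch in samples.items():
    print(label, encoded_sizes(ch))
```

The CJK character costs three bytes in UTF-8 but two in UTF-16, which illustrates the criticism about some scripts needing more bytes.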

Context of a script and a language
Unicode has weaknesses in identifying the script and language context of a given character sequence.
Solution: tag the content with the script and encoding scheme.
Tagging is useful for e-mail and web page content, but it breaks down in the context of the DNS. Why? The DNS needs a "universal" character set and a "universal" language context. There is no natural space in DNS names to contain tags, so the DNS must carry implicit tags of all characters and all languages.

DNS 7-bit ASCII vs. 8-bit clean Unicode
8-bit clean: a computer system that correctly handles 8-bit character encodings, such as the ISO 8859 series and the UTF-8 encoding of Unicode.
The Unicode UTF-8, UTF-16, and UTF-32 encodings all require an 8-bit clean storage and transmission medium.
Traditional DNS domain names are representable with 7-bit ASCII characters.
The IETF's IDN Working Group decided to move towards application assistance instead of having the DNS support non-ASCII characters.
Why are DNS domain names representable with 7-bit ASCII and not 8-bit clean? Because of the LDH restriction applied to DNS domain names.
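A short Python sketch of the difference: the UTF-8 form of a name contains bytes with the high bit set, while the ASCII-compatible form handed to the DNS does not. The helper is_seven_bit is a hypothetical name, and Python's built-in "idna" codec stands in for the application-side encoding step.

```python
def is_seven_bit(data):
    """True if every byte fits in 7 bits, i.e. the data is plain ASCII."""
    return all(byte < 0x80 for byte in data)

name = "b\u00fccher"                 # "bücher"
utf8_bytes = name.encode("utf-8")    # needs an 8-bit clean channel
ace_bytes = name.encode("idna")      # application-side ASCII-compatible form

print(utf8_bytes, is_seven_bit(utf8_bytes))
print(ace_bytes, is_seven_bit(ace_bytes))
```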

LDH convention
RFC 1035:
1. Each DNS label must begin with a letter, restricted to the Latin character subset of A through Z and a through z, followed by a sequence of letters, digits, or hyphens, and must end with a letter or digit (no trailing hyphen).
2. The case of a letter is not important to the DNS, so within the DNS a is equivalent to A, and so on (monocase characters).
3. The DNS uses a left-to-right ordering of these labels, with the ASCII period as the label delimiter.
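The three rules can be sketched as a small Python check. The function names are illustrative, and the 63-character upper bound on a label is an added assumption taken from RFC 1035's label length limit rather than from the slide.

```python
import re

# Rule 1: a letter first, then letters/digits/hyphens, ending in a letter or digit.
# The {0,61} bound keeps the whole label within 63 characters (assumed limit).
LDH_LABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def is_ldh_label(label):
    """True if the label satisfies the LDH rule as stated above."""
    return LDH_LABEL.fullmatch(label) is not None

def same_label(a, b):
    """Rule 2: labels are compared without regard to case."""
    return a.lower() == b.lower()

def split_labels(name):
    """Rule 3: labels are ordered left to right, separated by the ASCII period."""
    return name.split(".")

print([is_ldh_label(l) for l in split_labels("www.example.com")])
print(is_ldh_label("b\u00fccher"), is_ldh_label("xn--bcher-kva"))
```

A label containing ü fails the check, while its ASCII-compatible form passes, which is why the encoding step is needed.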

Internationalisation
Goal: allow DNS names to be set in the user's own language, and at the same time allow the DNS to operate in a consistent and deterministic manner within its restricted language.
Two options: make the DNS 8-bit clean, or have applications do the work and present to the DNS an encoded form of the Unicode sequences that conforms to the restricted DNS character repertoire.

IDN framework
The IDN Working Group of the IETF was formed in 2000 with the goal of developing standards to internationalize domain names. The outcome is the IDNA framework.
ASCII Compatible Encoding (ACE): encodes the Unicode strings of IDNs into ASCII characters.
The IETF adopted punycode as its standard IDN ACE.

IDN
An internationalized domain name (IDN) is an Internet domain name that contains at least one label that is displayed in software applications in a language-specific script or alphabet. These writing systems are encoded by computers in multi-byte Unicode.

Punycode
Punycode is a way to represent Unicode with the limited character subset of ASCII supported by the Domain Name System. For example, "München" (the German name for the city of Munich) is encoded as "Mnchen-3ya".
RFC 3490 defines a presentation layer in IDN-aware applications that is responsible for the punycode ACE encoding and decoding.
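The München example can be reproduced with Python's standard codecs, shown here as a quick check rather than as a reference implementation. The "punycode" codec gives the bare encoding, while the "idna" codec also lower-cases the label and adds the ACE prefix.

```python
name = "M\u00fcnchen"  # "München"

# Bare punycode: basic characters first, then a hyphen, then the encoded ü.
raw_punycode = name.encode("punycode").decode("ascii")

# IDNA form as it would be looked up in the DNS: lower-cased, with "xn--" prefix.
ace_label = name.encode("idna").decode("ascii")

# Decoding reverses the bare punycode step.
round_trip = raw_punycode.encode("ascii").decode("punycode")

print(raw_punycode, ace_label, round_trip)
```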

IDN in applications
Role of the application in IDN: transform a domain name expressed in a particular language and script into an ASCII-compatible, LDH-encoded string, and back again.
It is critical that the encoding and decoding functions work correctly, deterministically, and uniformly.
The DNS stores the encoded version of the canonical name.
The DNS is deterministic: it does not return a set of possible answers to a query, so approximation cannot be used.

The presentation layer transform for IDNs
The transform maps between the Unicode string and the DNS LDH string. The algorithm groups "equivalent" Unicode strings, and a single "canonical" string is selected from each group of possible IDN strings.
Stringprep: original Unicode string -> (numerous transformations applied) -> regular or canonical form of the IDN string -> (transformation using the punycode ACE) -> encoded DNS string.
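The two-stage pipeline can be sketched per label in Python. This is a simplified illustration, not the full RFC 3490 ToASCII/ToUnicode procedure: the function names are hypothetical, length and LDH checks are omitted, and the canonicalisation step is delegated to the nameprep helper that ships in Python's encodings.idna module.

```python
from encodings.idna import nameprep  # stdlib stringprep profile for IDN labels

ACE_PREFIX = "xn--"

def to_ascii_label(label):
    """Unicode label -> canonical form -> ACE label (simplified sketch)."""
    canonical = nameprep(label)          # stage 1: mapping and normalisation
    if canonical.isascii():
        return canonical                 # nothing to encode
    # stage 2: punycode ACE, marked with the prefix
    return ACE_PREFIX + canonical.encode("punycode").decode("ascii")

def to_unicode_label(label):
    """ACE label -> canonical Unicode label (simplified sketch)."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label

# Different spellings in the same equivalence group give one DNS string.
print(to_ascii_label("B\u00fccher"), to_ascii_label("B\u00dcCHER"))
print(to_unicode_label("xn--bcher-kva"))
```

Both capitalisations collapse to the same encoded string, which is what keeps resolution deterministic.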

Transformation
Mapping: converting a string to a normal, or canonical, form. It transforms the string to lower case and removes characters without semantic meaning that do not affect equivalence.
Normalisation: many languages use different character sequences for the same meaning. For example, the letter Ä can be written as LATIN CAPITAL LETTER A WITH DIAERESIS, or as LATIN CAPITAL LETTER A followed by COMBINING DIAERESIS.
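The Ä example can be checked with Python's unicodedata module. This is a small sketch; the variable names are illustrative.

```python
import unicodedata

precomposed = "\u00c4"    # LATIN CAPITAL LETTER A WITH DIAERESIS
decomposed = "A\u0308"    # LATIN CAPITAL LETTER A + COMBINING DIAERESIS

# The two strings look the same but compare as different code point sequences.
print(precomposed == decomposed)

# Normalisation folds both spellings into one canonical sequence.
nfkc_precomposed = unicodedata.normalize("NFKC", precomposed)
nfkc_decomposed = unicodedata.normalize("NFKC", decomposed)
print(nfkc_precomposed == nfkc_decomposed)

# Mapping to lower case then yields the form used for comparison.
canonical = nfkc_decomposed.lower()
print([unicodedata.name(ch) for ch in canonical])
```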

Nameprep: a Stringprep profile for the DNS
Nameprep specifies stringprep for internationalized domain names: a character repertoire and a profile of mappings, normalization (form KC), prohibited characters, and bidirectional character handling.
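Python's standard library includes a nameprep helper in encodings.idna, which makes it possible to observe some of these mappings. The inputs below are illustrative, and the behaviour shown is that of Python's implementation of the original (2003) profile.

```python
from encodings.idna import nameprep

# Case mapping: upper case is folded to lower case.
folded = nameprep("B\u00dcCHER")            # "BÜCHER"

# Special case mapping: the German sharp s is mapped to "ss".
sharp_s = nameprep("Stra\u00dfe")           # "Straße"

# Characters "mapped to nothing": a soft hyphen (U+00AD) is dropped.
no_soft_hyphen = nameprep("ex\u00adample")

# Normalisation form KC: fullwidth letters become their ASCII equivalents.
fullwidth = nameprep("\uff21\uff22\uff23")  # fullwidth "ABC"

print(folded, sharp_s, no_soft_hyphen, fullwidth)
```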

The Punycode ASCII-Compatible Encoding
Punycode transforms the canonical form of the Unicode name string into an LDH-equivalent string using an ACE.
Algorithm:
The code points are divided into basic and extended code points.
A literal reproduction of the basic code points goes first.
A delimiter is added (a basic code point that does not occur in the remainder of the string).
The extended code points are added to the string as a series of integers, expressed through an encoding into the basic (LDH) code set.
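The steps above can be sketched as a compact Python encoder. The constants and the bias adaptation follow RFC 3492 as far as I recall them; treat this as an illustrative sketch (no overflow or input validation), with function names chosen here for readability.

```python
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128
DIGITS = "abcdefghijklmnopqrstuvwxyz0123456789"

def adapt(delta, numpoints, first):
    """Bias adaptation between successive deltas."""
    delta = delta // DAMP if first else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE * delta) // (delta + SKEW)

def encode_int(q, bias):
    """Write one delta as a variable-length integer in the basic code set."""
    out, k = [], BASE
    while True:
        t = min(max(k - bias, TMIN), TMAX)   # threshold for this digit
        if q < t:
            break
        out.append(DIGITS[t + (q - t) % (BASE - t)])
        q = (q - t) // (BASE - t)
        k += BASE
    out.append(DIGITS[q])
    return "".join(out)

def punycode_encode(text):
    basic = [c for c in text if ord(c) < 128]
    out = "".join(basic)                 # literal copy of the basic code points
    if basic:
        out += "-"                       # delimiter
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    h = b = len(basic)                   # h: code points handled so far
    while h < len(text):
        m = min(ord(c) for c in text if ord(c) >= n)   # next extended code point
        delta += (m - n) * (h + 1)
        n = m
        for c in text:
            if ord(c) < n:
                delta += 1
            elif ord(c) == n:
                out += encode_int(delta, bias)
                bias = adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return out

print(punycode_encode("b\u00fccher"))    # compare with the label in the next example
```

Comparing the output against Python's built-in "punycode" codec is an easy way to sanity-check the sketch.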

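Example of Punycode
bücher: the basic code points b, c, h, e, r are copied first, followed by the delimiter.
The extended code point ü (code 252) produces a delta code of 745. Each digit carries a weight of 35 here, so 745 = (21 x 35) + 10, which gives the digit values (10, 21, 0) in reverse (least-significant first) notation, or the letters kva.
The resulting label is xn--bcher-kva.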
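The arithmetic for ü can be replayed in a few lines of Python. The starting values (first extended code point 128, initial bias 72, thresholds clamped between 1 and 26, digits a-z then 0-9) are taken from RFC 3492 and are assumptions as far as the slide is concerned; the helper name digits_for is illustrative.

```python
BASE, TMIN, TMAX, INITIAL_BIAS = 36, 1, 26, 72
DIGITS = "abcdefghijklmnopqrstuvwxyz0123456789"

label = "b\u00fccher"                       # "bücher"
basic_count = sum(ord(c) < 128 for c in label)          # b, c, h, e, r
before = sum(ord(c) < 252 for c in label[:label.index("\u00fc")])  # just "b"

# Delta: distance from the starting code point, scaled by the number of
# insertion slots, plus the position of ü among the characters placed so far.
delta = (ord("\u00fc") - 128) * (basic_count + 1) + before

def digits_for(q, bias=INITIAL_BIAS):
    """Split q into digit values, least-significant first."""
    values, k = [], BASE
    while True:
        t = min(max(k - bias, TMIN), TMAX)
        if q < t:
            break
        values.append(t + (q - t) % (BASE - t))
        q = (q - t) // (BASE - t)
        k += BASE
    values.append(q)
    return values

values = digits_for(delta)
letters = "".join(DIGITS[v] for v in values)
print(delta, values, letters)
```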

Homoglyphs
Distinct characters are not necessarily displayed in unique ways.
Example: www.paypal.com vs. www.pаypal.com, where the second name uses the Cyrillic letter а (U+0430) in place of the first Latin a. The first name is resolved in the DNS as www.paypal.com; the second is translated to www.xn--pypal-4ve.com.
There is no clear relationship between characters and glyphs: multiple characters can map to a single glyph (e.g. the pair f l displayed as the single glyph fl), and a single character can map to multiple glyphs.
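The two names can be compared in Python. The second string is written with an explicit escape for the Cyrillic letter so that the difference is visible in the source; the variable names are illustrative.

```python
import unicodedata

latin_name = "www.paypal.com"
spoof_name = "www.p\u0430ypal.com"   # U+0430 in place of the first Latin "a"

# Visually identical in most fonts, but different strings.
print(latin_name == spoof_name)

# The one character that differs, identified by its Unicode name.
differing = [
    (a, unicodedata.name(a), b, unicodedata.name(b))
    for a, b in zip(latin_name, spoof_name) if a != b
]
print(differing)

# What each name becomes when handed to the DNS.
latin_ace = latin_name.encode("idna").decode("ascii")
spoof_ace = spoof_name.encode("idna").decode("ascii")
print(latin_ace, spoof_ace)
```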

Homoglyphs cont.
Two unequal strings can be indistinguishable from the user's point of view.
Browsers: the first response was to disable IDN support; the second response was to expose the punycode version of the URL.
Most popular browsers display the glyphs rather than the ASCII punycode.
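One way to picture the "expose the punycode" response is a display rule that falls back to the ASCII form when a label looks suspicious. The sketch below uses a deliberately crude heuristic, assumed for illustration only: a label is suspicious if its letters come from more than one script, where the script is guessed from the first word of each character's Unicode name. It does not describe the policy of any actual browser.

```python
import unicodedata

def scripts_in(label):
    """Rough script guess: first word of each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label if ch.isalpha()
    }

def display_form(label):
    """Show the Unicode label unless it mixes scripts; then show the ACE form."""
    if len(scripts_in(label)) <= 1:
        return label
    return label.encode("idna").decode("ascii")

print(display_form("b\u00fccher"))    # single script: shown as typed
print(display_form("p\u0430ypal"))    # Latin mixed with Cyrillic: ACE form shown
```

A whole-label spoof written entirely in one script would pass this check, which is one reason script context alone does not remove the ambiguity.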

Ambiguity
The intention of the IDN effort was to preserve the deterministic property of DNS resolution, but it did not quite reach that goal.
Languages are human-use systems that resist automated processing.
Language and script context are needed to resolve homoglyphs.
One response is a refined definition of IDN labels that lists which Unicode code points can be used in the context of IDNs, excluding all others.

What about putting IDN codes into the root of the DNS as alternative top-level domains (TLDs)?
This is a natural extension of adding punycode-encoded name entries at lower levels of the DNS.
It would allow any DNS name to be wholly expressible in the user's language, which implies that all parts of the DNS should be able to carry native-language-encoded DNS names.

Multilingual equivalents of protocol identifier codes
The multilingual presentation of these elements is left to the application, rather than attempting to alter the protocol identifiers in the relevant standards.

Internationalisation of TLDs
Equivalence of a TLD (top-level domain) in its ASCII form and its punycode form: .jp vs. .xn--wgv71a.
Should they be in the same DNS zone or in different ones?
If they are equivalent, they have precisely the same set of subdomain names: a registration under one of these equivalent names is in effect a name registration across the entire equivalence set.
Multilingual should not mean only multiscript: .com in German could be represented as .kom.

DNAME
DNAME records could make the IDN TLD names aliases for their ASCII equivalents.
DNAME places load back on the name servers and is still in early development.
It locks IDNs into the hands of the TLD name-registry operators.
A single registrar for each IDN variant of the same TLD, with competition between the various registrars, may do more harm than good for Internet users.
The DNS top-level name space is very conservatively managed, and new entries into this space are not made lightly.

TLD and the presentation layer
The presentation layer could also map the punycode ACE equivalents of the TLDs to the actual ASCII TLDs, for example 日本 into xn--wgv71a, and xn--wgv71a into jp.
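A minimal Python sketch of such a mapping in the application. The alias table TLD_ALIASES and the function map_tld are hypothetical names invented here; the table holds only the single example pair from the slide.

```python
# Hypothetical alias table: ACE form of an IDN TLD -> existing ASCII TLD.
TLD_ALIASES = {"xn--wgv71a": "jp"}

def map_tld(name):
    """Encode every label to its ACE form, then alias the last label if known."""
    labels = name.encode("idna").decode("ascii").split(".")
    labels[-1] = TLD_ALIASES.get(labels[-1], labels[-1])
    return ".".join(labels)

print(map_tld("example.\u65e5\u672c"))      # TLD typed in Japanese script
print(map_tld("b\u00fccher.\u65e5\u672c"))  # both labels internationalized
print(map_tld("example.com"))               # unaffected
```

With this approach the root zone itself needs no new entries, because the DNS only ever sees the existing ASCII TLD.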

Conclusion
ICANN is in a challenging situation: many people argue that it ignores non-English languages because of political bias.
The overwhelming majority of Internet users and of commercial activity on the Internet uses languages other than English, so an ASCII-only DNS is unnatural.
When making changes to the DNS we need to think long term and avoid ad hoc decisions, as these could eventually cause a fragmented Internet.
Internationalisation of the DNS is necessary; we need both internationalisation and localisation.
What causes a user the least amount of surprise?

Questions to discuss
Why is the DNS such a heavily restricted language?
Is it better to have IDNs for TLDs (IDNs in the DNS root)?