Frequently Asked Questions on character sets and languages in MT and MX free format fields



Similar documents
Symbols in subject lines. An in-depth look at symbols

dtsearch Frequently Asked Questions Version 3

URL encoding uses hex code prefixed by %. Quoted Printable encoding uses hex code prefixed by =.

7-Bit coded Character Set

DTRADE 2 Release Notes

ASCII CODES WITH GREEK CHARACTERS

HTML Codes - Characters and symbols

TABLE OF CONTENTS. Page 2

ASCII Code. Numerous codes were invented, including Émile Baudot's code (known as Baudot

Chart of ASCII Codes for SEVIS Name Fields

Chapter 4: Computer Codes

ANZ TRANSACTIVE FILE FORMATS WEB ONLY Page 1 of 118

APPENDIX 1 BRAILLE SYMBOLS AND INDICATORS. Braille Characters Letters Numbers Contractions Indicators Punctuation and Symbols

Characters & Strings Lesson 1 Outline

ASCII control characters (character code 0-31)

Encoding Text with a Small Alphabet

Xerox Standard Accounting Import/Export User Information Customer Tip

The Unicode Standard Version 8.0 Core Specification

set in Options). Returns the cursor to its position prior to the Correct command.

Produce Traceability Initiative Best Practices for Formatting Case Labels

ASCII Character Set and Numeric Values The American Standard Code for Information Interchange

SWIFT MT940 MT942 formats for exporting data from OfficeNet Direct

NAB Connect Consolidated File Format Specifications

Ask your teacher about any which you aren t sure of, especially any differences.

Regular Expression Searching

Ecma/TC39/2013/NN. 4 th Draft ECMA-XXX. 1 st Edition / July The JSON Data Interchange Format. Reference number ECMA-123:2009

PCL PC -8. PCL Symbol Se t: 12G Unicode glyph correspondence table s.

Internationalized Domain Names -

Microsoft Access 2010 Part 1: Introduction to Access

Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program.

EURESCOM - P923 (Babelweb) PIR.3.1

Regular Expressions. General Concepts About Regular Expressions

External Experts Portal

Introduction to Unicode. By: Atif Gulzar Center for Research in Urdu Language Processing

Perl in a nutshell. First CGI Script and Perl. Creating a Link to a Script. print Function. Parsing Data 4/27/2009. First CGI Script and Perl

TELOCATOR ALPHANUMERIC PROTOCOL (TAP)

Create!form Barcodes. User Guide

Vodafone multitxt faqs

How To Write A Domain Name In Unix (Unicode) On A Pc Or Mac (Windows) On An Ipo (Windows 7) On Pc Or Ipo 8.5 (Windows 8) On Your Pc Or Pc (Windows

Preservation Handbook

The future of International SEO. The future of Search Engine Optimization (SEO) for International Business

Data Integrator. Encoding Reference. Pervasive Software, Inc B Riata Trace Parkway Austin, Texas USA

Japanese Character Printers EPL2 Programming Manual Addendum

HTML Web Page That Shows Its Own Source Code

Thesis Format Guide. Denise Robertson Graduate School Office 138 Woodland Street Room

XML: extensible Markup Language. Anabel Fraga

Appendix report 1: Syntax and structure of EDIFACT and XML messages Regulation F1:

MONEY TRANSFER. Import & Approval User Guide

Electronic Data Interchange

P U B L I C R E L E A S E

SQL Server / Express 2008 Migration Frequently Asked Questions

Kiwi SyslogGen. A Freeware Syslog message generator for Windows. by SolarWinds, Inc.

Creating Basic Excel Formulas

Regular Expressions. In This Appendix

IBM Emulation Mode Printer Commands

12 SECOND QUARTER CLASS ASSIGNMENTS

SJC Password Self-Service System FAQ 2012

National Provider Identifiers Registry

Internationalizing the Domain Name System. Šimon Hochla, Anisa Azis, Fara Nabilla

JavaScript: Introduction to Scripting Pearson Education, Inc. All rights reserved.

National Provider Identifiers Registry

Using the Cisco Unity Connection Bulk Administration Tool

The Center for Teaching, Learning, & Technology

GNAT User s Guide for Native Platforms

How To Identify A Health Care Provider

Cover Page. Dynamic Server Pages Guide 10g Release 3 ( ) March 2007

Using Regular Expressions in Oracle

TAP Interface Specifications

Number Representation

User Guide. Printing Unicode characters from SAP to SATO GT4xxe Printers. Version

Variables, Constants, and Data Types

GENEVA COLLEGE INFORMATION TECHNOLOGY SERVICES. Password POLICY

National Provider Identifiers Registry

Avid Technology, Inc. inews NRCS. inews FTP Server Protocol Specification. Version January 2006

Sanitary Sewer Overflow/Facility Bypass Application

The use of binary codes to represent characters

National Provider Identifiers Registry

Inmagic DB/Text for SQL. Importer User s Guide

Administrative Services of Kansas

Format description XML SEPA Credit Transfer. Format Description

One Report, Many Languages: Using SAS Visual Analytics to Localize Your Reports

PRODUCT DATA IMPORT SETUP GUIDE FOR ADVERTISERS

National Provider Identifiers Registry

Bachelors of Computer Application Programming Principle & Algorithm (BCA-S102T)

SMPP protocol analysis using Wireshark (SMS)

SEPA formats - an introduction to XML. version September

Transcription:

Frequently Asked Questions on character sets and languages in MT and MX free format fields Version Final 17 January 2008

Preface The Frequently Asked Questions (FAQs) on character sets and languages that follow apply only to free text fields. Other fields, like codes, identifiers, amounts and dates, have rules that define the limited set of values and/or characters that can be used. 20080117 - MX and MT Character Set and Language FAQ - Final.doc Page i

1 MT messages What character sets can I use in an MT message? All fields must use characters from the X, Y or Z character set (all three are provided in Appendix I), which are basically Latin letters, numbers from 0 to 9 and a small number of special characters and other symbols. It is important, incidentally, to distinguish character sets from languages, as there is no unique relationship between them. What are the X, Y and Z character sets? The X-character set is based on the set of characters that can be transported over telex. The Y-character set is equivalent to the EDIFACT level A character set. The Z-character set is a combination of the X- and Y-character set, plus @ and #. What language can I use in an MT message? It is strongly recommended to use English in MTs. A Closed User Group (CUG) or another bilateral/multilateral agreement, typically for domestic traffic can use a different language, but this should be agreed explicitly. How do I fill in a word or name using a different character set in an MT? A number of countries have created a tactical work-around in the form of transliteration tables between local character sets and the X, Y or Z character set. Cyrillic in Russia 1 and Ukraine and the four corners character encoding in China 2 are three examples. A more general broadcast [ref. 4268] was sent out by SWIFT on 18 October 2000 regarding the transliteration of some missing EBCDIC characters. Remember, this applies only to free text fields, not to coded or numeric fields. Is there any chance that MT messages will accept non-latin characters in the future? SWIFT does not expect this to happen. What if I really need to use the actual foreign characters assuming the back office systems on both sides can handle them? This cannot be done using MTs if the foreign characters are not in the X, Y or Z character set. What should I do with special characters that are not part of character sets X, Y or Z? To specify a character that is not part of the X, Y or Z character set in a SWIFT MT message, SWIFT recommends using the character s hexadecimal EBCDIC code, preceded by two question marks (??) as escape sequence. Some examples are provided in Appendix III. 1 Examples of usage can be found in the latest versions of the RUR and RUS market practices on the Russian SWIFT association s website, www.swift.ru. 2 Known to be used by various Chinese users of MT messages. A description of the encoding method can be found here, and the transliteration tables themselves can be found here. 20080117 - MX and MT Character Set and Language FAQ - Final.doc Page 1

2 MX messages What character sets can I use in an MX message? The default character set for MX is Basic Latin (see Appendix II), except if a Closed User Group (CUG) or another bilateral/multilateral agreement - is set up in which other principles are agreed. If the service description of a CUG does not specify a character set, it means only Basic Latin can be used within that CUG. If other characters need to be transported (e.g., Cyrillic), then this must be explicitly mentioned within the Service Description of that CUG. But I thought UNICODE / UTF-8 was the ISO 20022 (UNIFI) standard s character set? The UNICODE character set, encoded in UTF-8, is indeed the official ISO 20022 character set. Appendix IV provides some technical details about UTF 8 encoding and UNICODE. However, SWIFT added a rule to restrict the set of allowed characters for free text XML elements to Basic Latin (see above). This will protect customer applications from receiving unwanted characters and character sets. The SWIFT network will not validate this rule, making it possible to have a bilateral, multilateral or CUG-wide agreement to ignore the rule and to use other characters and character sets in domestic and intra-company communications. What language can I use in an MX message? English is strongly recommended to be used in MXs. A Closed User Group (CUG) or another bilateral/multilateral agreement, typically for domestic traffic can use a different language, but this should be agreed explicitly. 20080117 - MX and MT Character Set and Language FAQ - Final.doc Page 2

APPENDIX I - MT messages: the X, Y, Z character sets The table below lists the characters used in the three character sets that can be used in MTs. X Y Z Character Description * * a - z 26 small characters of the Latin alphabet * * * A Z 26 capital characters of the Latin alphabet * * * 0-9 10 numeric characters * * * / Solidus (slash) * * * - Hyphen * * *? Question mark * * * : Colon * * * ( Opening parenthesis * * * ) Closing parenthesis * * *. Full stop * * *, Comma * * * ' Apostrophe * * * + Plus * * * Space * * = Equal to * *! Exclamation mark * * " Quotation mark * * % Percentage * * & Ampersand * * * Asterisk * * < Less than * * > Greater than * * ; Semi-colon * @ At * # Pound (hash) * * { Opening curly bracket * } Closing curly bracket * * CR Carriage return * * LF Line feed 20080117 - MX and MT Character Set and Language FAQ - Final.doc Page 3

APPENDIX II - MX messages: the Basic Latin character set The Basic Latin code chart contains a number of characters not included in the X, Y or Z sets. The list below shows both the characters common to MT and MX message character sets and - in italics - the most relevant characters specific to the Basic Latin set. Basic Latin Character Description * a - z 26 small characters of the Latin alphabet * A Z 26 capital characters of the Latin alphabet * 0-9 10 numeric characters * / Solidus (slash) * - Hyphen *? Question mark * : Colon * ( Opening parenthesis * ) Closing parenthesis *. Full stop *, Comma * ' Apostrophe * + Plus * Space * = Equal to *! Exclamation mark * " Quotation mark * % Percentage * & Ampersand * * Asterisk * < Less than * > Greater than * ; Semi-colon * @ At * # Pound (hash) * $ Dollar * { Opening curly bracket * } Closing curly bracket * CR Carriage return * LF Line feed 20080117 - MX and MT Character Set and Language FAQ - Final.doc Page 4

Basic Latin Character Description * [ Left square bracket * ] Right square bracket * \ Reverse solidus (backslash) * _ Low line (underscore) * ^ Circumflex * ` Grave accent * Vertical line * ~ Tilde * A set of control characters 20080117 - MX and MT Character Set and Language FAQ - Final.doc Page 5

APPENDIX III - MT messages: Work-around for characters not included in X, Y or Z character set To specify a special character that is not part of the X, Y or Z character set in a SWIFT MT message, SWIFT recommends using the character s hexadecimal EBCDIC code, preceded by two question marks (??) as escape sequence. Some examples are provided in the table below. Character (some examples) @ _ Work-around (including escape sequence)??7c??6d &??50 %??6C =??7E *??5C #??7B $??5B!??5A ^??5F ~??A1 APPENDIX IV - The UNICODE character set UNICODE and UCS (ISO10646) are the names of two character standards sets, i.e., tables that link characters to a number. They are in fact supersets of all commonly known character tables, like ASCII and EBCDIC. As of 1993, UNICODE (version1.1) and UCS (ISO10646-1:1993) have been kept compatible. UNICODE 4.0 covers a set of 144 character code charts. Each of these code charts is fully specified and explained in UNICODE. A well-known example is the Basic Latin code chart. UNICODE also offers code charts that cover Cyrillic, Hindi, Japanese, Chinese, Malay, Thai, Arabic, mathematical representations (e.g., letters with arrows on top), artistic languages (e.g., Tolkien), etc. The UTF-8 encoding scheme is a technical implementation in XML schemas that allows specification of any character used anywhere. The scheme bases itself on UNICODE and Universal Character Set (ISO10646). The UTF-8 encoding scheme allows representation of any UNICODE character (or any character of the Universal Character Set) as a sequence of bytes. 20080117 - MX and MT Character Set and Language FAQ - Final.doc Page 6