Unicode Support in Enterprise COBOL. Nick Tindall Stephen Miller Sam Horiguchi August 13, 2003

Similar documents

Chapter 4: Computer Codes

COWLEY COLLEGE & Area Vocational Technical School

ERserver. DB2 Universal Database for iseries SQL Programming with Host Languages. iseries. Version 5

Preservation Handbook

Japanese Character Printers EPL2 Programming Manual Addendum

MOC 20461C: Querying Microsoft SQL Server. Course Overview

SQL Server An Overview

ERserver. iseries. DB2 Universal Database for iseries SQL Programming with Host Languages

Database Programming with PL/SQL: Learning Objectives

Bachelors of Computer Application Programming Principle & Algorithm (BCA-S102T)

Lecture 5: Java Fundamentals III

Section of DBMS Selection & Evaluation Questionnaire

Database DB2 Universal Database for iseries Embedded SQL programming

The Unicode Standard Version 8.0 Core Specification

Product Internationalization of a Document Management System

Data Integrator. Encoding Reference. Pervasive Software, Inc B Riata Trace Parkway Austin, Texas USA

Moving from CS 61A Scheme to CS 61B Java

4D v11 SQL Release 1 (11.1) ADDENDUM

How to Improve Database Connectivity With the Data Tools Platform. John Graham (Sybase Data Tooling) Brian Payton (IBM Information Management)

Lesson 4 Web Service Interface Definition (Part I)

The use of binary codes to represent characters

Multi-lingual Label Printing with Unicode

How Strings are Stored. Searching Text. Setting. ANSI_PADDING Setting

Mail 2 ZOS FTPSweeper

Lab 4.4 Secret Messages: Indexing, Arrays, and Iteration

Java Interview Questions and Answers

Data Tool Platform SQL Development Tools

Frequently Asked Questions on character sets and languages in MT and MX free format fields

HP Service Virtualization

Number Representation

Cobol. By: Steven Conner. COBOL, COmmon Business Oriented Language, one of the. oldest programming languages, was designed in the last six

How To Write Portable Programs In C

Handout 1. Introduction to Java programming language. Java primitive types and operations. Reading keyboard Input using class Scanner.

ERserver. Embedded SQL programming. iseries. Version 5 Release 3

Application Development Guide: Programming Server Applications

XML. CIS-3152, Spring 2013 Peter C. Chapin

Extracting, Storing And Viewing The Data From Dicom Files

User Guide. Printing Unicode characters from SAP to SATO GT4xxe Printers. Version

Variables, Constants, and Data Types

XML Processing and Web Services. Chapter 17

The C Programming Language course syllabus associate level

Unicode Security. Software Vulnerability Testing Guide. July 2009 Casaba Security, LLC

Continuous Integration Part 2

Java CPD (I) Frans Coenen Department of Computer Science

Embedding SQL in High Level Language Programs

Unicode Enabling Java Web Applications

Introduction to Java

Embedded Special Characters Kiran Karidi, Mahipal Vanam, and Sridhar Dodlapati

PART-A Questions. 2. How does an enumerated statement differ from a typedef statement?

Thomas Jefferson High School for Science and Technology Program of Studies Foundations of Computer Science. Unit of Study / Textbook Correlation

Binary Representation. Number Systems. Base 10, Base 2, Base 16. Positional Notation. Conversion of Any Base to Decimal.

FileMaker Server 9. Custom Web Publishing with PHP

Chapter 2: Elements of Java

Oracle Database 11g Express Edition PL/SQL and Database Administration Concepts -II

3.GETTING STARTED WITH ORACLE8i

ASCII Code. Numerous codes were invented, including Émile Baudot's code (known as Baudot

Binary Representation

DataDirect XQuery Technical Overview

Translating QueueMetrics into a new language

Eventia Log Parsing Editor 1.0 Administration Guide

Computing Concepts with Java Essentials

Introduction to Java. CS 3: Computer Programming in Java

Oracle Database: SQL and PL/SQL Fundamentals

Using SQL in RPG Programs: An Introduction

How To Create A Table In Sql (Ahem)

For OS/390, VM, VSE. Extended Reporting Facility Guide 6.2 SP3

STUDY GUIDE CHAPTER 1

Oracle Database: SQL and PL/SQL Fundamentals

PL / SQL Basics. Chapter 3

Computers. An Introduction to Programming with Python. Programming Languages. Programs and Programming. CCHSG Visit June Dr.-Ing.

IBM. REXX/400 Programmer s Guide. AS/400 Advanced Series. Version 4 SC

A Brief Introduction to MySQL

Adding WebLogic Logging Services to Applications Deployed on Oracle WebLogic Server c (12.1.3)

Automating SQL Injection Exploits

Nesstar Server Nesstar WebView Version 3.5

2 SQL in iseries Navigator

A Case Study of ebay UTF-8 Database Migration

The Java Series. Java Essentials I What is Java? Basic Language Constructs. Java Essentials I. What is Java?. Basic Language Constructs Slide 1

Terms and Definitions for CMS Administrators, Architects, and Developers

Duration Vendor Audience 5 Days Oracle End Users, Developers, Technical Consultants and Support Staff

Preservation Handbook

ASSEMBLY LANGUAGE PROGRAMMING (6800) (R. Horvath, Introduction to Microprocessors, Chapter 6)

Java 7 Recipes. Freddy Guime. vk» (,\['«** g!p#« Carl Dea. Josh Juneau. John O'Conner

Generalizing Overloading for C++2000

Introduction to Java Applications Pearson Education, Inc. All rights reserved.

Oracle WebLogic Server

HOMEWORK # 2 SOLUTIO

Yarmouk University Faculty of Science and Information Technology Department of Computer Information Systems CIS 282 Developing Web Applications

Introduction to Unicode. By: Atif Gulzar Center for Research in Urdu Language Processing

Solution for Homework 2

Data Transfer Tips and Techniques

Java SE 8 Programming

Improvement of Software Quality and Productivity Using Development Tools

I PUC - Computer Science. Practical s Syllabus. Contents

Encoding Text with a Small Alphabet

Transcription:

Unicode Support in Enterprise COBOL Nick Tindall Stephen Miller Sam Horiguchi August 13, 2003

What is Unicode?! Industry standard for coded character set - defined by Unicode Consortium and ISO! Covers all commonly used characters in the world in one code page (vs. one "language" per ASCII, EBCDIC, or EUC code page)! Characters: text, digits, special characters, symbols, control characters,...! Multiple Unicode encoding formats: UTF-8, UTF-16, UTF-32! "Stateless" encoding: meaning of an encoding unit is self defining 2

Why Unicode?!Support global e-business environment: ƒapplications for multi-cultural/multigeographic businesses ƒnetworks of heterogeneous systems!enables a common implementation for global applications vs. separate code page for each geographic area or system platform!supported by all key operating system and middleware platforms!required by: XML, HTML, Java,... 3

UTF-8 and UTF-16 Encoding!UTF-8 encoding unit: one byte ƒ character : 1 to 4 encoding units!utf-16 encoding unit: two bytes ƒ character : 1 or 2 encoding units ƒ All characters defined in commonly used EBCDIC and ASCII code pages represented in one encoding unit!characters in Latin-1 (Western European) ASCII code page represented consistently ƒ "A" = X"41" in UTF-8 or ASCII, ƒ "A" = X"0041" in UTF-16 4

Unicode support in Enterprise COBOL for z/os and OS/390! Enable basic Unicode processing for COBOL applications! Consistent with COBOL 2002 standard! Interoperate with: ƒ DB2 Unicode support ƒ Java ƒ COBOL XML support 5

Unicode support overview! Unicode literal and value clause! Unicode data type! New compiler options ƒ CODEPAGE() ƒ NSYMBOL()! Collation: binary! Implicit conversions for EBCDIC data assigned to or compared with Unicode data! Explicit conversions via intrinsic functions 6

Unicode literals! N'αβγ', ', N'Stra Straße' ƒ Literal value in source is encoded in some EBCDIC code page ƒ Value is converted to UTF-16 for execution ƒ Value limited to those representable by the source program code page! NX'03B103B203B3' ƒcan be used for characters - not supported by editor, or - not in code page of source program 7

Unicode data type! USAGE NATIONAL, Picture character N 01 Japan pic N(20) usage national value N' 日本 '.! One UTF-16 encoding unit (2 bytes) per PICTURE N character! "Character" defined in terms of PICTURE symbol positions, for reference modification, character counts, etc. 8

Compiler options! CODEPAGE ( ccsid ) ƒ Specifies EBCDIC CCSID for: - literals in source program - contents of alphanumeric and DBCS data items ƒ Shipped default is 1140 (Latin-1 with Euro)! NSYMBOL (DBCS NATIONAL) ƒ 01 X PIC NN. and N' ' are ambiguous: Unicode or DBCS? ƒ NSYMBOL option controls default interpretation ƒ Note: PICTURE G and G'...' are treated as DBCS regardless of NSYMBOL 9

Assignment! NATIONAL, DISPLAY or DISPLAY-1 item may be assigned to NATIONAL item 01 Country pic N(20) usage national. 01 USA pic X(13) value 'United States'. 01 Greece pic X(6) value 'Ελλάδα'. Move USA to Country. Move Greece to Country.! Numeric integer may be assigned to NATIONAL! Padding with Unicode space character: X'0020'! Truncation by 2-byte encoding units ƒ Application logic responsible for avoiding partial character truncation when dealing with characters represented in two encoding units 10

Unicode Compares! National item may be compared with: national, alphanumeric, DBCS, or numeric integer operand. If Country = N' 日本 '! Non-Unicode operand converted to Unicode! Shorter operand value padded with Unicode blanks! Byte for byte compare in binary order ƒ No culturally sensitive compares - e.g. N'ç' is not equal N'c', regardless of locale ƒ No normalization - e.g. á (composed) is not equal to a (decomposed) 11

Other language syntax supporting Unicode! Statements involving comparisons ƒ EVALUATE, IF, INSPECT, PERFORM ƒ SEARCH, STRING, UNSTRING ƒ SORT, MERGE, Indexed file keys! Class condition on Unicode data ƒ NUMERIC, ALPHABETIC, ALPHABETIC-LOWER, ALPHABETIC-UPPER, class-name! Unicode arguments for CALL or INVOKE! INITIALIZE... REPLACING NATIONAL...! Reference modification 12

Intrinsic conversion functions " FUNCTION DISPLAY-OF(national-data [ccsid]) ƒ returns alphanumeric (EBCDIC) representation of NATIONAL argument. " FUNCTION NATIONAL-OF(ebcdic-data [ccsid]) ƒ returns UTF-16 representation of EBCDIC (DISPLAY or DISPLAY-1) argument. " If ccsid omitted, defaults to value from CODEPAGE() compiler option " ccsid may represent an EBCDIC, ASCII, EUC or UTF-8 code page Recommendation: use only one EBCDIC code page in a program. 13

Converting to Unicode 01 Unicode-Data pic N(20) usage national. 01 EBCDIC-Data pic X(20). 01 Greek-Data pic X(20). 01 UTF8-Data pic X(20). 01 Japanese-Data pic G(20) usage display-1. 1) Move EBCDIC-Data to Unicode-Data 2) Move function National-of(EBCDIC-Data) to Unicode-data 3) Move function National-of (Greek-Data, 4971) to Unicode-Data 4) Move function National-of (UTF8-Data, 1208) to Unicode-Data 5) Move function National-of (Japanese-Data, 1399) to Unicode-Data 1, 2) Converts EBCDIC-data represented in CCSID in effect via CODEPAGE compiler option to UTF-16 3) Converts EBCDIC Greek (CCSID 4971)data to UTF-16 4) Converts UTF-8 (CCSID 1208)data to UTF-16 5) Converts EBCDIC Japanese (CCSID 1399)data to UTF-16 14

Converting from Unicode 01 Unicode-Data pic N(20) usage national. 01 EBCDIC-Data pic X(20). 01 Greek-Data pic X(20). 01 UTF8-Data pic X(20). 01 Japanese-Data pic G(20) Usage Display-1. 1) Move function Display-of (Unicode-Data) to EBCDIC-DATA 2) Move function Display-of (Unicode-Data, 4971) to Greek-Data 3) Move function Display-of (Unicode-Data, 1208) to UTF8-Data 4) Move function Display-of (Unicode-Data, 1399) to Japanese-Data 1) Converts UTF-16 (CCSID 1200) to EBCDIC-data represented in CCSID in effect via CODEPAGE compiler option 2) Converts UTF-16 to EBCDIC Greek (CCSID 4971) 3) Converts UTF-16 to UTF-8 (CCSID 1208) 4) Converts UTF-16 to EBCDIC Japanese (CCSID 1399) 15

ACCEPT and DISPLAY and! ACCEPT national-data FROM CONSOLE ƒ input data implicitly converted from EBCDIC to Unicode! DISPLAY national-data UPON CONSOLE ƒ output data implicitly converted from Unicode to EBCDIC! ACCEPT or DISPLAY from/to devices other than CONSOLE done without implicit conversion ƒ Use DISPLAY-OF function to control conversion: Display function display-of(country, 930) 16

Unicode support in DB2 (V7 ( and later) V7 and later)!ebcdic, ASCII and Unicode data types!utf-8 & UTF-16 for Unicode!Stored data representation at table space level!host variables declared as EBCDIC, ASCII or Unicode ƒsbcs mapped to UTF-8 ƒdbcs mapped to UTF-16!Automatic conversion between stored representation and host variable declarations!collation order: binary 17

Using Unicode in DB2 COBOL programs Consistent support for Unicode in DB2 and COBOL ƒ Same Unicode conversion facility ƒ Binary collation With DB2 coprocessor (SQL compiler option) CCSID information for NATIONAL, DISPLAY, or DISPLAY-1 host variables is automatically coordinated between COBOL and DB2. e.g. EXEC SQL DECLARE :X VARIABLE CCSID 1140 END-EXEC is no longer required 18

COBOL Unicode support and Java interoperability Java is based on Unicode COBOL:Java interoperability support heavily uses Unicode implicitly, under the covers COBOL programmer can use Unicode at application level, to communicate String data to/from Java 19

COBOL Unicode support and Java interoperability Example: Invoke Java, passing String object Class String is 'java.lang.string'. Class Vendor is 'com.acme.vendor'. 01 Greece pic N(6) usage national value N'Ελλάδα'. 01 CountryString usage object reference String. 01 avendor usage object reference Vendor. Call 'NewString' using by value JNIEnvPtr address of Greece length of Greece returning CountryString Invoke avendor 'setlocation' using by value CountryString 20

COBOL Unicode support and XML processing COBOL now contains built-in syntax for processing XML documents XML PARSE statement parses XML documents, drives processing procedure for each event XML documents may be encoded in UTF-16 Unicode XML documents encoded in UTF-8 may be converted to UTF-16 using the NATIONAL-OF function, then parsed XML-NTEXT special register returns to the program the Unicode content from the document, that is associated with each event 21

COBOL Unicode support and XML processing - example 01 XMLdocument pic N(10000) usage national. XML PARSE XMLdocument Processing procedure XMLproc End-XML. XMLproc. Evaluate XML-Event When 'START-OF-ELEMENT' If XML-NText = N'Ελλάδα' Display 'Processing <Greece> element' End-if End-evaluate. 22

System configuration for Unicode!Unicode support in COBOL and DB2 is based on the package: ƒ z/os Support for Unicode, or ƒ OS/390 Support for Unicode! This support must be installed and configured, on both development and production systems that use: ƒ COBOL Unicode support, ƒ COBOL Object-Oriented language syntax for Java interoperability, or ƒ DB2 Unicode support 23

Installing "Support for Unicode"! z/os V1R2 or later ƒ Support for Unicode is part of the operating system ƒ Documentation: z/os Support for Unicode: Using Conversion Services (SA22-7649) http://publibz.boulder.ibm.com/epubs/pdf/iea2un20.pdf! z/os V1R1 or OS/390 V2R10 ƒ Install OS/390 Support for Unicode from the web: https://www6.software.ibm.com/dl/os390/unicodespt-p ƒ Documentation: OS/390 Support for Unicode: Using Conversion Services (SC33-7050) http://publibfp.boulder.ibm.com/pubs/pdfs/os390/cunuge00.pdf See Enterprise COBOL Customization Guide 24

References! Unicode Unicode Consortium: http://www.unicode.org IBM DeveloperWorks - Unicode: http://www.ibm.com/developerworks/unicode/ ICU: http://oss.software.ibm.com/icu/! z/os Support for Unicode: Using Conversion Services http://publibz.boulder.ibm.com/epubs/pdf/iea2un20.pdf! DB2 for OS/390 and z/os V7: Installation Guide http://www.ibm.com/software/data/db2/os390/library.html! COBOL for z/os &OS/390 V3R2 books (LRM, PG, CG) http://www.ibm.com/software/awdtools/cobol/zos/library 25