Perl & Multiple-byte Characters
|
|
|
- Shanon Cook
- 9 years ago
- Views:
Transcription
1 Perl & Multiple-byte Characters Ken Lunde CJKV Type Development Adobe Systems Incorporated bc ftp://ftp.ora.com/pub/examples/nutshell/ujip/perl/perl97.pdf
2 Multiple-byte Issues Why Worry? Affects all CJKV (Chinese, Japanese, Korean, and Vietnamese) encodings up to four bytes per character! EUC-CN, ISO-2022-CN, ISO-2022-CN-EXT, HZ, and GBK for Simplified Chinese (China) EUC-TW, ISO-2022-CN, ISO-2022-CN-EXT, and Big Five for Traditional Chinese (Taiwan) EUC-JP, ISO-2022-JP, and Shift-JIS for Japanese EUC-KR, ISO-2022-KR, Johab, and UHC for Korean EUC-VN and ISO-2022-VN for Vietnamese not yet defined Required in the context of Unicode Unicode encoding is 16-bit fixed-length (aka UTF-16) UTF-8 encoding is variable-length one, two, or three bytes Remember: One byte does not always equal one character
3 Multiple-byte Concerns Code conversion For converting data into common or different encoding Required for cross-platform development Data manipulation Simple text processing, such as search/replace required for product localization Related to code conversion, but not a global operation Searching Simple matching also required for product localization Extensive use of regular expressions The basis for performing multiple-byte tricks through proven and related techniques such as anchoring and trapping
4 Code Conversion Techniques Algorithmic Mathematical, and applied equally to every character Table-driven Requires a lookup table Round-trip may be an issue with undefined code points Selective Convert only certain characters or certain character classes Can be algorithmic or table-driven usually table-driven Example: Half- to full-width katakana table-driven ftp://ftp.ora.com/pub/examples/nutshell/ujip/perl/unkana.pl HIJ (three characters) KL (two characters) Combination of above techniques
5 Algorithmic Techniques Pure algorithm Mathematical transformations are applied equally to every character Example: Unicode (UTF-16) UTF-8 UTF-7 Example: EUC-JP ISO-2022-JP Shift-JIS Normalization identical sequence, incompatible encoding Character codes are normalized to become a continuous sequence beginning at zero Example: Johab hangul Unicode hangul OP (0x8BB1 0xC3A1) OP (0xAE40 0xCE58)
6 Table-driven Techniques Used for conversion between incompatible encodings Example: Unicode other CJK encodings Example: UHC hangul Unicode hangul Example: Big Five EUC-TW (CNS ) Example: Half- to full-width katakana Hash Simple hash-based lookup using the original (unmodified) character codes Zero-based table Character codes are normalized to become a continuous sequence beginning at zero Consider EUC-JP code set 1 (JIS X 0208:1997)
7 #!/usr/local/bin/perl -w # Converting EUC-JP code set 1 to zero-based values $ch = "\xb7\xf5"; # The kanji A of JIS X 0208:1997 # Subtract 0xA1 (161) from the first byte then multiple by 94 # Subtract 0xA1 (161) from the second byte # Add the two values to obtain zero-based value $zeroch = ((ord(substr($ch,0,1)) - 0xA1) * 94) + (ord(substr($ch,1,1)) - 0xA1); print "Zero-based value of $ch is $zeroch\n"; # $zeroch equals the 2,153rd character # The following converts the zero-based value back to the original # by reversing the effects of zero-based conversion $ch = chr(($zeroch / 94) + 0xA1). chr(($zeroch % 94) + 0xA1); print "$ch\n"; # $ch again equals 0xB7F5 (A)
8 Selective Code Conversion Act as filters applied to specific characters or character classes Example: Simplified to traditional Chinese characters M N Example: Half- to full-width katakana Usually table-driven, but can be algorithmic
9 Combination Code Conversion EUC-JP ISO-2022-JP Shift-JIS conversion may also involve half- to full-width katakana conversion as a method of filtering JConv (ANSI C) forces half- to full-width katakana conversion when converting EUC-JP or Shift-JIS to ISO-2022-JP The following pages provide a complete set of working Japanese code converters JIS X 0208:1997, ASCII/JIS-Roman, and half-width katakana support Emphasis on readability rather than efficiency there is more than one way to do it
10 #!/usr/local/bin/perl -w # ISO-2022-JP to EUC-JP while (defined($line = <STDIN>)) { $line =~ s{ # JIS X 0208:1997 \e\$[\@b] # ESC $ or B ((?:[\x21-\x7e][\x21-\x7e])+) # Two-byte characters \e\([bhj] # ESC ( plus B, H, or J }{($x = $1) =~ tr/\x21-\x7e/\xa1-\xfe/, # From 7- to 8-bit $x }egx; $line =~ s{ # JIS X half-width katakana \e\(i # ESC ( I ([\x21-\x7e]+) # Half-width katakana \e\([bhj] # ESC ( plus B, H, or J }{($x = $1) =~ tr/\x21-\x7e/\xa1-\xfe/, # From 7- to 8-bit ($y = $x) =~ s/([\xa1-\xfe])/\x8e$1/g, # Prefix with SS2 $y }egx; print STDOUT $line; }
11 #!/usr/local/bin/perl -w # EUC-JP to ISO-2022-JP while (defined($line = <STDIN>)) { $line =~ s{ # JIS X 0208:1997 ((?:[\xa1-\xfe][\xa1-\xfe])+) }{\e\$b$1\e\(j}gx; $line =~ s{ # JIS X half-width katakana ((?:\x8e[\xa0-\xdf])+) # Half-width katakana }{\e\(i$1\e\(j}gx; $line =~ s/\x8e//g; # Remove SS2s $line =~ tr/\xa1-\xfe/\x21-\x7e/; # From 8- to 7-bit print STDOUT $line; }
12 # Some functions for Shift-JIS conversions sub convert2sjis { # For EUC-JP and ISO-2022-JP to Shift-JIS = unpack("c*", $_[0]); = (); while (($hi, $lo) = splice(@euc, 0, 2)) { $hi &= 0x7f; $lo &= 0x7f; push(@out, (($hi + 1) >> 1) + ($hi < 95? 112 : 176), $lo + (($hi & 1)? ($lo > 95? 32 : 31) : 126)); } return } sub sjis2jis { # For Shift-JIS to ISO-2022-JP and EUC-JP = unpack("c*", $_[0]); for ($i = 0; $i $i += 2) { $ord[$i] = (($ord[$i]-($ord[$i]<160?112:176))<<1)- ($ord[$i+1]<159?1:0); $ord[$i+1] -= ($ord[$i+1]<159?($ord[$i+1]>127?32:31):126); } return }
13 #!/usr/local/bin/perl -w # ISO-2022-JP or EUC-JP to Shift-JIS while (defined($line = <STDIN>)) { $line =~ s{( # EUC-JP (?:[\xa1-\xfe][\xa1-\xfe])+ # JIS X 0208:1997 (?:\x8e[\xa0-\xdf])+ # Half-width katakana )}{substr($1,0,1) eq "\x8e"? (($x = $1) =~ s/\x8e//g, $x) : &convert2sjis($1)}egx; $line =~ s{ # Handle ISO-2022-JP \e\$[\@b] ((?:[\x21-\x7e][\x21-\x7e])+) \e\([bhj] }{&convert2sjis($1)}egx; $line =~ s{ # Handle ISO-2022-JP half-width katakana \e\(i ([\x20-\x5f]+) \e\([bhj] }{($x = $1) =~ tr/\x20-\x5f/\xa0-\xdf/, $x}egx; print STDOUT $line; }
14 #!/usr/local/bin/perl -w # Shift-JIS to ISO-2022-JP while (defined($line = <STDIN>)) { $line =~ s{( # JIS X 0208:1997 and half-width katakana (?:[\x81-\x9f\xe0-\xef][\x40-\x7e\x80-\xfc])+ [\xa0-\xdf]+ )}{ ($x=$1)!~ /^[\xa0-\xdf]/? "\e\$b". &sjis2jis($1). "\e\(j" : "\e\(i". (($y=$x) =~ tr/\xa0-\xdf/\x20-\x5f/, $y). "\e\(j" }egx; print STDOUT $line; }
15 #!/usr/local/bin/perl -w # Shift-JIS to EUC-JP while (defined($line = <STDIN>)) { $line =~ s{( # JIS X 0208:1997 and half-width katakana (?:[\x81-\x9f\xe0-\xef][\x40-\x7e\x80-\xfc])+ [\xa0-\xdf]+ )}{ ($x = $1)!~ /^[\xa0-\xdf]/? (($y = &sjis2jis($x)) =~ tr/\21-\x7e/\xa1-\xfe/, $y) : (($y = $x) =~ s/([\xa0-\xdf])/\x8e$1/g, $y) }egx; print STDOUT $line; }
16 Code Conversion Pitfalls No round-trip mapping for most table-driven conversions Consider Shift-JIS/EUC-JP to Unicode (UTF-16) conversion Of the 8,836 code points available in Shift-JIS and EUC-JP code set 1 (JIS X 0208:1997), only 6,879 are assigned and have mappings to Unicode the rest are converted into the same Unicode code point Some encodings have regions that are not compatible with other related encodings Example: The Shift-JIS user-defined range (0xF040 0xFCFC) is not encoded in EUC-JP Example: EUC-JP code set 3 (JIS X ) is not encoded in Shift-JIS
17 Regex Techniques Multiple-byte anchoring Necessary for successful and correct matching of multiplebyte characters Trapping all characters Otherwise, matching may occur across character boundaries Necessary for selective conversion or data manipulation You must specify the complete encoding range in order to successfully trap or anchor multiple-byte text Convenient to store the complete encoding specification in a variable for trapping and anchoring, such as $encoding Don t forget to apply the ox regex modifiers when using the free-formatted encoding specifications that follow!
18 $encoding = qq<[\x00-\xff][\x00-\xff]>; # UCS-2 $encoding = qq< # EUC-JP [\x00-\x8d\x90-\xa0\xff] # Code set 0 & one-byte \x8e[\xa0-\xdf] # Code set 2 \x8f[\xa1-\xfe][\xa1-\xfe] # Code set 3 [\xa1-\xfe][\xa1-\xfe] # Code set 1 >; $encoding = qq< # EUC-TW [\x00-\x8d\x8f-\xa0\xff] # Code set 0 & one-byte \x8e[\xa1-\xb0][\xa1-\xfe][\xa1-\xfe] # Code set 2 [\xa1-\xfe][\xa1-\xfe] # Code set 1 >; $encoding = qq< # EUC-KR and EUC-CN [\x00-\xa0\xff] # Code set 0 & one-byte [\xa1-\xfe][\xa1-\xfe] # Code set 1 >; $encoding = qq< # GBK [\x00-\x80\xff] # One-byte [\x81-\xfe][\40-\x7e\x80-\xfe] # GBK >;
19 #!/usr/local/bin/perl -w # Multiple-byte anchoring when matching Shift-JIS encoded text $search = "\x8c\x95"; # A $text1 = "Text 1 \x90\x56\x8c\x95\x93\xb9"; # BAC $text2 = "Text 2 \x94\x92\x8c\x8c\x95\x61"; # DEF $encoding = qq< # Shift-JIS encoding [\x00-\x80\xfd-\xff] # ASCII and other one-byte [\xa0-\xdf] # Half-width katakana [\x81-\x9f\xe0-\xfc][\x40-\x7e\x80-\xfc] # Two-byte range >; print "First attempt -- no anchoring\n"; print " Matched Text1\n" if $text1 =~ /$search/o; print " Matched Text2\n" if $text2 =~ /$search/o; print "Second attempt -- anchoring\n"; print " Matched Text1\n" if $text1 =~ /^(?:$encoding)*?$search/ox; print " Matched Text2\n" if $text2 =~ /^(?:$encoding)*?$search/ox;
20 Regex Techniques (Cont d) Create an array that has elements with varying numbers of bytes implements trapping Most useful for variable-length encodings Process text by iterating over the resulting array using operators such as foreach Using length() then tells you how long each character is
21 #!/usr/local/bin/perl -w # This program shows how to break up multiple-byte text into # separate array elements; it prints every character, one per # line; two-byte characters as hexadecimal with "0x" prefix $encoding = qq< # Shift-JIS encoding [\x00-\x80\xfd-\xff] # ASCII and other one-byte [\xa0-\xdf] # Half-width katakana [\x81-\x9f\xe0-\xfc][\x40-\x7e\x80-\xfc] # Two-byte range >; while (defined($line = <STDIN>)) = $line =~ /($encoding)/gox; # One character per element foreach $element (@enc) { if (length($element) == 2) { # If two-byte character print STDOUT "0x". ($x = uc unpack("h*",$element), $x); } else { # All others are one-byte characters print STDOUT "$element\n"; } } }
22 Regular Expression Pitfalls \W ( not a word ) versus \w ( a word ) in cross-platform environments you may get more than you bargained for The standard definition of \w: [0-9A-Z_a-z] (or [\x30-\x39\x41-\x5a\x5f\x61-\x7a]) But MacPerl s definition of \w adds the following: [\x80-\x9f\xae\xaf\xbe\xbf\xcb-\xcf\xd8\xd9\xe5-\xef\xf1-\xf5] When encoding ranges are needed, use explicit ones Specify the entire encoding range, including unused code points All code points from 0x00 through 0xFF must either represent themselves (that is, be one-byte characters), or else must be a valid first byte of a multiple-byte character
23 Other Useful Techniques Multiple-byte data is often obscured (damaged?) by Base64, URL, or quoted-printable encoding HTML forms Use the MIME module for Base64 decoding To decode a URL-encoded string: $string =~ s/%([0-9a-fa-f][0-9a-fa-f])/pack("c",hex($1))/ge; To decode a quoted-printable string: $string =~ s/=([0-9a-fa-f][0-9a-fa-f])/pack("c",hex($1))/ge; $string =~ s/=[\x0a\x0d]+$//;
24 Advantages of Unicode A single encoding consider the possible encodings for the Chinese character G 0x4E2D ( center or middle ) Simplified Chinese = 0x5650 or 0xD6D0 Traditional Chinese = 0x4463, 0xC4E3, 0x8EA1C4E3, or 0xA4A4 Japanese = 0x4366, 0xC3E6, or 0x9286 Korean = 0x7169, 0xF1E9, or 0xF3E9 Vietnamese = 0x4A36 or 0xCAB6 All Unicode characters with the exception of the surrogates are 16-bit (two bytes) All characters are given equal treatment No more need to deal with multiple-byte data Consider Java s Unicode-based model
25 Advantages of Unicode (Cont d) Java Version 1.1 s code conversion model Table-driven conversion for CJKV characters Non-Unicode encodings treated as raw data (also called byte sequences or byte arrays ) that must be imported Unicode data are considered type char Shift-JIS EUC-JP becomes Shift-JIS Unicode EUC-JP Strings objects can be explicitly converted to/from Unicode Input and output streams can be converted on-the-fly to/ from Unicode Table-driven code conversion can lose data but only undefined code points, mind you
26 Other Useful Information jcode.pl library by Kazumasa Utashiro ftp://ftp.iij.ad.jp/pub/iij/dist/utashiro/perl/ Unicode module by Gisle Aas Supports UTF-16 (Unicode) UTF-8 algorithmic conversion Useful multiple-byte capable Perl programs ftp://ftp.ora.com/pub/examples/nutshell/ujip/perl/ JPerl by Hirofumi Watanabe Japanese version of Perl with Japanese-enhanced regular expressions and other Japanese-specific enhancements Supports EUC-JP and Shift-JIS encodings Beware of portability issues
27
Data Integrator. Encoding Reference. Pervasive Software, Inc. 12365-B Riata Trace Parkway Austin, Texas 78727 USA
Data Integrator Encoding Reference Pervasive Software, Inc. 12365-B Riata Trace Parkway Austin, Texas 78727 USA Telephone: 888.296.5969 or 512.231.6000 Fax: 512.231.6010 Email: [email protected]
Chapter 4: Computer Codes
Slide 1/30 Learning Objectives In this chapter you will learn about: Computer data Computer codes: representation of data in binary Most commonly used computer codes Collating sequence 36 Slide 2/30 Data
Binary Representation
Binary Representation The basis of all digital data is binary representation. Binary - means two 1, 0 True, False Hot, Cold On, Off We must tbe able to handle more than just values for real world problems
Introduction to Unicode. By: Atif Gulzar Center for Research in Urdu Language Processing
Introduction to Unicode By: Atif Gulzar Center for Research in Urdu Language Processing Introduction to Unicode Unicode Why Unicode? What is Unicode? Unicode Architecture Why Unicode? Pre-Unicode Standards
Numeral Systems. The number twenty-five can be represented in many ways: Decimal system (base 10): 25 Roman numerals:
Numeral Systems Which number is larger? 25 8 We need to distinguish between numbers and the symbols that represent them, called numerals. The number 25 is larger than 8, but the numeral 8 above is larger
San José, February 16, 2001
San José, February 16, 2001 Feel free to distribute this text (version 1.4) including the author s e-mail address (mailto:[email protected]) and to contact him for corrections and additions. Please do not
Python Programming: An Introduction to Computer Science
Python Programming: An Introduction to Computer Science Sequences: Strings and Lists Python Programming, 2/e 1 Objectives To understand the string data type and how strings are represented in the computer.
Binary Representation. Number Systems. Base 10, Base 2, Base 16. Positional Notation. Conversion of Any Base to Decimal.
Binary Representation The basis of all digital data is binary representation. Binary - means two 1, 0 True, False Hot, Cold On, Off We must be able to handle more than just values for real world problems
Kazuraki : Under The Hood
Kazuraki : Under The Hood Dr. Ken Lunde Senior Computer Scientist Adobe Systems Incorporated Why Develop Kazuraki? To build excitement and awareness about OpenType Japanese fonts Kazuraki is the first
Frequently Asked Questions on character sets and languages in MT and MX free format fields
Frequently Asked Questions on character sets and languages in MT and MX free format fields Version Final 17 January 2008 Preface The Frequently Asked Questions (FAQs) on character sets and languages that
ASCII Encoding. The char Type. Manipulating Characters. Manipulating Characters
The char Type ASCII Encoding The C char type stores small integers. It is usually 8 bits. char variables guaranteed to be able to hold integers 0.. +127. char variables mostly used to store characters
The following functions are provided by the Digest::MD5 module. None of these functions are exported by default.
NAME SYNOPSIS Digest::MD5 - Perl interface to the MD5 Algorithm # Functional style use Digest::MD5 qw(md5 md5_hex md5_base64); $digest = md5($data); $digest = md5_hex($data); $digest = md5_base64($data);
Base Conversion written by Cathy Saxton
Base Conversion written by Cathy Saxton 1. Base 10 In base 10, the digits, from right to left, specify the 1 s, 10 s, 100 s, 1000 s, etc. These are powers of 10 (10 x ): 10 0 = 1, 10 1 = 10, 10 2 = 100,
The Unicode Standard Version 8.0 Core Specification
The Unicode Standard Version 8.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers
plc numbers - 13.1 Encoded values; BCD and ASCII Error detection; parity, gray code and checksums
plc numbers - 3. Topics: Number bases; binary, octal, decimal, hexadecimal Binary calculations; s compliments, addition, subtraction and Boolean operations Encoded values; BCD and ASCII Error detection;
Unicode Security. Software Vulnerability Testing Guide. July 2009 Casaba Security, LLC www.casabasecurity.com
Unicode Security Software Vulnerability Testing Guide (DRAFT DOCUMENT this document is currently a preview in DRAFT form. Please contact me with corrections or feedback.) Software Globalization provides
Project: Simulated Encrypted File System (SEFS)
Project: Simulated Encrypted File System (SEFS) Omar Chowdhury Fall 2015 CS526: Information Security 1 Motivation Traditionally files are stored in the disk in plaintext. If the disk gets stolen by a perpetrator,
Number Representation
Number Representation CS10001: Programming & Data Structures Pallab Dasgupta Professor, Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur Topics to be Discussed How are numeric data
EE282 Computer Architecture and Organization Midterm Exam February 13, 2001. (Total Time = 120 minutes, Total Points = 100)
EE282 Computer Architecture and Organization Midterm Exam February 13, 2001 (Total Time = 120 minutes, Total Points = 100) Name: (please print) Wolfe - Solution In recognition of and in the spirit of the
Java Interview Questions and Answers
1. What is the most important feature of Java? Java is a platform independent language. 2. What do you mean by platform independence? Platform independence means that we can write and compile the java
Ecma/TC39/2013/NN. 4 th Draft ECMA-XXX. 1 st Edition / July 2013. The JSON Data Interchange Format. Reference number ECMA-123:2009
Ecma/TC39/2013/NN 4 th Draft ECMA-XXX 1 st Edition / July 2013 The JSON Data Interchange Format Reference number ECMA-123:2009 Ecma International 2009 COPYRIGHT PROTECTED DOCUMENT Ecma International 2013
Object-Oriented Design Lecture 4 CSU 370 Fall 2007 (Pucella) Tuesday, Sep 18, 2007
Object-Oriented Design Lecture 4 CSU 370 Fall 2007 (Pucella) Tuesday, Sep 18, 2007 The Java Type System By now, you have seen a fair amount of Java. Time to study in more depth the foundations of the language,
Section 1.4 Place Value Systems of Numeration in Other Bases
Section.4 Place Value Systems of Numeration in Other Bases Other Bases The Hindu-Arabic system that is used in most of the world today is a positional value system with a base of ten. The simplest reason
SMPP protocol analysis using Wireshark (SMS)
SMPP protocol analysis using Wireshark (SMS) Document Purpose Help analyzing SMPP traffic using Wireshark. Give hints about common caveats and oddities of the SMPP protocol and its implementations. Most
Cyber Security Workshop Encryption Reference Manual
Cyber Security Workshop Encryption Reference Manual May 2015 Basic Concepts in Encoding and Encryption Binary Encoding Examples Encryption Cipher Examples 1 P a g e Encoding Concepts Binary Encoding Basics
Bluetooth HID Profile
RN-WIFLYCR-UM-.01 RN-HID-UM Bluetooth HID Profile 2012 Roving Networks. All rights reserved. Version 1.0r 1/17/2012 USER MANUAL www.rovingnetworks.com 1 OVERVIEW Roving Networks Bluetooth modules support
The programming language C. sws1 1
The programming language C sws1 1 The programming language C invented by Dennis Ritchie in early 1970s who used it to write the first Hello World program C was used to write UNIX Standardised as K&C (Kernighan
Unraveling Unicode: A Bag of Tricks for Bug Hunting
Unraveling Unicode: A Bag of Tricks for Bug Hunting Black Hat USA July 2009 Chris Weber www.lookout.net [email protected] Casaba Security Can you tell the difference? How about now? The Transformers
EE 261 Introduction to Logic Circuits. Module #2 Number Systems
EE 261 Introduction to Logic Circuits Module #2 Number Systems Topics A. Number System Formation B. Base Conversions C. Binary Arithmetic D. Signed Numbers E. Signed Arithmetic F. Binary Codes Textbook
Perl version 5.22.0 documentation - Digest::SHA NAME SYNOPSIS SYNOPSIS (HMAC-SHA) http://perldoc.perl.org Page 1
NAME Digest::SHA - Perl extension for SHA-1/224/256/384/512 SYNOPSIS In programs: # Functional interface use Digest::SHA qw(sha1 sha1_hex sha1_base64...); $digest = sha1($data); $digest = sha1_hex($data);
Email Content Control. Admin Guide
Email Content Control Admin Guide Document Revision Date: May 7, 2013 Email Content Control Admin Guide i Contents Introduction... 1 About Content Control... 1 Configuration Overview for Content Control...
OPENID AUTHENTICATION SECURITY
OPENID AUTHENTICATION SECURITY Erik Lagercrantz and Patrik Sternudd Uppsala, May 17 2009 1 ABSTRACT This documents gives an introduction to OpenID, which is a system for centralised online authentication.
Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program.
Name: Class: Date: Exam #1 - Prep True/False Indicate whether the statement is true or false. 1. Programming is the process of writing a computer program in a language that the computer can respond to
Counting in base 10, 2 and 16
Counting in base 10, 2 and 16 1. Binary Numbers A super-important fact: (Nearly all) Computers store all information in the form of binary numbers. Numbers, characters, images, music files --- all of these
I PUC - Computer Science. Practical s Syllabus. Contents
I PUC - Computer Science Practical s Syllabus Contents Topics 1 Overview Of a Computer 1.1 Introduction 1.2 Functional Components of a computer (Working of each unit) 1.3 Evolution Of Computers 1.4 Generations
Regular Expression Syntax
1 of 5 12/22/2014 9:55 AM EmEditor Home - EmEditor Help - How to - Search Regular Expression Syntax EmEditor regular expression syntax is based on Perl regular expression syntax. Literals All characters
Finding XSS in Real World
Finding XSS in Real World by Alexander Korznikov [email protected] 1 April 2015 Hi there, in this tutorial, I will try to explain how to find XSS in real world, using some interesting techniques. All
Digital System Design Prof. D Roychoudhry Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur
Digital System Design Prof. D Roychoudhry Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture - 04 Digital Logic II May, I before starting the today s lecture
Soft-Starter SSW-06 V1.6X
Motors Energy Automation Coatings Soft-Starter SSW-06 V1.6X Serial Communication Manual Language: English Document: 0899.5731 / 04 Serial Communication Manual Series: SSW-06 V1.6X Language: English Document
Pemrograman Dasar. Basic Elements Of Java
Pemrograman Dasar Basic Elements Of Java Compiling and Running a Java Application 2 Portable Java Application 3 Java Platform Platform: hardware or software environment in which a program runs. Oracle
How to represent characters?
Copyright Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See http://software-carpentry.org/license.html for more information. How to represent characters?
Red Hat Enterprise Linux International Language Support Guide
Red Hat Enterprise Linux International Language Support Guide Red Hat Enterprise Linux International Language Support Guide Copyright This book is about international language support for Red Hat Enterprise
HKSCS-2004 Support for Windows Platform
HKSCS-2004 Support for Windows Platform Windows XP Font Pack for ISO 10646:2003 + Amendment 1 Traditional Chinese Support (HKSCS-2004) update for Windows XP and Windows Server 2003 June 2010 Version 1.0
DNA Data and Program Representation. Alexandre David 1.2.05 [email protected]
DNA Data and Program Representation Alexandre David 1.2.05 [email protected] Introduction Very important to understand how data is represented. operations limits precision Digital logic built on 2-valued
Systems I: Computer Organization and Architecture
Systems I: Computer Organization and Architecture Lecture 2: Number Systems and Arithmetic Number Systems - Base The number system that we use is base : 734 = + 7 + 3 + 4 = x + 7x + 3x + 4x = x 3 + 7x
An overview of FAT12
An overview of FAT12 The File Allocation Table (FAT) is a table stored on a hard disk or floppy disk that indicates the status and location of all data clusters that are on the disk. The File Allocation
Section 4.1 Rules of Exponents
Section 4.1 Rules of Exponents THE MEANING OF THE EXPONENT The exponent is an abbreviation for repeated multiplication. The repeated number is called a factor. x n means n factors of x. The exponent tells
Chapter 7D The Java Virtual Machine
This sub chapter discusses another architecture, that of the JVM (Java Virtual Machine). In general, a VM (Virtual Machine) is a hypothetical machine (implemented in either hardware or software) that directly
Chapter 5 Names, Bindings, Type Checking, and Scopes
Chapter 5 Names, Bindings, Type Checking, and Scopes Chapter 5 Topics Introduction Names Variables The Concept of Binding Type Checking Strong Typing Scope Scope and Lifetime Referencing Environments Named
INTERNATIONAL STANDARD
INTERNATIONAL STANDARD ISO/IEC 18004 First edition 2000-06-15 Information technology Automatic identification and data capture techniques Bar code symbology QR Code Technologies de l'information Techniques
[MS-OXTNEF]: Transport Neutral Encapsulation Format (TNEF) Data Algorithm
[MS-OXTNEF]: Transport Neutral Encapsulation Format (TNEF) Data Algorithm Intellectual Property Rights Notice for Open Specifications Documentation Technical Documentation. Microsoft publishes Open Specifications
Introduction to Programming (in C++) Loops. Jordi Cortadella, Ricard Gavaldà, Fernando Orejas Dept. of Computer Science, UPC
Introduction to Programming (in C++) Loops Jordi Cortadella, Ricard Gavaldà, Fernando Orejas Dept. of Computer Science, UPC Example Assume the following specification: Input: read a number N > 0 Output:
Example. Introduction to Programming (in C++) Loops. The while statement. Write the numbers 1 N. Assume the following specification:
Example Introduction to Programming (in C++) Loops Assume the following specification: Input: read a number N > 0 Output: write the sequence 1 2 3 N (one number per line) Jordi Cortadella, Ricard Gavaldà,
Informatica e Sistemi in Tempo Reale
Informatica e Sistemi in Tempo Reale Introduction to C programming Giuseppe Lipari http://retis.sssup.it/~lipari Scuola Superiore Sant Anna Pisa October 25, 2010 G. Lipari (Scuola Superiore Sant Anna)
Product Internationalization of a Document Management System
Case Study Product Internationalization of a ì THE CUSTOMER A US-based provider of proprietary Legal s and Archiving solutions, with a customizable document management framework. The customer s DMS was
How To Write Portable Programs In C
Writing Portable Programs COS 217 1 Goals of Today s Class Writing portable programs in C Sources of heterogeneity Data types, evaluation order, byte order, char set, Reading period and final exam Important
CS177 MIDTERM 2 PRACTICE EXAM SOLUTION. Name: Student ID:
CS177 MIDTERM 2 PRACTICE EXAM SOLUTION Name: Student ID: This practice exam is due the day of the midterm 2 exam. The solutions will be posted the day before the exam but we encourage you to look at the
Appendix C: Keyboard Scan Codes
Thi d t t d ith F M k 4 0 2 Appendix C: Keyboard Scan Codes Table 90: PC Keyboard Scan Codes (in hex) Key Down Up Key Down Up Key Down Up Key Down Up Esc 1 81 [ { 1A 9A, < 33 B3 center 4C CC 1! 2 82 ]
6 3 4 9 = 6 10 + 3 10 + 4 10 + 9 10
Lesson The Binary Number System. Why Binary? The number system that you are familiar with, that you use every day, is the decimal number system, also commonly referred to as the base- system. When you
Chapter 3. if 2 a i then location: = i. Page 40
Chapter 3 1. Describe an algorithm that takes a list of n integers a 1,a 2,,a n and finds the number of integers each greater than five in the list. Ans: procedure greaterthanfive(a 1,,a n : integers)
ELEG3924 Microprocessor Ch.7 Programming In C
Department of Electrical Engineering University of Arkansas ELEG3924 Microprocessor Ch.7 Programming In C Dr. Jingxian Wu [email protected] OUTLINE 2 Data types and time delay I/O programming and Logic operations
Section IV.1: Recursive Algorithms and Recursion Trees
Section IV.1: Recursive Algorithms and Recursion Trees Definition IV.1.1: A recursive algorithm is an algorithm that solves a problem by (1) reducing it to an instance of the same problem with smaller
Storing Measurement Data
Storing Measurement Data File I/O records or reads data in a file. A typical file I/O operation involves the following process. 1. Create or open a file. Indicate where an existing file resides or where
Japanese Character Printers EPL2 Programming Manual Addendum
Japanese Character Printers EPL2 Programming Manual Addendum This addendum contains information unique to Zebra Technologies Japanese character bar code printers. The Japanese configuration printers support
Goals. Unary Numbers. Decimal Numbers. 3,148 is. 1000 s 100 s 10 s 1 s. Number Bases 1/12/2009. COMP370 Intro to Computer Architecture 1
Number Bases //9 Goals Numbers Understand binary and hexadecimal numbers Be able to convert between number bases Understand binary fractions COMP37 Introduction to Computer Architecture Unary Numbers Decimal
7 Why Use Perl for CGI?
7 Why Use Perl for CGI? Perl is the de facto standard for CGI programming for a number of reasons, but perhaps the most important are: Socket Support: Perl makes it easy to create programs that interface
Technical Specifications for KD5HIO Software
Technical Specifications for KD5HIO Software Version 0.2 12/12/2000 by Glen Hansen, KD5HIO HamScope Forward Error Correction Algorithms HamScope is a terminal program designed to support multi-mode digital
Internationalization of Domain Names
Internationalization of Domain Names Marc Blanchet ([email protected]) Co-chair of the IETF idn working group Viagénie (http://www.viagenie.qc.ca) Do You Like Quoted Printable? If yes, then
The Answer to the 14 Most Frequently Asked Modbus Questions
Modbus Frequently Asked Questions WP-34-REV0-0609-1/7 The Answer to the 14 Most Frequently Asked Modbus Questions Exactly what is Modbus? Modbus is an open serial communications protocol widely used in
Table 1 below is a complete list of MPTH commands with descriptions. Table 1 : MPTH Commands. Command Name Code Setting Value Description
MPTH: Commands Table 1 below is a complete list of MPTH commands with descriptions. Note: Commands are three bytes long, Command Start Byte (default is 128), Command Code, Setting value. Table 1 : MPTH
Grandstream Networks, Inc.
Grandstream Networks, Inc. Universal Phonebook Editor User Guide Universal Phonebook Editor User Guide Universal Phonebook Editor User Guide Index INTRODUCTION... 3 OVERVIEW OF FUNCTIONS AND UI... 4 BASIC
Python Loops and String Manipulation
WEEK TWO Python Loops and String Manipulation Last week, we showed you some basic Python programming and gave you some intriguing problems to solve. But it is hard to do anything really exciting until
UNDERSTANDING SMS: Practitioner s Basics
UNDERSTANDING SMS: Practitioner s Basics Michael Harrington, CFCE, EnCE It is estimated that in 2006, 72% of all mobile phone users world wide were active users of SMS or text messaging. In European countries
XML. CIS-3152, Spring 2013 Peter C. Chapin
XML CIS-3152, Spring 2013 Peter C. Chapin Markup Languages Plain text documents with special commands PRO Plays well with version control and other program development tools. Easy to manipulate with scripts
RS-485 Protocol Manual
RS-485 Protocol Manual Revision: 1.0 January 11, 2000 RS-485 Protocol Guidelines and Description Page i Table of Contents 1.0 COMMUNICATIONS BUS OVERVIEW... 1 2.0 DESIGN GUIDELINES... 1 2.1 Hardware Design
EURESCOM - P923 (Babelweb) PIR.3.1
Multilingual text processing difficulties Malek Boualem, Jérôme Vinesse CNET, 1. Introduction Users of more and more applications now require multilingual text processing tools, including word processors,
SMALL INDEX LARGE INDEX (SILT)
Wayne State University ECE 7650: Scalable and Secure Internet Services and Architecture SMALL INDEX LARGE INDEX (SILT) A Memory Efficient High Performance Key Value Store QA REPORT Instructor: Dr. Song
Automating SQL Injection Exploits
Automating SQL Injection Exploits Mike Shema IT Underground, Berlin 2006 Overview SQL injection vulnerabilities are pretty easy to detect. The true impact of a vulnerability is measured
Secret Communication through Web Pages Using Special Space Codes in HTML Files
International Journal of Applied Science and Engineering 2008. 6, 2: 141-149 Secret Communication through Web Pages Using Special Space Codes in HTML Files I-Shi Lee a, c and Wen-Hsiang Tsai a, b, * a
Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek
Instruction Set Architecture or How to talk to computers if you aren t in Star Trek The Instruction Set Architecture Application Compiler Instr. Set Proc. Operating System I/O system Instruction Set Architecture
Today s topics. Digital Computers. More on binary. Binary Digits (Bits)
Today s topics! Binary Numbers! Brookshear.-.! Slides from Prof. Marti Hearst of UC Berkeley SIMS! Upcoming! Networks Interactive Introduction to Graph Theory http://www.utm.edu/cgi-bin/caldwell/tutor/departments/math/graph/intro
Lab 4.4 Secret Messages: Indexing, Arrays, and Iteration
Lab 4.4 Secret Messages: Indexing, Arrays, and Iteration This JavaScript lab (the last of the series) focuses on indexing, arrays, and iteration, but it also provides another context for practicing with
[MS-OXMSG]: Outlook Item (.msg) File Format. Intellectual Property Rights Notice for Open Specifications Documentation
[MS-OXMSG]: Intellectual Property Rights Notice for Open Specifications Documentation Technical Documentation. Microsoft publishes Open Specifications documentation ( this documentation ) for protocols,
SQL Injection Optimization and Obfuscation Techniques
SQL Injection Optimization and Obfuscation Techniques By Roberto Salgado Introduction SQL Injections are without question one of the most dangerous web vulnerabilities around. With all of our information
Embedded Programming in C/C++: Lesson-1: Programming Elements and Programming in C
Embedded Programming in C/C++: Lesson-1: Programming Elements and Programming in C 1 An essential part of any embedded system design Programming 2 Programming in Assembly or HLL Processor and memory-sensitive
Preservation Handbook
Preservation Handbook Plain text Author Version 2 Date 17.08.05 Change History Martin Wynne and Stuart Yeates Written by MW 2004. Revised by SY May 2005. Revised by MW August 2005. Page 1 of 7 File: presplaintext_d2.doc
The use of binary codes to represent characters
The use of binary codes to represent characters Teacher s Notes Lesson Plan x Length 60 mins Specification Link 2.1.4/hi Character Learning objective (a) Explain the use of binary codes to represent characters
Chapter 2. Binary Values and Number Systems
Chapter 2 Binary Values and Number Systems Numbers Natural numbers, a.k.a. positive integers Zero and any number obtained by repeatedly adding one to it. Examples: 100, 0, 45645, 32 Negative numbers A
TECHNICAL UNIVERSITY OF CRETE DATA STRUCTURES FILE STRUCTURES
TECHNICAL UNIVERSITY OF CRETE DEPT OF ELECTRONIC AND COMPUTER ENGINEERING DATA STRUCTURES AND FILE STRUCTURES Euripides G.M. Petrakis http://www.intelligence.tuc.gr/~petrakis Chania, 2007 E.G.M. Petrakis
public static void main(string[] args) { System.out.println("hello, world"); } }
Java in 21 minutes hello world basic data types classes & objects program structure constructors garbage collection I/O exceptions Strings Hello world import java.io.*; public class hello { public static
Perl in a nutshell. First CGI Script and Perl. Creating a Link to a Script. print Function. Parsing Data 4/27/2009. First CGI Script and Perl
First CGI Script and Perl Perl in a nutshell Prof. Rasley shebang line tells the operating system where the Perl interpreter is located necessary on UNIX comment line ignored by the Perl interpreter End
CTNET Field Protocol Specification November 19, 1997 DRAFT
CTNET Field Protocol Specification November 19, 1997 DRAFT Introduction Version 1.0 of CTNET will support the AB3418 protocol for communication to field controllers. AB3418 is a point-topoint protocol
Xbox 360 File Specifications Reference
Xbox 360 File Specifications Reference Introduction This reference attempts to document the specifications of the custom data formats in use by the Xbox 360 console. This data has either been discovered
Critical Values for I18n Testing. Tex Texin Chief Globalization Architect XenCraft
Critical Values for I18n Testing Tex Texin Chief Globalization Architect XenCraft Abstract In this session, we recommend specific data values that are likely to identify internationalization problems in
Compiler Construction
Compiler Construction Regular expressions Scanning Görel Hedin Reviderad 2013 01 23.a 2013 Compiler Construction 2013 F02-1 Compiler overview source code lexical analysis tokens intermediate code generation
Binary Trees and Huffman Encoding Binary Search Trees
Binary Trees and Huffman Encoding Binary Search Trees Computer Science E119 Harvard Extension School Fall 2012 David G. Sullivan, Ph.D. Motivation: Maintaining a Sorted Collection of Data A data dictionary
