Character Representation
IFBA, Instituto Federal de Educação, Ciência e Tecnologia da Bahia
Systems Analysis and Development program
Introduction to Computer Science
Prof. MSc. Antonio Carlos Souza
Compiled from York University, ITEC 1011
Introduction: Examples
Real-world data   Input device     Computer data
"Dear Mom:"       Keyboard         10110010
(image)           Digital camera   10110010
Appropriate Formats
- The internal representation must be appropriate for the type of processing (text, image, and sound)
Data Types
- Numbers
  - Integer or fixed point
  - Floating point
  - Decimal number (BCD)
- Characters
  - ASCII (American Standard Code for Information Interchange)
  - EBCDIC (Extended Binary Coded Decimal Interchange Code)
- Logical data
- Addresses
Conventions
- Proprietary formats: unique to a product or company
  - E.g., Microsoft Word, Corel WordPerfect, IBM Lotus Notes
- Standards evolve in two ways:
  - Proprietary formats become de facto standards (e.g., Adobe PostScript, Apple QuickTime)
  - A committee is struck to solve a problem (e.g., Moving Picture Experts Group, MPEG)
Standards Organizations
- ISO: International Organization for Standardization
- CSA: Canadian Standards Association
- ANSI: American National Standards Institute
- IEEE: Institute of Electrical and Electronics Engineers
- Etc.
Examples of Standards
Type of data             Standards
Alphanumeric             ASCII, EBCDIC, Unicode
Image                    JPEG, GIF, PCX, TIFF
Motion picture           MPEG-2, QuickTime
Sound                    Sound Blaster, WAV, AU
Outline graphics/fonts   PostScript, TrueType, PDF
Why Standards?
- Standards are arbitrary
- They exist because they are:
  - Convenient
  - Efficient
  - Flexible
  - Appropriate
  - Etc.
Character Representation
- In general, alphanumeric codes are used:
  - 6-bit code
  - 7-bit code (ASCII)
  - EBCDIC
  - Extended ASCII
  - ISO Latin-1
  - ANSI characters
  - Unicode characters
Alphanumeric Data
- Problem: distinguishing between the number 123 (one hundred and twenty-three) and the characters 123 (one, two, three)
- Four standards for representing letters (alpha) and numbers:
  - BCD: Binary-coded decimal
  - ASCII: American Standard Code for Information Interchange
  - EBCDIC: Extended binary-coded decimal interchange code
  - Unicode
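The difference between the number 123 and the characters 123 can be made concrete with a minimal Python sketch. The variable names are illustrative and not part of the original slides; the integer is shown as a single byte only for comparison.

```python
# The integer 123 and the text "123" are stored as different bit patterns.
number = 123
text = "123"

# As an integer, 123 fits in one byte.
number_bits = format(number, "08b")

# As text, each character gets its own ASCII code.
text_codes = list(text.encode("ascii"))
text_bits = " ".join(format(code, "08b") for code in text_codes)

print(number_bits)  # 01111011
print(text_codes)   # [49, 50, 51]
print(text_bits)    # 00110001 00110010 00110011
```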
6-bit Code
- Can represent 2^6 = 64 characters
  - 26 uppercase letters
  - 10 digits (0 1 2 3 4 5 6 7 8 9)
  - 28 special characters, including space
7 bits (ASCII)
Binary-Coded Decimal (BCD)
- 4 bits per digit
Digit   Bit pattern
0       0000
1       0001
2       0010
3       0011
4       0100
5       0101
6       0110
7       0111
8       1000
9       1001
- Note: the following bit patterns are not used: 1010 1011 1100 1101 1110 1111
Example
- 7093 (base 10) = ? (in BCD)
  7    0    9    3
  0111 0000 1001 0011
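The digit-by-digit encoding in the example can be sketched in Python as follows. The function names `to_bcd` and `from_bcd` are hypothetical helpers introduced here for illustration, and the sketch assumes non-negative integers written as space-separated 4-bit groups.

```python
def to_bcd(value):
    """Encode a non-negative decimal integer as BCD, 4 bits per digit."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    return " ".join(format(int(digit), "04b") for digit in str(value))


def from_bcd(bits):
    """Decode space-separated 4-bit groups, rejecting the unused patterns."""
    digits = []
    for group in bits.split():
        digit = int(group, 2)
        if digit > 9:  # 1010 through 1111 are not valid BCD digits
            raise ValueError(group + " is not a BCD digit")
        digits.append(str(digit))
    return int("".join(digits))


print(to_bcd(7093))                     # 0111 0000 1001 0011
print(from_bcd("0111 0000 1001 0011"))  # 7093
```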
Standard Alphanumeric Formats
- BCD
- ASCII
- EBCDIC
- Unicode
The Problem
- Representing text strings, such as "Hello, world", in a computer
Codes and Characters
- Each character is coded as a byte
- The most common coding system is ASCII (pronounced ass-key)
- ASCII: American Standard Code for Information Interchange
- Defined in ANSI document X3.4-1977
ASCII Features
- 7-bit code
- The 8th bit is unused (or used as a parity bit)
- 2^7 = 128 codes
- Two general types of codes:
  - 95 are graphic codes (displayable on a console)
  - 33 are control codes (control features of the console or communications channel)
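The counts above, and one way the 8th bit could carry parity, can be checked with a short Python sketch. The slides do not say which parity convention is meant, so even parity is an assumption here, and `with_even_parity` is a hypothetical helper name.

```python
# Classify all 2**7 = 128 ASCII codes into graphic and control codes.
graphic = [code for code in range(128) if 0x20 <= code <= 0x7E]
control = [code for code in range(128) if code < 0x20 or code == 0x7F]
print(len(graphic), len(control))  # 95 33


def with_even_parity(code):
    """Set the 8th bit so that the total number of 1 bits is even.

    Even parity is an assumption; the slides only say "parity bit".
    """
    if not 0 <= code < 128:
        raise ValueError("not a 7-bit ASCII code")
    parity = bin(code).count("1") % 2
    return code | (parity << 7)


print(format(with_even_parity(ord("A")), "08b"))  # 01000001
print(format(with_even_parity(ord("C")), "08b"))  # 11000011
```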
ASCII Chart
Bits   000   001   010   011   100   101   110   111
0000   NULL  DLE   SP    0     @     P     `     p
0001   SOH   DC1   !     1     A     Q     a     q
0010   STX   DC2   "     2     B     R     b     r
0011   ETX   DC3   #     3     C     S     c     s
0100   EOT   DC4   $     4     D     T     d     t
0101   ENQ   NAK   %     5     E     U     e     u
0110   ACK   SYN   &     6     F     V     f     v
0111   BEL   ETB   '     7     G     W     g     w
1000   BS    CAN   (     8     H     X     h     x
1001   HT    EM    )     9     I     Y     i     y
1010   LF    SUB   *     :     J     Z     j     z
1011   VT    ESC   +     ;     K     [     k     {
1100   FF    FS    ,     <     L     \     l     |
1101   CR    GS    -     =     M     ]     m     }
1110   SO    RS    .     >     N     ^     n     ~
1111   SI    US    /     ?     O     _     o     DEL
Reading the chart: the column heading gives the 3 most significant bits of the code, and the row heading gives the 4 least significant bits.
E.g., a = 1100001 (column 110, row 0001)
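Locating a character in the chart amounts to splitting its 7-bit code into a 3-bit column and a 4-bit row. A minimal Python sketch, with `chart_position` as a hypothetical helper name:

```python
def chart_position(char):
    """Return (column bits, row bits) of an ASCII character in the chart."""
    code = ord(char)
    if code > 127:
        raise ValueError("not an ASCII character")
    column = format(code >> 4, "03b")   # 3 most significant bits
    row = format(code & 0x0F, "04b")    # 4 least significant bits
    return column, row


print(chart_position("a"))  # ('110', '0001')
print(chart_position("H"))  # ('100', '1000')
```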
95 graphic codes: columns 010 through 111, from space (010 0000) to ~ (111 1110)
33 control codes: columns 000 and 001, plus DEL (111 1111)
Alphabetic codes: A to Z (100 0001 to 101 1010) and a to z (110 0001 to 111 1010)
Numeric codes: 0 to 9 (011 0000 to 011 1001)
Punctuation, etc.: the remaining graphic codes, including space
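The layout of the alphabetic and numeric codes makes two common operations simple arithmetic: uppercase and lowercase letters differ by 0x20, and a digit's value is its code minus the code of "0". The sketch below illustrates this in Python; `to_upper` and `digit_value` are hypothetical names, and only unaccented ASCII letters are handled.

```python
def to_upper(char):
    """Convert an ASCII lowercase letter to uppercase by subtracting 0x20."""
    code = ord(char)
    if ord("a") <= code <= ord("z"):
        return chr(code - 0x20)
    return char  # anything else is returned unchanged


def digit_value(char):
    """Return the numeric value of an ASCII digit character."""
    code = ord(char)
    if not ord("0") <= code <= ord("9"):
        raise ValueError("not an ASCII digit")
    return code - ord("0")


print(to_upper("a"), to_upper("z"), to_upper("!"))  # A Z !
print(digit_value("7"))                             # 7
```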
"Hello, world" Example
Character   Binary     Hexadecimal   Decimal
H           01001000   48            72
e           01100101   65            101
l           01101100   6C            108
l           01101100   6C            108
o           01101111   6F            111
,           00101100   2C            44
(space)     00100000   20            32
w           01110111   77            119
o           01101111   6F            111
r           01110010   72            114
l           01101100   6C            108
d           01100100   64            100
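The encoding of "Hello, world" can be reproduced with a few lines of Python. This is a sketch using the standard `ascii` codec; the variable names are illustrative.

```python
message = "Hello, world"
codes = message.encode("ascii")  # one byte per character

# One row per character: character, binary, hexadecimal, decimal.
for char, code in zip(message, codes):
    print(f"{char!r:>4}  {code:08b}  {code:02X}  {code:3d}")

hex_line = " ".join(f"{code:02X}" for code in codes)
print(hex_line)  # 48 65 6C 6C 6F 2C 20 77 6F 72 6C 64
```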
Common Control Codes
Code   Hexadecimal   Meaning
CR     0D            carriage return
LF     0A            line feed
HT     09            horizontal tab
DEL    7F            delete
NULL   00            null
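Most programming languages expose these control codes as escape sequences. The Python sketch below prints the hexadecimal value of each one; the dictionary name `control_codes` is illustrative.

```python
# Escape sequences for the control codes listed above.
control_codes = {
    "CR": "\r",
    "LF": "\n",
    "HT": "\t",
    "DEL": "\x7f",
    "NULL": "\x00",
}

for name, char in control_codes.items():
    print(f"{name:<4} {ord(char):02X}")

hex_values = {name: f"{ord(char):02X}" for name, char in control_codes.items()}
```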
EBCDIC
- Extended BCD Interchange Code (pronounced ebb-see-dick)
- 8-bit code
- Developed by IBM
- Rarely used today
- IBM mainframes only
8 bits (EBCDIC) Extended Binary Coded Decimal Interchange Code
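To see how EBCDIC assigns different bytes to the same characters, the sketch below compares ASCII with Python's `cp037` codec. EBCDIC exists in several code pages; choosing `cp037` (an IBM code page for US/Canada) is an assumption made here for illustration and is not specified in the slides.

```python
text = "A a 0"

ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")  # assumption: one EBCDIC code page among several

ascii_hex = " ".join(f"{b:02X}" for b in ascii_bytes)
ebcdic_hex = " ".join(f"{b:02X}" for b in ebcdic_bytes)

print(ascii_hex)   # 41 20 61 20 30
print(ebcdic_hex)  # C1 40 81 40 F0

# Decoding with the matching codec recovers the original text.
print(ebcdic_bytes.decode("cp037") == text)  # True
```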
8 bits (Extended ASCII)
ISO Latin-1
ANSI Characters
- Windows 9x supports ANSI characters
  - ANSI: American National Standards Institute
- 8-bit representation (256 characters), values 0 to 255
- Values 0 to 127: same as ASCII
- Values 128 to 255: similar to ISO Latin-1
  - With extensions and incompatibilities
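The overlap and the incompatibilities can be observed with Python's codecs. The sketch assumes that the "ANSI" code page is Windows-1252 (`cp1252`), the usual Western European one; the slides do not name a specific code page.

```python
e_acute = "\u00e9"  # é, present in both encodings at the same position
euro = "\u20ac"     # €, present in cp1252 only

# Same byte for é in both encodings.
same_byte = e_acute.encode("cp1252") == e_acute.encode("latin-1")
print(same_byte)  # True

# The euro sign is a cp1252 extension in the 128 to 159 range.
euro_byte = euro.encode("cp1252")
print(euro_byte)  # b'\x80'

# ISO Latin-1 has no euro sign, so encoding fails.
try:
    euro.encode("latin-1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False
print(latin1_has_euro)  # False
```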
Unicode
- 16-bit standard (as originally designed)
- Developed by a consortium
- Intended to supersede older 7- and 8-bit codes
Unicode Version 2.1
- Released in 1998
- Improves on version 2.0
- Includes the Euro sign (20AC in hexadecimal)
- From the standard: "contains 38,887 distinct coded characters derived from the supported scripts. These characters cover the principal written languages of the Americas, Europe, the Middle East, Africa, India, Asia, and Pacifica."
- http://www.unicode.org
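The Euro sign's code can be inspected directly in Python. The sketch uses the standard `unicodedata` module and the `utf-16-be` codec to show the 16-bit value; the variable names are illustrative.

```python
import unicodedata

euro = "\u20ac"

code_point = format(ord(euro), "04X")
name = unicodedata.name(euro)
sixteen_bit = euro.encode("utf-16-be")  # two bytes, most significant first

print(code_point)         # 20AC
print(name)               # EURO SIGN
print(sixteen_bit.hex())  # 20ac
```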
Unicode Characters
- Windows NT uses 16-bit Unicode
- Covers most living languages
- Also dead languages (academic use)
- Details: http://www.unicode.org
Keyboard Input
- Key (scan) codes are converted to ASCII
- ASCII code is sent to the host computer
  - Received by the host as a stream of data
  - Stored in a buffer
  - Processed
  - Etc.
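The steps above can be sketched as a small Python simulation. The scan-code table below is hypothetical and covers only a few keys; real keyboards use much larger tables and also track modifier keys such as Shift. The names `SCAN_TO_ASCII` and `keyboard_to_host` are invented for this illustration.

```python
from collections import deque

# Hypothetical scan-code table, for illustration only.
SCAN_TO_ASCII = {0x23: "h", 0x12: "e", 0x26: "l", 0x18: "o"}


def keyboard_to_host(scan_codes):
    """Convert scan codes to ASCII codes and store them in a host buffer."""
    buffer = deque()
    for scan in scan_codes:
        char = SCAN_TO_ASCII.get(scan)
        if char is not None:          # unknown keys are ignored in this sketch
            buffer.append(ord(char))  # the ASCII code is what reaches the host
    return buffer


buffer = keyboard_to_host([0x23, 0x12, 0x26, 0x26, 0x18])
received = bytes(buffer).decode("ascii")  # the host processes the buffered stream
print(list(buffer))  # [104, 101, 108, 108, 111]
print(received)      # hello
```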
Other Inputs
- OCR (optical character recognition)
- Bar code readers
- Voice/audio input
- Punched cards
- Images / objects
- Pointing devices