As previously noted, a byte can contain a numeric value in the range 0-255. Computers don't understand Latin, Cyrillic, Hindi, Arabic character sets!



Similar documents
File Handling. What is a file?

The C Programming Language course syllabus associate level

ASCII Encoding. The char Type. Manipulating Characters. Manipulating Characters

System Calls and Standard I/O

Informatica e Sistemi in Tempo Reale

The programming language C. sws1 1

C / C++ and Unix Programming. Materials adapted from Dan Hood and Dianna Xu

Chapter 4: Computer Codes

So far we have considered only numeric processing, i.e. processing of numeric data represented

Simple C Programs. Goals for this Lecture. Help you learn about:

2 ASCII TABLE (DOS) 3 ASCII TABLE (Window)

5 Arrays and Pointers

2011, The McGraw-Hill Companies, Inc. Chapter 3

Sources: On the Web: Slides will be available on:

Fundamentals of Programming

Illustration 1: Diagram of program function and data flow

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C

Numbering Systems. InThisAppendix...

Module 816. File Management in C. M. Campbell 1993 Deakin University

Binary Representation. Number Systems. Base 10, Base 2, Base 16. Positional Notation. Conversion of Any Base to Decimal.

Oct: 50 8 = 6 (r = 2) 6 8 = 0 (r = 6) Writing the remainders in reverse order we get: (50) 10 = (62) 8

Member Functions of the istream Class

Number Representation

Lecture 16: System-Level I/O

Pemrograman Dasar. Basic Elements Of Java

EE 261 Introduction to Logic Circuits. Module #2 Number Systems

Base Conversion written by Cathy Saxton

Systems I: Computer Organization and Architecture

Bachelors of Computer Application Programming Principle & Algorithm (BCA-S102T)

This is great when speed is important and relatively few words are necessary, but Max would be a terrible language for writing a text editor.

Digital System Design Prof. D Roychoudhry Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

plc numbers Encoded values; BCD and ASCII Error detection; parity, gray code and checksums

ASCII Code. Numerous codes were invented, including Émile Baudot's code (known as Baudot

Binary Representation

/* File: blkcopy.c. size_t n

Characters & Strings Lesson 1 Outline

Stacks. Linear data structures

The string of digits in the binary number system represents the quantity

C Programming. for Embedded Microcontrollers. Warwick A. Smith. Postbus 11. Elektor International Media BV. 6114ZG Susteren The Netherlands

System Calls Related to File Manipulation

Simple Image File Formats

Section 1.4 Place Value Systems of Numeration in Other Bases

Useful Number Systems

Levent EREN A-306 Office Phone: INTRODUCTION TO DIGITAL LOGIC

Tutorial on C Language Programming

PHP Debugging. Draft: March 19, Christopher Vickery

Embedded Systems Design Course Applying the mbed microcontroller

Variables, Constants, and Data Types

Giving credit where credit is due

=

The use of binary codes to represent characters

Unix Shell Scripts. Contents. 1 Introduction. Norman Matloff. July 30, Introduction 1. 2 Invoking Shell Scripts 2

Lecture 25 Systems Programming Process Control

Unix the Bare Minimum

arrays C Programming Language - Arrays

Keywords are identifiers having predefined meanings in C programming language. The list of keywords used in standard C are : unsigned void

Introduction to Java Applications Pearson Education, Inc. All rights reserved.

CSI 333 Lecture 1 Number Systems

Computer Science 281 Binary and Hexadecimal Review

Numeral Systems. The number twenty-five can be represented in many ways: Decimal system (base 10): 25 Roman numerals:

The Answer to the 14 Most Frequently Asked Modbus Questions

Japanese Character Printers EPL2 Programming Manual Addendum

The Hexadecimal Number System and Memory Addressing

Operating Systems. Privileged Instructions

Memory management. Announcements. Safe user input. Function pointers. Uses of function pointers. Function pointer example

C++ INTERVIEW QUESTIONS

The Linux Operating System and Linux-Related Issues

Secrets of printf. 1 Background. 2 Simple Printing. Professor Don Colton. Brigham Young University Hawaii. 2.1 Naturally Special Characters

Programming languages C

1. Boolean Data Type & Expressions Outline. Basic Data Types. Numeric int float Non-numeric char

Basics of I/O Streams and File I/O

Chapter Binary, Octal, Decimal, and Hexadecimal Calculations

Project 2: Bejeweled

COMPSCI 210. Binary Fractions. Agenda & Reading

Streaming Lossless Data Compression Algorithm (SLDC)

Lecture 11: Number Systems

Boolean Data Outline

To convert an arbitrary power of 2 into its English equivalent, remember the rules of exponential arithmetic:

Binary Number System. 16. Binary Numbers. Base 10 digits: Base 2 digits: 0 1

Chapter One Introduction to Programming

Lab 1: Introduction to C, ASCII ART and the Linux Command Line Environment

Version 1.5 Satlantic Inc.

1 Description of The Simpletron

Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program.

Machine Architecture and Number Systems. Major Computer Components. Schematic Diagram of a Computer. The CPU. The Bus. Main Memory.

UNIX, Shell Scripting and Perl Introduction

This section describes how LabVIEW stores data in memory for controls, indicators, wires, and other objects.

Chapter 1: Digital Systems and Binary Numbers

Advanced Bash Scripting. Joshua Malone

Decimal to Binary Conversion

3/13/2012. Writing Simple C Programs. ESc101: Decision making using if-else and switch statements. Writing Simple C Programs

How To Write Portable Programs In C

Topics. Parts of a Java Program. Topics (2) CS 146. Introduction To Computers And Java Chapter Objectives To understand:

Computers. Hardware. The Central Processing Unit (CPU) CMPT 125: Lecture 1: Understanding the Computer

DOS Command Reference

Phys4051: C Lecture 2 & 3. Comment Statements. C Data Types. Functions (Review) Comment Statements Variables & Operators Branching Instructions

COMP 250 Fall 2012 lecture 2 binary representations Sept. 11, 2012

Binary Numbers. Binary Octal Hexadecimal

Handout 1. Introduction to Java programming language. Java primitive types and operations. Reading keyboard Input using class Scanner.

Lab 2 - CMPS 1043, Computer Science I Introduction to File Input/Output (I/O) Projects and Solutions (C++)

Transcription:

Encoding of alphanumeric and special characters As previously noted, a byte can contain a numeric value in the range 0-255. Computers don't understand Latin, Cyrillic, Hindi, Arabic character sets! Alphanumeric and special characters of the Latin alphabet are stored in memory as integer values encoded using the ASCII code. (American standard code for the interchange of information). Other codes requiring 16 bits per character have been developed to support languages having large number of written symbols. A simple program for displaying the ASCII encoding scheme The correspondence between decimal, hexadecimal, and character representations can be readily generated via the following simple program. int main( int argc, char *argv[]) int c; c = ' '; /* Same as c = 32 or c = 0x20 */ while (c <= 'z') printf("%3d %02x %c \n", c, c, c); c = c + 1; 10

Output of the ASCII table generator class/210/examples ==> gcc -o p2 p2.c class/210/examples ==> p2 more 32 20 33 21! 34 22 " 35 23 # 36 24 $ 37 25 % 38 26 & 39 27 ' 40 28 ( 41 29 ) 42 2a * 43 2b + 44 2c, 45 2d - 46 2e. 47 2f / 48 30 0 49 31 1 : 57 39 9 58 3a : 59 3b ; 60 3c < 61 3d = 62 3e > 63 3f? 64 40 @ 65 41 A 66 42 B : 89 59 Y 90 5a Z 91 5b [ 92 5c \ 93 5d ] 94 5e ^ 95 5f _ 96 60 ` 97 61 a 98 62 b : 120 78 x 121 79 y 122 7a z There are also a few special characters that follow z. 11

Control characters: The ASCII encodings between decimal 0 and 31 are used for to encode what are commonly called control characters. Control characters having decimal values in the range 1 through 26 can be entered from the keyboard by holding down the ctrl key while typing a letter in the set a through z. Some control characters have escaped code respresentations in C, but all may be written in octal. Dec Keystroke Name Escaped code 4 8 ctrl-d ctrl-h end of file backspace '\004' '\b' 9 ctrl-i tab '\t' 10 ctrl-j newline '\n' 12 ctrl-l page eject '\f' 13 ctrl-m carriage return '\r' 12

Formatted Output and Input Printing integer values: As previously noted integer values are stored in computer memory using a binary representation. To communicate the value of an integer to a human, it is necessary to produce the string of ASCII characters that correspond to the rendition of the integer in the Latin alphabet. For example the consider the byte 1001 0100 This is the binary encoding of the hex number 0x94 which is the decimal number 9 * 16 + 4 = 148. Therefore, if this number is to be rendered as a decimal number on a printer or display, three bytes corresponding to the ASCII encodings of 1, 4, and 8 must be sent to the printer or display. These bytes are expressed in hexadecimal as: 31 34 38 In the C language, run time libraries provide functions that interface with the Operating System to provide this service. Some of these functions may actually be implemented as macros that call the actual RTL functions. The printf()function (actually a macro that converts to fprintf(stdout, ) ) is used to produce the ASCII encoding of an integer and send it to an output device or file. Format codes specify how you want to see the integer represented. %c Consider the integer to be the ASCII encoding of a character and render that character %d Produce the ASCII encoding of the integer expressed as a decimal number %x Produce the ASCII encoding of the integer expressed as a hexadecimal number %o Produce the ASCII encoding of the integer expressed as an octal number 13

Specifying field width These may be preceded by an optional field width specifier. The code %02x shown below forces the field to be padded with leading 0's if necessary to generate the specified field width. int main( int argc, char *argv[]) int x; int y = 78; x = 'A' + 65 + 0101 + 0x41 + '\n'; printf("x = %d \n", x); printf("y = %c %3d %02x %4o \n", y, y, y, y); /home/westall ==> gcc -o p1 p1.c /home/westall ==> p1 X = 270 Y = N 78 4e 116 The number of values printed by printf() is determined by the number of distinct format codes. 14

Output redirection The printf() function sends its output to a logical file commonly known as the standard outputor simply stdout. When a program is run in the Unix environment, the logical file stdout is by default associated with the screen being viewed by the person who started the program. The > operator may be used on the command line to cause the standard output to be redirected to a file: class/210/examples ==> p1 > p1.output A file created in this way may be subsequently viewed using the cat command (or edited using a text editor). class/210/examples ==> cat p1.output X = 270 Y = N 78 4e 116 15

Input of integer data When a human enters numeric data using the keyboard, the values passed to a program are the ASCII encodings of the keystrokes that were made. For example, when I type: 123.45 6 bytes are produced by the keyboard. The hexadecimal encoding of these bytes is: 31 32 33 2E 34 35 - hex 1 2 3. 4 5 - ascii To perform arithmetic using my number, it must be converted to interal floating point representation. The scanf() function is used to 1 - consume the ASCII encoding of a numeric value 2 - convert the ASCII string to the proper internal representation 3 - store the result in a memory location provided by the caller. As with printf(), a format code controls the process. The format code specifies both the input encoding and the desired type of value to be produced: %d string of decimal characters int %x string of hex characters unsigned int %o string of octal characters unsigned int %c ascii encoded character char %f floating point number in decimal float %lf floating point number in decimal double %e floating pt in scientific notation float Also as with printf() the number of values that scanf() will attempt to read is determined by the number of format codes provided. For each format code provided, it is mandatory that a variable be provided to hold the data that is read. 16

Specifying the variables to receive the values: It is extremely important to note that: the valueto be printed ins passed to printf() but the addressof the variable to receive the value must be passed to scanf() The & operator in C is the address of operator. Formatted input of integer values /* p3.c */ int main( int argc, char *argv[]) int a; int r; int b; r = scanf( %d %d, &a, &b); printf( Got %d items with values %d %d \n, r, a, b); 17

Care and feeding of input format specifiers Embedding of extra spaces and including '\n' in scanf() format strings can also lead to wierd behavior and should be avoided. Specification of field widths is dangerous unless you really know what you are doing. Effects of extra spaces in format specifier /* p3.c */ int main( int argc, char *argv[]) int a; int r; int b; r = scanf( %d %d, &a, &b); printf( Got %d items with values %d %d \n, r, a, b); If you enter only the two values that scanf() is expecting to find followed by enter nothing will happen. You will have to type control-d (which is the standard Unix end of file indicator). To get scanf() to return. class/210/examples ==> gcc -o p3 p3.c class/210/examples ==> p3 4 49 If you enter three values, it will not be necessary to enter control-d, but the third value will be ignored. class/210/examples ==> p3 1 5 6 Got 2 items with values 1 5 This bad behavior can be fixed by changing the function call to: r = scanf( %d %d, &a, &b); 18

Effect of invalid input If you accidentally enter a non-numeric value scanf() will abort and return you only 1 value. The value 4927 represents the uninitialized value of b. class/210/examples ==> p3 1 t 6 Got 1 items with values 1 4927 Pitfalls of field widths: It is legal to specify field widths to scanf() but usually dangerous to do so! class/210/examples ==> cat p4.c int main( int argc, char *argv[]) int a; int r; int b; r = scanf(" %2d %2d ", &a, &b); printf("got %d items with values %d %d \n", r, a, b); class/210/examples ==> p4 123 456 Got 2 items with values 12 3 19

Detecting end-of-file with scanf() scanf() returns the number of values it actually obtained (which may be less than the number of values requested). Therefore the proper way to test for end-of-file is to ensure that the number of values obtained was the nmber of values requested. /* p4b.c */ int main( int argc, char *argv[]) int a; int r; int b; while ((r = scanf("%d %d", &a, &b)) == 2) printf("got %d items with values %d %d \n", r, a, b); Input redirection Like the stdout the stdin may also be redirected. To redirect both stdin and stdout use: a.out < input.txt > output.txt when invoked in this manner when the program a.out reads from the stdin via scanf() or fscanf(stdin,.) the input will actually be read from a file named input.txt and any data written to the stdout will end up in the file output.txt. 20

The necessity of format conversion As stated previously, the %d format causes scanf() to convert the ASCII text representation of a number to an internal binary integer value. One might think it would be possible to avoid this and be able to deal with mixed alphabetic and numeric data by simply switching to the %c format. However, if we need to do arithmetic on what we read in, this approach does not work. /* p11c.c */ int main(void) char i, j, k; scanf("%c %c %c", &i, &j, &k); printf("the first input was %c \n", i); printf("the third input was %c \n", k); printf("their product is %c \n", i * k); class/210/examples ==> p11c 2 C 4 The first input was 2 The third input was 4 Their product is ( Why do we get ('' and not 8 for the answer? Since no format conversion is done the value stored in i is the ASCII code for the numeral 2 which is 0x32 = 50 Similarly the value stored in j is 0x34 = 52. When they are multiplied, the result is 50 x 52 = 2600 which is too large to be held in 8 bits. The 8 bit product is (2600 mod 256) (the remainder when 2600 is divided by 256). That value is 2600-2560 = 40 = 0x28 which according to the ASCII table is the encoding of ('' 21

Floating point input and output The %e and %f format codes are used to print floating point numbers in scientific and decimal format. The l modifier should be used for doubles. Field size specificiation is of the form: field-width.number-of-digits-to-right-of-decimal int main( int argc, char *argv[]) float a; double b; float c; double d; a = 1024.123; b = 1.024123e+03; scanf("%f %le", &c, &d); printf("a = %10.2f b = %8.4lf c = %f d = %8le \n", a, b, c, d); class/210/examples ==> gcc -o p5 p5.c class/210/examples ==> p5 12.4356 1.4e22 a = 1024.12 b = 1024.1230 c = 12.435600 d = 1.400000e+22 22

General Input and Output The C language itself defines no facility for I/O operations. I/O support is provided through two collections of mutually incomptible function libraries Low level I/O open(), close(), read(), write(), lseek(), ioctl() Standard library I/O fopen(), fclose() fprintf(), fscanf() conversion fgetc(), fputc() fgets(), fputs() fread(), fwrite(), fseek() - opening and closing files - field at a time with data - character (byte) at a time - line at a time - physical block at a time Our focus will be on the use of the standard I/O library: Function and constant definitions are obtained via #includefacility. To use the standard library functions: #include <errno.h> 23

Standard library I/O The functions operate on ADT's of type FILE * (which is defined in stdio.h). Three special FILE *'s are automatically opened when any process (program) starts: stdin Normally keyboard input (but may be redirected with < ) stdout Normally terminal output (but may be directed with > or 1> ) 1 stderr Normally terminal output (but may be directed with 2> ) Since these files are predeclared and preopend they must not be declared nor opened in your program! The following example shows how to open, read, and write disk files by name. In our assignments we will almost always read from stdin and write to stdout or stderr. 1 The 1> and 2> notation is supported by the sh family of shells incuding bash,but is not supported in csh, tcsh. 24

int main() FILE *f1; FILE *f2; int x; f1 = fopen( in.txt, r ); if (f1 == 0) perror( f1 failure ); exit(1); f2 = fopen( out.txt, w ); if (f2 == 0) perror( f2 failure ); exit(2); if (fscanf(f1, %d, &x)!= 1) perror( scanf failure ); exit(2); fprintf(f2, %d, x); fclose(f1); fclose(f2); 25

Examples of the use of p6 In this example, we have not yet created in.txt class/210/examples ==> gcc -o p6 p6.c class/210/examples ==> p6 f1 failure:: No such file or directory cat: in.txt: No such file or directory Here, in.txt is created with the cat command: class/210/examples ==> cat > in.txt 99 Now if we rerun p6 and cat out.txt we obtain the correct answer class/210/examples ==> p6 class/210/examples ==> cat out.txt 99 Note: The scanf() and printf() functions are actually macros or abbreviations for: fscanf(stdin,...) and (stdout,...) 26

Character at a time input and output The fgetc() function can be used to read an I/O stream one byte at a time. The fputc() function can be used to write an I/O stream one byte at a time. Here is an example of how to build a cat command using the two functions. The p10 program is being used here to copy its own source code from standard input to output. class/210/examples ==> p10 < p10.c /* p10.c */ main() int c; while ((c = fgetc(stdin)) > 0) fputc(c, stdout); While fputc() and fgetc() are fine for building interactive applications, they are very inefficient and should never be used for reading or writing a large volume of data such as a photographic image file. 27

Line at a time input and character string output. The fgets(buffer_address, buf_size, file) function can be used to read from a stream until either: 1 - a newline character '\n' = 10 = 0x0a is encountered or 2 - the specified number of characters - 1 has been read or 3 - end of file is reached. There is no string data type in C, but a standard convention is that a string is an array of characters in which the end is of the string is marked by the presence of a byte which has the value binary 00000000 (sometimes called the NULL character). fgets() will append the NULL character to whatever it reads in. Since fgets() will read in multiple characters it is not possible to assign what it reads to a single variable of type char. Thus a pointer to a buffermust be passed (as was the case with scanf()). The fputs() function writes a NULL terminated string to a stream (after stripping off the NULL). The following example is yet another cat program, but this one works one line at a time. class/210/examples ==> p11 < p11.c 2> countem /* p11.c */ #include <string.h> main() unsigned char *buff = NULL; int line = 0; buff = malloc(1024); // alloc storage to hold data if (buff == 0) exit(1); 28

while (fgets(buff, 1024, stdin)!= 0) fputs(buff, stdout); fprintf(stderr, "%d %d \n", line, strlen(buff)); line += 1; free(buff); // release the storage malloc allocated Here buff is declared a pointer and the storage which will hold the data is allocated with malloc() Alternatively we could have declared unsigned char buff[1024] and not used malloc() Here is the output that went to the standard error. Note that each line that appeared empty has length 1 (the newline character itself) and the lines that appear to have length 1 actually have length 2. class/210/examples ==> cat countem 0 12 1 1 2 19 3 20 4 1 5 7 6 2 7 24 8 18 9 1 10 24 11 18 12 15 13 1 14 41 15 5 16 27 17 55 18 17 19 4 20 2 29