ASCII Encoding. The char Type. Manipulating Characters.




The char Type

The C char type stores small integers. It is usually 8 bits wide. char variables are guaranteed to be able to hold the integers 0..127. char variables are mostly used to store characters encoded in ASCII. In COMP1911, store only characters in char variables: even if a numeric variable is only used for the values 0..9, use the type int for it.

ASCII Encoding

ASCII (American Standard Code for Information Interchange) specifies an encoding of 128 characters as the integers 0..127. The characters encoded include upper and lower case (English) letters, digits, common punctuation symbols (as you might find on the keyboard) plus a series of non-printable control characters (including newline and tab).

Manipulating Characters

Does this mean you will have to remember 100+ character codes? Luckily, no! We use character literals instead. For example:

    char a = 'a';         // ASCII code 97
    char A = 'A';         // ASCII code 65
    char space = ' ';     // ASCII code 32
    char plus = '+';      // ASCII code 43
    char newline = '\n';  // ASCII code 10

Style Note: Always use character literals in your code! Even if you are really proud of having memorised the ASCII table!

NOTE: The ASCII codes for the digits (48..57), the upper case letters (65..90) and the lower case letters (97..122) are in sequence. Knowing this allows us to do some neat things:

    int a = 'a';
    int b = a + 1;            // this is possible due to
    int c = a + 2;            // the underlying numeric type
    int B = b - ('a' - 'A');  // 'B': subtract the gap between the two cases
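The last line of the example above relies on the fixed distance between the lower case and upper case letters. As a minimal sketch of the same idea, the helper below converts a lower case letter to upper case by arithmetic alone. The function name to_upper_ascii is hypothetical (it is not from the slides), and the code assumes an ASCII-compatible character set.

```c
// Hypothetical helper (not from the slides): convert an ASCII lower case
// letter to upper case using the fixed gap between the two cases.
// Assumes an ASCII-compatible character set.
int to_upper_ascii(int c) {
    if (c >= 'a' && c <= 'z') {
        return c - ('a' - 'A');  // same arithmetic as: int B = b - ('a' - 'A');
    }
    return c;  // anything that is not a lower case letter is left unchanged
}
```

Calling to_upper_ascii('b') gives 'B', while characters such as '+' or 'Z' come back unchanged. In real programs the library function toupper from ctype.h is the better choice.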

We can also test various properties of characters:

    // check for lower case
    if (c >= 'a' && c <= 'z') {
        ...
    }

Problem: Convert a digit character to the integer it represents, e.g., '0' to 0, '7' to 7, etc. We use the fact that the digit codes are in order:

    // check c is a digit
    if (c >= '0' && c <= '9') {
        val = c - '0';  // why does this work?
    }

Printing and Reading Characters

C provides library functions for reading and writing characters.

The getchar function: This function reads and returns one input character. Note that, unlike scanf, it has no arguments; its return value is collected by assigning it to a variable.

The putchar function: This function takes a single int argument and prints it as a character.

Here is an example:

    int c;
    printf("Please enter a character: ");
    c = getchar();
    putchar(c);

Reading Characters

Consider the following code:

    int c1, c2;
    printf("Please enter first character:\n");
    c1 = getchar();
    printf("Please enter second character:\n");
    c2 = getchar();
    printf("First: %c\nsecond: %c\n", c1, c2);

What is the output? It turns out that the newline entered by pressing Enter after the first character is read by the second getchar, so c2 holds '\n' rather than the second character typed. How can we fix the program?

    int c1, c2;
    printf("Please enter first character:\n");
    c1 = getchar();
    getchar();  // reads and discards a character (the newline)
    printf("Please enter second character:\n");
    c2 = getchar();
    printf("First: %c\nsecond: %c\n", c1, c2);

Printing characters

The conversion specifier for characters is %c. Using it, we can supply values of char type to printf for output, as in the printf calls above.
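Returning to the digit-conversion problem above, here is a minimal sketch that wraps the test and the subtraction in one function. The name digit_value and the choice of -1 as the "not a digit" result are assumptions made for illustration, not part of the slides.

```c
// Hypothetical helper (not from the slides): return the integer that a
// digit character represents, or -1 if c is not a digit.
int digit_value(int c) {
    if (c >= '0' && c <= '9') {
        // The digit codes are consecutive, so the distance from '0'
        // is the digit's numeric value: '7' - '0' is 55 - 48, i.e. 7.
        return c - '0';
    }
    return -1;  // assumed convention for "not a digit"
}
```

This answers the "why does this work?" question: the subtraction measures how far c is from '0' in the code table, and because the digits are in sequence that distance is the digit itself.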

End of Input

An input function such as scanf or getchar can fail because no input is available. This can occur, for example, if input is coming from a file and the end of the file is reached. On UNIX-like systems such as Linux and OS X, typing Ctrl + D on a terminal signals to the operating system that there is no more input from the terminal. Windows has no equivalent, but some Windows programs interpret Ctrl + Z similarly.

getchar returns a special non-ASCII value to indicate that no input was available. This value is #defined as EOF in stdio.h. On most systems EOF == -1. Note that -1 isn't an ASCII value (0..127). There is no end-of-file character on Linux or other modern operating systems.

Reading Characters to End of Input

Programming pattern for reading characters to the end of input:

    int ch;
    ch = getchar();
    while (ch != EOF) {
        printf("you entered the character: %c which ha ...
        ch = getchar();
    }

For comparison, the programming pattern for reading integers to end of input:

    int num;
    // scanf returns the number of items read
    while (scanf("%d", &num) == 1) {
        printf("you entered the number: %d\n", num);
    }

Strings

A string is a sequence of characters. C uses a special ASCII character, '\0', to mark the end of strings. This is convenient because programs don't have to track the length of the string.

Definition: A C string is a null-terminated character array.

Consider the following:

    // this is incorrect: six initialisers for five elements, \0 will be discarded
    char hello[5] = {'h', 'e', 'l', 'l', 'o', '\0'};
    // this is OK
    char hello[6] = {'h', 'e', 'l', 'l', 'o', '\0'};
    // this is better
    char hello[] = {'h', 'e', 'l', 'l', 'o', '\0'};

Useful C Library Functions for Characters

The C library includes some useful functions which operate on characters. Several of the more useful ones are listed below. Note that you need to #include <ctype.h> to use them.

    #include <ctype.h>

    int toupper(int c);  // convert c to upper case
    int tolower(int c);  // convert c to lower case
    int isalpha(int c);  // test if c is a letter
    int isdigit(int c);  // test if c is a digit
    int islower(int c);  // test if c is a lower case letter
    int isupper(int c);  // test if c is an upper case letter

String Length

The null character must be accounted for when sizing strings, although, somewhat confusingly, the library function strlen returns the length of a string not including the null character.
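To tie the ctype.h functions and the string-length note together, here is a minimal sketch that upper-cases a string in place. The function name upcase_in_place is hypothetical. The cast to unsigned char is there because the ctype.h functions expect a value representable as unsigned char (or EOF).

```c
#include <ctype.h>
#include <string.h>

// Hypothetical helper (not from the slides): convert every lower case
// letter in s to upper case and return how many characters were changed.
int upcase_in_place(char s[]) {
    int changed = 0;
    size_t i;
    size_t n = strlen(s);  // length NOT counting the terminating '\0'

    for (i = 0; i < n; i = i + 1) {
        unsigned char ch = (unsigned char)s[i];  // safe argument for ctype.h
        if (islower(ch)) {
            s[i] = (char)toupper(ch);
            changed = changed + 1;
        }
    }
    return changed;
}
```

For an array declared as char word[6] = "heLlo"; the array has 6 elements, strlen(word) is 5, and upcase_in_place(word) changes 4 characters, leaving "HELLO".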

Because working with strings is so common, C provides some convenient syntax. Instead of writing:

    char hello[] = {'h', 'e', 'l', 'l', 'o', '\0'};

you can write:

    char hello[] = "hello";

Note that hello will have 6 elements. The compiler automatically appends '\0' when strings are initialised with string literals. Again, remember to allow space for it if sizing the array manually.

Reading an Entire Input Line

The function fgets reads an entire line. You might use fgets as follows:

    #define MAX_LINE_LENGTH 1024

    char line[MAX_LINE_LENGTH];
    fgets(line, MAX_LINE_LENGTH, stdin);
    fputs(line, stdout);  // equivalent to printf("%s", line)

Reading Lines to End of Input

Programming pattern for reading lines to end of input:

    // fgets returns NULL if it can't read any characters
    while (fgets(line, MAX_LINE_LENGTH, stdin) != NULL) {
        printf("you entered the line: %s", line);
    }
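One detail the fgets pattern above leaves implicit: fgets stores the newline that ends the line in the array (when the line fits), which is why the printf in the loop needs no \n of its own. If the newline is not wanted, it can be overwritten with the null character. The sketch below shows one way to do that; the name strip_newline is hypothetical.

```c
#include <string.h>

// Hypothetical helper (not from the slides): remove the trailing '\n'
// that fgets leaves at the end of a line, if there is one.
// Returns the length of the string afterwards.
size_t strip_newline(char line[]) {
    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\n') {
        line[len - 1] = '\0';  // the string now ends one character earlier
        len = len - 1;
    }
    return len;
}
```

A typical use would be to call strip_newline(line) immediately after a successful fgets, before comparing the line with another string.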

String Manipulation

The header file string.h provides some useful string functions:

    // string length (not including '\0')
    size_t strlen(const char *s);

    // string copy
    char *strcpy(char *dest, const char *src);
    char *strncpy(char *dest, const char *src, size_t n);

    // string concatenation / append
    char *strcat(char *dest, const char *src);
    char *strncat(char *dest, const char *src, size_t n);

    // string compare
    int strcmp(const char *s1, const char *s2);
    int strncmp(const char *s1, const char *s2, size_t n);
    // case-insensitive compare (POSIX; declared in <strings.h>)
    int strcasecmp(const char *s1, const char *s2);
    int strncasecmp(const char *s1, const char *s2, size_t n);

    // character search
    char *strchr(const char *s, int c);
    char *strrchr(const char *s, int c);

For example:

    #include <string.h>

    char str1[100] = "Hello World!";
    char str2[100];

    strncpy(str2, str1, 100);        // copy str1 to str2
    if (strcmp(str1, str2) == 0) {   // case sensitive compare
        printf("Strings match!\n");
    }

    // append str1 to str2
    strncat(str2, str1, 100 - (strlen(str2) + 1));
    if (strcasecmp(str1, str2)) {    // case insensitive compare
        printf("Strings do not match!\n");
    }
    printf("%zu\n", strlen(str2));   // string length

Note that strlen does not count the null character!

Remember: You can find out about what else is available in string.h by running man string.

Strings

When working with strings we use the null character as a guard:

    char str1[100] = "Hello World!";
    char str2[100];
    int i;

    // the following code copies str1 into str2
    i = 0;                        // start at index 0
    while (str1[i] != '\0') {     // stop on null
        str2[i] = str1[i];        // copy individual characters
        i = i + 1;                // increment index
    }
    str2[i] = '\0';               // MUST set this for str2!

In summary, strings:
- are null-terminated character arrays
- can be initialised with string literals
- can be read and printed by scanf/printf using %s
- benefit from the string manipulation functions in string.h
- cannot be copied via assignment (=), since they are arrays

Careful: The main error encountered when working with strings is mishandling the terminating null character, e.g., forgetting to set it! Check this first if your strings are behaving strangely.
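The copy loop above assumes str2 is large enough to hold str1. As a sketch of how the same null-guard idea extends to a destination of limited size, the helpers below count characters by hand and copy with truncation, always writing the terminating null. The names my_strlen and copy_string are hypothetical and the truncation behaviour is a design choice made for this illustration, not something prescribed by the slides.

```c
#include <stddef.h>

// Hypothetical helper: count the characters before the terminating '\0'.
size_t my_strlen(const char s[]) {
    size_t i = 0;
    while (s[i] != '\0') {  // the null character is the guard
        i = i + 1;
    }
    return i;
}

// Hypothetical helper: copy src into dest, which has room for dest_size
// chars in total. The copy is truncated if necessary, and dest is always
// null-terminated (unless dest_size is 0, in which case nothing is written).
void copy_string(char dest[], const char src[], size_t dest_size) {
    size_t i = 0;
    if (dest_size == 0) {
        return;
    }
    // Stop one short of the capacity to leave room for the '\0'.
    while (i + 1 < dest_size && src[i] != '\0') {
        dest[i] = src[i];
        i = i + 1;
    }
    dest[i] = '\0';  // MUST set this, exactly as in the loop above
}
```

Copying "Hello World!" into a 6-element array with copy_string stores "Hello" followed by the null character. By contrast, the library function strncpy does not write a terminating null when the source has n or more characters, which is one common way the error described in the Careful note arises.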

Arrays of Strings

Sometimes, instead of manipulating each individual cell, as for matrices, we need to manipulate whole arrays. This is generally the case when working with arrays of strings! Consider:

    char names[3][20] = {"Mark", "Luke", "John"};
    // why does this work?
    printf("%s %s %s\n", names[0], names[1], names[2]);

Array of arrays: If we view a 2D array as an array of arrays, it makes sense that using only the first index gives us a whole array: names[0] is itself an array of 20 chars, holding the string "Mark".

Command-line Arguments

What are command-line arguments? Arguments that are supplied to a program when it is run. We have seen them already, for example:

    % diff -i file1.txt file2.txt
    % gedit prog.c

Here, -i, file1.txt and file2.txt are command-line arguments to diff, and prog.c is a command-line argument to gedit.

Command-line arguments are automatically supplied to our C programs, by the operating system, via the arguments of the main function (argc and argv)!

Arguments to main:
- argc stores the number of command-line arguments
- argv stores the command-line arguments as strings

    int main(int argc, char *argv[]) {
        int i;
        for (i = 0; i < argc; i = i + 1) {
            // print arguments
            printf("Argument %d is: %s\n", i + 1, argv[i]);
        }
        return 0;
    }

NB: The first argument is always the program name! This means that argc is always at least 1 and argv contains at least one value.

By default, command-line arguments are space delimited. We can use quotes if arguments include spaces.

    % ./prog1 nospace one space
    % ./prog2 nospace "one space"

In the above, prog1 sees three command-line arguments, while prog2 sees only two (not counting the program name in either case).

What about?

    % ./prog3 *.c
    % ./prog4 10 < in > out

(Hint: the shell expands *.c and handles the < and > redirections before the program starts.)

Sometimes argv is typed as: char **argv.
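Because argv is an array of strings, each argument can be indexed a second time to look at its individual characters. As a minimal sketch, the function below counts the arguments that look like options, i.e. that start with '-', as -i does in the diff example. The name count_options is hypothetical, and "starts with a dash" is an assumed convention for this illustration.

```c
// Hypothetical helper (not from the slides): count the command-line
// arguments that start with '-'. argv[0], the program name, is skipped.
int count_options(int argc, char *argv[]) {
    int count = 0;
    int i;
    for (i = 1; i < argc; i = i + 1) {
        // argv[i] is a string, so argv[i][0] is its first character
        if (argv[i][0] == '-') {
            count = count + 1;
        }
    }
    return count;
}
```

From main it would be called as count_options(argc, argv). For the command line diff -i file1.txt file2.txt, argc is 4 and the function returns 1.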