Unit 5: Recursion Part 3
Engineering 4892: Data Structures
Faculty of Engineering & Applied Science
Memorial University of Newfoundland
June 14, 2011

ENGI 4892 (MUN) Unit 5, Part 3 June 14, 2011

Outline
  1 Application of Recursion to Languages
  2 Application of Recursion to Linked Lists
  3 Recursion within a C++ class


Application of Recursion to Languages

Grammar

A formal language is defined as a set of strings (sequences of symbols) formed from a finite alphabet. An alphabet is a basic set of symbols. For example, the following are languages:

    {a, b, ab, ba}       Alphabet: a, b
    {00, 01, 10, 11}     Alphabet: 0, 1

While every string belonging to a formal language is finite, the number of strings belonging to the language may be infinite. The following are examples of infinite languages:

  - The set of all syntactically correct C++ programs
  - English (or French, Mandarin, ...)

A grammar provides a precise way to specify a (possibly infinite) language. A grammar is composed of the following elements:

  - Symbols from the alphabet, e.g., a, b.
  - Special symbols called nonterminals. Nonterminals stand in for other symbols and/or nonterminals. Nonterminals are denoted by a word surrounded by angle brackets, e.g., <noun phrase>, <identifier>.
  - Productions, which are rules that show what other symbols and/or nonterminals a nonterminal can stand in for. A production maps a nonterminal to a sequence of nonterminals and/or symbols.

Production rules have the following syntax:

    <A> = x y        <A> can be replaced by the string x y.
    <B> = x | y      <B> can be replaced by either x or y.

Examples

e.g. The following is the grammar for a legal C++ identifier:

    <identifier> = <letter> | <identifier> <letter> | <identifier> <digit>
    <letter>     = a | b | ... | z | A | B | ... | Z | _
    <digit>      = 0 | 1 | ... | 9

Consider how we might evaluate whether lr2 is a legal identifier:

  - lr2 is legal if lr is legal and 2 is a digit (invoking the first production, alternative <identifier> <digit>)
  - lr is legal if l is legal and r is a letter
  - l is legal because it is a letter
  - All conditions are satisfied, so lr2 is legal

C++ code to recognize whether a string is a valid identifier or not:

    #include <cctype>
    #include <string>
    using namespace std;

    // <letter> in the grammar above: a-z, A-Z, or underscore.
    // (isalpha alone would reject the underscore.)
    bool isletter(char ch)
    {
        return isalpha(static_cast<unsigned char>(ch)) || ch == '_';
    }

    bool isidentifier(string str)
    {
        if (str.length() == 0)
            return false;
        if (str.length() == 1)
            return isletter(str[0]);            // True if char is a letter.
        else {
            char lastchar = str[str.length() - 1];
            str.erase(str.length() - 1, 1);     // Erase last char.
            return isidentifier(str) &&
                   (isdigit(static_cast<unsigned char>(lastchar)) || isletter(lastchar));
        }
    }

e.g. Palindromes (words which read the same backwards and forwards, e.g. radar, deed, redivider) can be described with a grammar:

    <pal> = empty string | <ch> | a <pal> a | b <pal> b | ... | Z <pal> Z
    <ch>  = a | b | ... | z | A | B | ... | Z

We can write a recursive function to determine if a string is a palindrome. The following is the pseudocode for such a function:

    // Pre: String str consists only of the letters a-z and A-Z.
    ispal( string str )
        if ( str.length == 0 OR str.length == 1 )
            return true
        if ( str's first char == str's last char )
            return ispal( str minus first and last chars )
        return false


Algebraic Expressions

Note: the notes below only consider algebraic expressions which consist of the common binary operators (+ - * /).

The following is an algebraic expression, which is valid in C++:

    x + z * (w/k + 3-4)

When evaluating an algebraic expression in a programming language, a compiler must do two things:

  - Determine if the expression is valid
  - Evaluate the expression

Achieving these objectives is easier in some notations than others. There are three common notations for algebraic expressions...

Infix Expressions

In an infix expression, the binary operator appears between the two operands. For example,

    (a + b)

Rules for precedence are required to control the order of operations. For example, in the expression

    a + b * c

the multiplication is evaluated first. In order to force the addition to happen first we must use parentheses:

    (a + b) * c

Prefix Expressions

Prefix expressions put binary operators before the two operands they apply to. Precedence and parentheses are not required, as the order of operations is given by the expression itself. For example, the infix expression

    a + b * c

is written in prefix notation as

    + a * b c

The infix expression that forces the addition to go first is

    (a + b) * c

It is written in prefix as

    * + a b c

Postfix Expressions

Postfix expressions put the operators after the operands. For example, the infix expression

    a + b * c

is written in postfix as

    a b c * +

Postfix expressions can be evaluated with a stack:

    for (each symbol in expression)
        if (symbol is an operand)
            push onto stack
        else {
            op = operation given by symbol
            operand2 = pop()
            operand1 = pop()
            result = operand1 op operand2
            push result onto stack
        }

We will now focus on prefix expressions...

The following is the grammar for prefix expressions:

    <prefix>     = <identifier> | <operator> <prefix> <prefix>
    <operator>   = + | - | * | /
    <identifier> = a | b | ... | z

Thus, the following are prefix expressions:

    a    +ab    *+ab-cd    *+ab-c+ef

The following are not prefix expressions:

    *    +a    ++bc    +aba    +ab*    +ab*cd
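
Before continuing with prefix expressions, the stack-based postfix evaluation described above can be sketched in C++. This is a minimal sketch and not part of the original notes: the name evaluatePostfix is hypothetical, operands are assumed to be single decimal digits rather than named identifiers, the input is assumed to be a valid postfix expression with no spaces, and / is integer division.

```cpp
#include <cassert>
#include <cctype>
#include <stack>
#include <string>

// Evaluates a postfix expression over single-digit operands.
// Assumption: expr is well formed (no error checking is done).
int evaluatePostfix(const std::string& expr)
{
    std::stack<int> operands;
    for (char symbol : expr) {
        if (std::isdigit(static_cast<unsigned char>(symbol))) {
            operands.push(symbol - '0');          // operand: push its value
        } else {
            // The second operand is on top of the stack, so pop it first.
            int operand2 = operands.top(); operands.pop();
            int operand1 = operands.top(); operands.pop();
            int result;
            switch (symbol) {
                case '+': result = operand1 + operand2; break;
                case '-': result = operand1 - operand2; break;
                case '*': result = operand1 * operand2; break;
                default:  result = operand1 / operand2; break;  // '/'
            }
            operands.push(result);
        }
    }
    return operands.top();                        // the final result
}
```

For example, the infix expression 8 / 2 + 1 is written in postfix as 82/1+, and evaluatePostfix("82/1+") yields 5.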

Recall the main production in our definition of a prefix expression:

    <prefix> = <identifier> | <operator> <prefix> <prefix>

The following applies this production to the evaluation of prefix expressions:

    evaluateprefix(string strexp)
        ch = strexp[0]
        Delete the first character from strexp
        if ( ch is an identifier )
            return value of the identifier
        else if ( ch is an operator named op ) {
            operand1 = evaluateprefix(strexp)
            operand2 = evaluateprefix(strexp)
            return operand1 op operand2
        }

Note: A C++ implementation of evaluateprefix would have to pass strexp by reference. Why?

The general case is the right-hand alternative of the above production, <operator> <prefix> <prefix>. The general case is recognized by the initial <operator>. The first recursive call will evaluate the first embedded <prefix>. It will also strip it away, char by char. Thus, the second recursive call will be passed a string that begins with the second embedded <prefix>.

The following is a trace of a call to evaluateprefix for the prefix expression +/821, which operates on single-digit numbers. Note that the indentation level corresponds to the depth of recursion:

    evaluateprefix(+/821)
        evaluateprefix(/821)
            evaluateprefix(821) returns 8
            evaluateprefix(21) returns 2
        returns 4
        evaluateprefix(1) returns 1
    returns 5

Consider now how to tell whether an expression is prefix. A direct translation of the production

    <prefix> = <identifier> | <operator> <prefix> <prefix>

is difficult because of the two adjacent <prefix> nonterminals. The base case is easy, but longer expressions are more challenging.
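
Before tackling recognition, the evaluateprefix pseudocode above can be sketched in C++. This is an illustrative sketch, not the notes' own implementation: the name evaluatePrefix is hypothetical, operands are assumed to be single decimal digits (as in the +/821 trace) instead of named identifiers, the input is assumed to be a valid prefix expression, and / is integer division.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Evaluates the prefix expression at the front of strexp and removes it.
// strexp is taken by reference so that the characters consumed by the
// first recursive call are already gone when the second call starts.
int evaluatePrefix(std::string& strexp)
{
    char ch = strexp[0];
    strexp.erase(0, 1);                            // delete the first character

    if (std::isdigit(static_cast<unsigned char>(ch)))
        return ch - '0';                           // base case: single operand

    // General case: <operator> <prefix> <prefix>
    int operand1 = evaluatePrefix(strexp);
    int operand2 = evaluatePrefix(strexp);
    switch (ch) {
        case '+': return operand1 + operand2;
        case '-': return operand1 - operand2;
        case '*': return operand1 * operand2;
        default:  return operand1 / operand2;      // '/' (input assumed valid)
    }
}
```

Calling evaluatePrefix on a string holding +/821 returns 5 and leaves the string empty, matching the trace above.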

Consider why the following examples are not prefix:

    *         (doesn't satisfy base case)
    +a        (missing second operand for +)
    ++bc      (missing second operand for first +)
    +aba      (valid prefix followed by an identifier)
    +ab*      (valid prefix followed by an operator)
    +ab*cd    (valid prefix followed by a valid prefix)

A valid prefix expression (general case) starts with an operator, has two embedded prefix expressions, and nothing afterwards.

The following function finds the end of the valid prefix expression that begins at index first, returning the index of its last character. If the end of a valid prefix expression cannot be found, -1 is returned.

    endpre( string strexp, int first )
        if ( first >= strexp.length )
            return -1
        ch = strexp[ first ]
        if ( ch is an identifier )
            return first
        else if ( ch is an operator )
            firstend = endpre(strexp, first + 1)
            if ( firstend != -1 )
                return endpre(strexp, firstend + 1)
            else
                return -1
        else
            return -1
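
The endpre pseudocode translates almost line for line into C++. The following is a minimal sketch under stated assumptions: the helper names isIdentifier and isOperator are hypothetical, identifiers are the single letters a to z from the grammar, and isPre corresponds to the ispre wrapper that the notes introduce next.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers matching the <identifier> and <operator> productions.
bool isIdentifier(char ch) { return ch >= 'a' && ch <= 'z'; }
bool isOperator(char ch)   { return ch == '+' || ch == '-' || ch == '*' || ch == '/'; }

// Returns the index of the last character of the prefix expression that
// starts at index first, or -1 if no valid prefix expression starts there.
int endPre(const std::string& strexp, int first)
{
    if (first < 0 || first >= static_cast<int>(strexp.length()))
        return -1;
    char ch = strexp[first];
    if (isIdentifier(ch))
        return first;                              // base case
    if (isOperator(ch)) {
        int firstEnd = endPre(strexp, first + 1);  // end of first operand
        if (firstEnd != -1)
            return endPre(strexp, firstEnd + 1);   // end of second operand
    }
    return -1;
}

// A string is a prefix expression if a valid prefix expression starts at
// index 0 and ends exactly at the last character.
bool isPre(const std::string& strexp)
{
    int last = endPre(strexp, 0);
    return last >= 0 && last == static_cast<int>(strexp.length()) - 1;
}
```

With these definitions, isPre accepts a, +ab, *+ab-cd and *+ab-c+ef, and rejects each of the six counterexamples listed above.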

We can use endpre within the function ispre to determine if a string is a prefix expression.

    ispre(string strexp)
        last = endpre(strexp, 0)
        return (last >= 0 AND last == strexp.length - 1)


Application of Recursion to Linked Lists

Recursion is very useful for processing linked structures, such as linked lists and trees. The general idea is that we perform some operation at the current node, and then use a recursive call (or calls) to apply that same operation to the remaining nodes.

For these notes we are going to use C-style linked lists. That is, we do not necessarily have an ADT or C++ class that represents the list as a whole. Instead, we may just have:

    struct Node {
        int info;
        Node *next;
    };

To represent a linked list we just need a single pointer-to-node (usually called head).

Certain operations on the list would be defined:

    void insertathead(Node *&head, int newinfo);
    int length(Node *head);
    ...

To create the list (10, 20) we would execute:

    Node *head = NULL;
    insertathead(head, 20);
    insertathead(head, 10);

Terminology: For a nonempty linked list with head pointer head,

  - We call the first node the head node.
  - We call the list that has head->next as its head the rest of the list (which could be the empty list).

Given this terminology, we can arrive at recursive definitions for some common linked list operations.

Length of a List

For a nonempty list, the length is 1 more than the length of the rest of the list. For an empty list, the length is just 0...

    int length(Node *head)
    {
        if (head == 0)
            return 0;
        else
            return 1 + length(head->next);
    }

Sum of a List

For a nonempty list, the sum is the data in the head node plus the sum of the rest of the list. For an empty list, the sum is 0.

    int sum(Node *head)
    {
        if (head == 0)
            return 0;
        else
            return head->info + sum(head->next);
    }

Printing the List

To print all the data fields of a nonempty list, first print the data of the head node, then print the rest of the list.

    void printfw(Node *head)
    {
        if (head != 0) {
            cout << head->info << endl;
            printfw(head->next);
        }
    }

What simple change to printfw() would cause the list to be printed backwards?

To print a nonempty list backwards, print the rest of the list backwards and then print the data of the head node.

Making a Copy of a List

    void copy(Node *src, Node *&targ, bool &ok);

targ is a reference to the head of the new list, which will be a copy of the list headed by src. ok reports whether every allocation succeeded.

A nonempty list can be copied by first creating a copy of the head node and then copying the rest of the list.

1. Copy the head:

    targ = new (nothrow) Node;    // nothrow (from <new>): failure yields 0, not an exception
    if (targ == 0)
        ok = false;
    else {
        targ->info = src->info;
        ...

2. Copy the rest:

    copy(src->next, targ->next, ok);

3. Base case. If the source list is empty there is nothing to do but set the target list to be empty as well:

    if (src == 0) {
        targ = 0;
        ok = true;
    }
    ...

Putting all of the above together leads to the following code:

    void copy(Node *src, Node *&targ, bool &ok)
    {
        if (src == 0) {
            targ = 0;
            ok = true;
        }
        else {
            targ = new (nothrow) Node;    // nothrow (from <new>): failure yields 0, not an exception
            if (targ == 0)
                ok = false;
            else {
                targ->info = src->info;
                copy(src->next, targ->next, ok);
            }
        }
    }


Recursion within a C++ class

Recursive functions often require different parameters than the public member functions of a C++ class. Thus, to incorporate recursion into your class you will often create recursive helper methods which do all the real work. The public methods then just call their recursive helpers:

    class SLL {
    public:
        ...
        int length();
    private:
        int reclength(Node *cur);    // Recursive helper method
        Node *head, *tail;
    };

Part of implementation file:

    int SLL::length()
    {
        return reclength(head);      // The public method must return the helper's result.
    }

    int SLL::reclength(Node *cur)
    {
        if (cur == 0)
            return 0;
        else
            return 1 + reclength(cur->next);
    }
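
As a recap, the free-function list operations from the previous section can be exercised together in one self-contained file. This is a sketch under stated assumptions: the notes only declare insertathead, so its body here is a guess at the obvious implementation, and printBackward, backwardText and destroy are hypothetical additions (printBackward writes to a stream parameter so that its output can be checked).

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

struct Node {
    int   info;
    Node* next;
};

// Assumed implementation: link a new node in front of the current head.
void insertAtHead(Node*& head, int newInfo)
{
    Node* node = new Node;
    node->info = newInfo;
    node->next = head;
    head = node;
}

// Length: 0 for the empty list, otherwise 1 + length of the rest.
int length(const Node* head)
{
    return head == nullptr ? 0 : 1 + length(head->next);
}

// Sum: 0 for the empty list, otherwise head data + sum of the rest.
int sum(const Node* head)
{
    return head == nullptr ? 0 : head->info + sum(head->next);
}

// Backwards printing: print the rest of the list first, then the head node.
void printBackward(const Node* head, std::ostream& out)
{
    if (head != nullptr) {
        printBackward(head->next, out);
        out << head->info << '\n';
    }
}

// Convenience wrapper that collects the backwards output in a string.
std::string backwardText(const Node* head)
{
    std::ostringstream out;
    printBackward(head, out);
    return out.str();
}

// Recursive cleanup: free the rest of the list, then the head node.
void destroy(Node*& head)
{
    if (head != nullptr) {
        destroy(head->next);
        delete head;
        head = nullptr;
    }
}
```

Building the list (10, 20) with insertAtHead(head, 20) followed by insertAtHead(head, 10) gives a length of 2 and a sum of 30, and printBackward emits 20 and then 10. Note that each of these functions uses one stack frame per node, so very long lists may call for an iterative version instead.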