Sequences in the C++ STL




CS 311 Data Structures and Algorithms
Lecture Slides, Wednesday, November 4, 2009
Glenn G. Chappell
Department of Computer Science, University of Alaska Fairbanks
CHAPPELLG@member.ams.org
© 2005-2009 Glenn G. Chappell

Unit Overview: Handling Data & Sequences

Major Topics
- Data abstraction
- Introduction to Sequences
- Smart arrays
  - Array interface
  - Basic array implementation
  - Exception safety
  - Allocation & efficiency
  - Generic containers
- Linked Lists
  - Node-based structures
  - More on Linked Lists
- Sequences in the C++ STL
- Stacks
- Queues

4 Nov 2009 CS 311 Fall 2009

Review: Allocation & Efficiency

An operation is amortized constant time if k operations require O(k) time.
- Thus, over many consecutive operations, the operation averages constant time.
- This is not the same as constant-time average case, which averages over all possible inputs.
- The quintessential amortized-constant-time operation is insert-at-end for a well-written smart array.

Amortized constant time is not something we can easily compare with (say) logarithmic time.
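To make the amortized claim concrete, here is a minimal sketch that uses std::vector as a stand-in for the well-written smart array. The function name count_reallocations is made up for this illustration. It counts how often the capacity changes while n items are appended one at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many times a std::vector reallocates while n items are
// appended one at a time. (count_reallocations is a hypothetical helper
// written for this sketch, not part of any library.)
std::size_t count_reallocations(std::size_t n)
{
    std::vector<int> v;
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i)
    {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity)  // capacity changed: reallocate-and-copy
        {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

Because the capacity grows geometrically, the number of reallocations stays far below n, so the total copying work over n insertions is O(n).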

Review: Generic Containers [1/2]

A function that allows exceptions thrown by a client's code to propagate unchanged is said to be exception-neutral.

When exception-neutral code calls a client-provided function that may throw, it does one of two things:
- Call the function outside a try block, so that any exceptions terminate our code immediately.
- Or, call the function inside a try block, then catch all exceptions, do any necessary cleanup, and re-throw.

[Figure: client code calls our package, which calls the implementation of the template-parameter type. The template-parameter code might throw; if it does, the client code handles the exception.]

Review: Generic Containers [2/2]

We can use catch-all, clean-up, re-throw to get both exception safety and exception neutrality.

    arr = new MyType[size];
    try
    {
        std::copy(a, a+size, arr);
    }
    catch (...)
    {
        delete [] arr;
        throw;
    }

- The new expression is called outside any try block. If this fails, we exit immediately, throwing an exception.
- std::copy is called inside a try block. If this fails, we need to deallocate the array before exiting.
- The delete [] helps us meet the Basic Guarantee (also the Strong Guarantee if this function does nothing else).
- The throw makes our code exception-neutral.
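The idiom can be exercised with a small self-contained sketch. The names Flaky and copy_array are assumptions made for this illustration; Flaky is a hypothetical value type whose copy assignment throws on demand, so the clean-up path can be triggered deliberately.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical value type: copy assignment throws when `fail` is set.
struct Flaky
{
    int value = 0;
    static bool fail;

    Flaky() = default;
    Flaky(const Flaky &) = default;
    Flaky & operator=(const Flaky & other)
    {
        if (fail)
            throw std::runtime_error("copy failed");
        value = other.value;
        return *this;
    }
};
bool Flaky::fail = false;

// Returns a newly allocated copy of a[0..size-1]. If copying throws,
// the new array is deallocated and the exception is re-thrown unchanged.
template <typename T>
T * copy_array(const T * a, std::size_t size)
{
    T * arr = new T[size];  // outside any try block: failure exits immediately
    try
    {
        std::copy(a, a + size, arr);
    }
    catch (...)
    {
        delete [] arr;      // clean up, so no memory is leaked
        throw;              // re-throw: the client sees its own exception
    }
    return arr;
}
```

The caller owns the returned array and must eventually delete [] it. When the copy throws, the caller receives the original exception and nothing is leaked.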

Review: More on Linked Lists [1/8]

Our first node-based data structure is a (Singly) Linked List.
- A Linked List is composed of nodes.
- Each node has a single data item and a pointer to the next node.
- These pointers are the only way to find the next data item.

[Figure: a Linked List with a head pointer and nodes holding 3, 1, 5, 3, 5, 2; the last node holds a NULL pointer.]

Once we have found a position within a Linked List, we can insert and delete in constant time.

[Figure: the list 3, 1, 5, 4, 5, 2, shown before and after a node holding 7 is inserted.]
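A minimal sketch of constant-time insert and remove in a Singly Linked List follows. The names Node, insert_after, and remove_after are assumptions made for this illustration, and the item type is fixed to int to keep it short. Both operations act just after a given position, since a singly linked node cannot reach its predecessor.

```cpp
#include <cassert>

// A node of a Singly Linked List of ints (hypothetical, for illustration).
struct Node
{
    int data;
    Node * next;
};

// Inserts a new node holding `item` just after `pos`. Constant time.
Node * insert_after(Node * pos, int item)
{
    Node * n = new Node{item, pos->next};
    pos->next = n;
    return n;
}

// Removes the node just after `pos`, if there is one. Constant time.
void remove_after(Node * pos)
{
    Node * victim = pos->next;
    if (victim == nullptr)
        return;
    pos->next = victim->next;
    delete victim;
}
```

Neither function walks the list; each touches only a fixed number of pointers, regardless of the list's length.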

Review: More on Linked Lists [2/8]

With Linked Lists, we can do a fast splice.

[Figure: two Linked Lists, 3, 1, 5, 4, 5, 2 and 6, 2, 1, 1, 3, 3, shown before and after a splice.]

Note: If we allow for efficient splicing, then we cannot efficiently keep track of a Linked List's size.
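As one illustration of a splice, std::list provides a splice member function. The sketch below moves every node of one list onto the end of another; the wrapper name splice_onto_end is made up for this example.

```cpp
#include <cassert>
#include <list>

// Moves all nodes of `tail` to the end of `head`. No items are copied;
// only pointers inside the lists are adjusted.
void splice_onto_end(std::list<int> & head, std::list<int> & tail)
{
    head.splice(head.end(), tail);
}
```

After the call, tail is empty and head holds the items of both lists in order.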

Review: More on Linked Lists [3/8]

With Linked Lists, iterators, pointers, and references to items always stay valid and never change what they refer to, as long as the Linked List exists, unless we remove or change the item ourselves.

[Figure: an iterator into the list 3, 1, 5, 4, 5, 2 still refers to the same item after the 1 is removed and after a 7 is inserted.]

Review: More on Linked Lists [4/8]

Operation             Smart Array                         Linked List
Look-up by index      O(1)                                O(n)
Search sorted         O(log n)                            O(n)
Search unsorted       O(n)                                O(n)
Sort                  O(n log n)                          O(n log n)
Insert @ given pos    O(n)                                O(1)*
Remove @ given pos    O(n)                                O(1)*
Splice                O(n)                                O(1)
Insert @ beginning    O(n)                                O(1)
Remove @ beginning    O(n)                                O(1)
Insert @ end          O(1) or O(n)** (amortized const)    O(1) or O(n)***
Remove @ end          O(1)                                O(1) or O(n)***
Traverse              O(n)                                O(n)
Copy                  O(n)                                O(n)
Swap                  O(1)                                O(1)

*For Singly Linked Lists, we mean inserting or removing just after the given position. Doubly Linked Lists can help.
**O(n) if reallocation occurs. Otherwise, O(1). Amortized constant time. Pre-allocation can help.
***For O(1), need a pointer to the end of the list. Otherwise, O(n). This is tricky. And, for remove @ end, it is basically impossible. Doubly Linked Lists can help.

Find faster with an array. Rearrange faster with a Linked List.

Review: More on Linked Lists [5/8]

Arrays keep consecutive items in nearby memory locations.
- Many algorithms have the property that when they access a data item, the following accesses are likely to be to the same or nearby items.
  - This property of an algorithm is called locality of reference.
- Once a memory location is accessed, the CPU cache can prefetch nearby memory locations.
  - With an array, these are likely to hold nearby data items.
- Thus, because of cache prefetching, an array can have a significant speed advantage over a Linked List, when used with an algorithm that has good locality of reference.

Review: More on Linked Lists [6/8]

In a Doubly Linked List, each node has a data item & two pointers:
- A pointer to the next node.
- A pointer to the previous node.

[Figure: a Doubly Linked List holding 3, 1, 5, 4, 5, with an end-of-list pointer.]

Doubly Linked Lists often have an end-of-list pointer.
- This can be efficiently maintained, resulting in constant-time insert and remove at the end.

Review: More on Linked Lists [7/8]

With Doubly Linked Lists, we can get rid of most of our asterisks.

Operation             Smart Array                        Doubly Linked List
Look-up by index      O(1)                               O(n)
Search sorted         O(log n)                           O(n)
Search unsorted       O(n)                               O(n)
Sort                  O(n log n)                         O(n log n)
Insert @ given pos    O(n)                               O(1)
Remove @ given pos    O(n)                               O(1)
Splice                O(n)                               O(1)
Insert @ beginning    O(n)                               O(1)
Remove @ beginning    O(n)                               O(1)
Insert @ end          O(1) or O(n)* (amortized const)    O(1)
Remove @ end          O(1)                               O(1)
Traverse              O(n)                               O(n)
Copy                  O(n)                               O(n)
Swap                  O(1)                               O(1)

*O(n) if reallocation occurs. Otherwise, O(1). Amortized constant time. Pre-allocation can help.

Find faster with an array. Rearrange faster with a Linked List.

Review: More on Linked Lists [8/8]

Two approaches to implementing a Linked List:
- A Linked List package to be used by others.
- An internal-use Linked List: part of some other package, and not exposed to client code.

How would these be different? In particular, what classes might we define in each case?
- In the first case, many classes: container, node, iterator.
- In the second case, a node class, but possibly nothing else.

Generic Sequence Types: Introduction

The C++ STL has four generic Sequence container types.
- Class template std::vector.
  - A smart array.
  - Much like the Assignment 5 package, but with more member functions.
- Class template std::basic_string.
  - Much like std::vector, but aimed at character string operations.
  - Mostly we use std::string, which is really std::basic_string<char>.
  - Also std::wstring, which is really std::basic_string<wchar_t>.
- Class template std::list.
  - A Doubly Linked List.
  - Note: The Standard does not specify implementation. It specifies the semantics and order of operations. These were written with a Doubly Linked List in mind, and a D.L.L. is the usual implementation.
- Class template std::deque.
  - Deque stands for Double-Ended QUEue. Say "deck".
  - Like std::vector, but a bit slower. Allows fast insert/remove at both beginning and end.

Generic Sequence Types: std::deque [1/4]

We are familiar with smart arrays and Linked Lists. How is std::deque implemented? There are two big ideas behind it.

First Idea
- A vector uses an array in which data are stored at the beginning.
  - This gives linear-time insert/remove at beginning, constant-time remove at end, and, if we do it right, amortized-constant-time insert at end.
- What if we store data in the middle? When we reallocate-and-copy, we move our data to the middle of the new array.
  - Result: Amortized-constant-time insert, and constant-time remove, at both ends.

[Figures: an array holding items 0 through 5 at its beginning; an array holding items 0 through 5 in its middle.]

Generic Sequence Types: std::deque [2/4]

Second Idea
- Doing reallocate-and-copy for a vector requires calling either the copy constructor or copy assignment for every data item.
- For large, complex data items, this can be time-consuming.
- Instead, let our array be an array of pointers to arrays, so that reallocate-and-copy only needs to move the pointers.
- This still lets us keep most of the locality-of-reference advantages of an array, when the data items are small.

[Figure: an array of pointers, each pointing to an array of data items; together the data arrays hold items 0 through 8.]

Generic Sequence Types: std::deque [3/4]

An implementation of std::deque typically uses both of these ideas.
- It probably uses an array of pointers to arrays.
  - This might go deeper (array of pointers to arrays of pointers to arrays).
- The arrays may not be filled all the way to the beginning or the end.
- Reallocate-and-copy moves the data to the middle of the new array of pointers.

[Figure: a base object refers to an array of pointers, of which the middle is used; the pointers refer to arrays of data items holding items 0 through 9.]

Thus, std::deque is an array-ish container, optimized for:
- Insert/remove at either end.
- Possibly large, difficult-to-copy data items.

The cost is complexity, and a slower [but still constant-time] look-up by index.
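To see why look-up by index can stay constant-time under such a layout, consider the following sketch. It assumes fixed-size data arrays ("blocks") and a known number of unused slots before the first item; real implementations differ in their details, and the names BlockPosition and locate are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Where an item lives: which data array, and which slot within it.
struct BlockPosition
{
    std::size_t block;
    std::size_t slot;
};

// Maps a logical index to a block and slot, assuming every data array
// holds block_size items and first_offset slots are unused before item 0.
// This is an assumed layout, not the layout of any particular library.
BlockPosition locate(std::size_t first_offset,
                     std::size_t block_size,
                     std::size_t index)
{
    std::size_t absolute = first_offset + index;
    return BlockPosition{absolute / block_size, absolute % block_size};
}
```

The look-up costs one addition, one division, one remainder, and then two pointer dereferences. That is more work than indexing a plain array, but it does not depend on the number of items.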

Generic Sequence Types: std::deque [4/4]

Essentially, std::deque is an array.
- Iterators are random-access.
- But it has some complexity to it, so it is a slow-ish array.

Like vector, deque tends to keep consecutive items in nearby memory locations.
- So it tends to avoid cache misses, when used with algorithms having good locality of reference.

However:
- Insertions at the beginning do not require items to be moved up.
  - We speed up insert-at-beginning by allocating extra space before existing data.
- Reallocate-and-copy leaves the data items alone.
  - We also speed up insertion by trading value-type operations for pointer operations.
  - Pointer operations can be much faster than value-type operations.
  - A std::deque can do reallocate-and-copy using a raw memory copy, with no value-type copy ctor calls.

The Bottom Line
- A std::deque is generally a good choice when you need fast insert/remove at both ends of a Sequence.
  - Especially if you also want fast-ish look-up.
- Some people also recommend std::deque whenever you will be doing a lot of resizing, but do not need fast insert/remove in the middle.
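A short usage sketch shows the operations std::deque is good at: insertion at both ends, plus look-up by index. The function name build_both_ends is made up for this example.

```cpp
#include <cassert>
#include <deque>

// Builds the Sequence -n, ..., -2, -1, 1, 2, ..., n by growing a deque
// at both ends.
std::deque<int> build_both_ends(int n)
{
    std::deque<int> d;
    for (int i = 1; i <= n; ++i)
    {
        d.push_back(i);    // insert at end
        d.push_front(-i);  // insert at beginning
    }
    return d;
}
```

With a std::vector, each insertion at the beginning would shift every existing item; with a std::deque it does not.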

Generic Sequence Types: Efficiency [1/2]

We measure efficiency by counting steps. How do we count steps for a generic container type?
- We count both built-in operations and value-type operations.
- However, we typically expect that the most time-consuming operations are those on the value type.

The C++ Standard, on the other hand, counts only value-type operations.
- For example, constant time in the Standard means that at most a constant number of value-type operations are performed.

Generic Sequence Types: Efficiency [2/2]

Operation             vector, basic_string           deque                          list
Look-up by index      Constant                       Constant                       Linear
Search sorted         Logarithmic                    Logarithmic                    Linear
Insert @ given pos    Linear                         Linear                         Constant
Remove @ given pos    Linear                         Linear                         Constant
Insert @ beginning    Linear                         Linear/Amortized Constant*     Constant
Remove @ beginning    Linear                         Constant                       Constant
Insert @ end          Linear/Amortized Constant**    Linear/Amortized Constant*     Constant
Remove @ end          Constant                       Constant                       Constant

*Only a constant number of value-type operations are required. The C++ Standard counts only value-type operations. Thus, it says that insert at the beginning or end of a std::deque is constant time.
**Constant time if sufficient memory has already been allocated.

All have linear-time traverse, copy, and search-unsorted, constant-time swap, and O(n log n) sort.
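The difference between containers can be observed by counting value-type operations directly. The sketch below uses a hypothetical value type, Counted, that increments a counter on every copy or move. The helper ops_for_insert_at_beginning is also made up for this illustration; it reports how many such operations a single insert-at-beginning costs in a container that already holds n items.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <list>
#include <vector>

// Hypothetical value type that counts its own copy and move operations.
struct Counted
{
    static long ops;
    int value;

    Counted(int v = 0) : value(v) {}
    Counted(const Counted & o) : value(o.value) { ++ops; }
    Counted(Counted && o) noexcept : value(o.value) { ++ops; }
    Counted & operator=(const Counted & o) { value = o.value; ++ops; return *this; }
    Counted & operator=(Counted && o) noexcept { value = o.value; ++ops; return *this; }
};
long Counted::ops = 0;

// Returns the number of value-type copy/move operations performed by one
// insert at the beginning of a container already holding n items.
template <typename Container>
long ops_for_insert_at_beginning(std::size_t n)
{
    Container c(n);
    Counted item(42);
    Counted::ops = 0;            // count only the insertion itself
    c.insert(c.begin(), item);
    return Counted::ops;
}
```

For std::vector the count grows with n, since every existing item must be moved. For std::deque and std::list it stays a small constant, which matches the way the Standard counts.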

Generic Sequence Types: Common Features

All STL Sequence containers have:
- iterator, const_iterator
  - Iterator types. The latter acts like a pointer-to-const.
  - vector, basic_string, deque have random-access iterators.
  - list has bidirectional iterators.
- iterator begin(), iterator end()
- iterator insert(iterator, item)
  - Insert before. Returns position of new item.
- iterator erase(iterator)
  - Remove this item. Returns position of following item.
- push_back(item)
  - Insert at the end.
- clear()
  - Remove all items.
- resize(newsize)
  - Change the size of the container.
  - Not the same as vector::reserve, which sets capacity.

In Addition
- vector, deque, list have:
  - pop_back(): Remove at the end.
  - reference front(), reference back(): Return reference to first, last item.
- deque, list have:
  - push_front(item), pop_front(): Insert & remove at the beginning.
- vector, basic_string, deque have:
  - reference operator[](index): Look-up by index.
- vector has:
  - reserve(newcapacity): Sets capacity to at least the given value.
- And there are other members.
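Because these members are shared, one function template can work with any of the Sequence containers. The sketch below relies on erase returning the position of the following item; the function name erase_all is made up for this example.

```cpp
#include <cassert>
#include <deque>
#include <list>
#include <vector>

// Removes every item equal to `target` from an STL Sequence container.
// `target` is taken by value so it cannot refer to an item being erased.
template <typename Seq>
void erase_all(Seq & s, typename Seq::value_type target)
{
    for (typename Seq::iterator it = s.begin(); it != s.end(); )
    {
        if (*it == target)
            it = s.erase(it);  // continue from the following item
        else
            ++it;
    }
}
```

The same source works for vector, deque, and list, although the cost differs: each erase is constant time for a list but linear time for a vector.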

Iterator Validity: The Idea

One of the trickier parts of using container types is making sure you do not use an iterator that has become invalid.
- Generally, valid iterators are those that can be dereferenced.
- We also call things like container.end() valid. These are past-the-end iterators.

Consider the smart-array class from Assignment 5. When is one of its iterators invalidated?
- When reallocate-and-copy occurs.
- When the container is destroyed.
- When the container is resized so that the iterator is more than one past the end.

Now consider a (reasonable) Linked-List class with iterators. When are such iterators invalidated?
- Only when the item referenced is erased. This includes container destruction.

Iterator Validity: Rules

We see that different container types have different iterator-validity rules.
- When using a container, it is important to know the associated rules.

A related topic is reference validity.
- Items in a container can be referred to via iterators, but also via pointers and references.
- Reference-validity rules indicate when pointers and references remain usable.
- Often these are the same as the iterator-validity rules, but not always.

Iterator Validity: std::vector

For std::vector
- Reallocate-and-copy invalidates all iterators and references.
- When there is no reallocation, the Standard says that insertion and erasure invalidate all iterators and references except those before the insertion/erasure.
  - Apparently, the Standard counts an iterator as invalidated if the item it points to changes.

A vector can be forced to pre-allocate memory using std::vector::reserve.
- The amount of pre-allocated memory is the vector's capacity.
- We have noted that pre-allocation makes insert-at-end a constant-time operation.
- Now we have another reason to do pre-allocation: preserving iterator and reference validity.
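The effect of pre-allocation on validity can be checked with a small sketch. The function name first_item_stays_put is made up for this example. It reserves enough capacity up front, takes a pointer to the first item, appends more items, and checks that the first item has not moved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if a pointer to the first item is still good after `extra`
// further items are appended to a vector with sufficient reserved capacity.
bool first_item_stays_put(std::size_t extra)
{
    std::vector<int> v;
    v.reserve(1 + extra);        // pre-allocate: no reallocation below
    v.push_back(10);
    const int * before = &v[0];  // pointer to the first item
    for (std::size_t i = 0; i < extra; ++i)
        v.push_back(static_cast<int>(i));
    return before == &v[0] && *before == 10;
}
```

Without the reserve call, some push_back would probably trigger reallocate-and-copy, and using the old pointer afterwards would be an error.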

Iterator Validity: std::deque

For std::deque
- Insertion in the middle invalidates all iterators and references.
- Insertion at either end invalidates all iterators, but no references.
  - Why?
- Erasure in the middle invalidates all iterators and references.
- Erasure at either end invalidates only iterators and references to items erased.

So deques have some validity advantages over vectors.
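The rule for insertion at either end can be observed with a short sketch; the function name reference_survives_end_insertions is made up for this example. One plausible explanation, given the array-of-pointers layout described earlier, is that such an insertion may reallocate the array of pointers, which iterators depend on, while the data items themselves stay where they are.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Returns true if a pointer to an item is still good after `count`
// insertions at each end of the deque.
bool reference_survives_end_insertions(int count)
{
    std::deque<int> d;
    d.push_back(42);
    const int * address = &d.front();  // pointer to the item holding 42
    for (int i = 0; i < count; ++i)
    {
        d.push_front(i);
        d.push_back(i);
    }
    // After `count` push_front calls, the 42 sits at index `count`.
    return &d[static_cast<std::size_t>(count)] == address && *address == 42;
}
```

An iterator obtained before the loop must not be used afterwards, even though the pointer remains good.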

Iterator Validity: std::list

For std::list
- An iterator or reference always remains valid until the item it points to goes away.
  - When the item is erased.
  - When the list is destroyed.

In some situations, these validity rules can be a big advantage of std::list.
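A small sketch of the std::list rule: an iterator keeps referring to its item while other items are erased and inserted around it. The function name iterator_survives_other_changes is made up for this example.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Returns true if an iterator into a std::list still refers to the same
// item after other items have been erased and inserted.
bool iterator_survives_other_changes()
{
    std::list<int> data = {3, 1, 5, 4, 5, 2};
    std::list<int>::iterator it = std::next(data.begin(), 2);  // the first 5

    data.erase(std::next(data.begin()));  // remove the 1
    data.push_front(9);
    data.push_back(7);
    data.insert(it, 8);                   // insert just before our item

    // The list is now 9, 3, 8, 5, 4, 5, 2, 7, and `it` still refers to the 5.
    return *it == 5 && *std::next(it) == 4 && *std::prev(it) == 8;
}
```

Doing the same with a std::vector iterator would not be safe, since the erase and the insertions would invalidate it.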

Iterator Validity: Example

    // v is a variable of type vector<int>
    // Insert a 1 before each 2 in vector v:
    for (vector<int>::iterator iter = v.begin(); iter != v.end(); ++iter)
    {
        if (*iter == 2)
            v.insert(iter, 1);
    }

What is wrong with the above code?
- The insert call invalidates iterator iter.
- Even if iter stays valid, after an insertion, it points to the 1 inserted. After being incremented, it points to the 2 again. Infinite loop.

How can we fix it?
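One possible fix, offered as a sketch rather than as the intended answer, is to use the iterator that insert returns and then step past the 2 that was just handled. The function name insert_one_before_each_two is made up for this example.

```cpp
#include <cassert>
#include <vector>

// Inserts a 1 before each 2 in v.
void insert_one_before_each_two(std::vector<int> & v)
{
    for (std::vector<int>::iterator iter = v.begin(); iter != v.end(); ++iter)
    {
        if (*iter == 2)
        {
            iter = v.insert(iter, 1);  // valid iterator to the new 1
            ++iter;                    // back on the 2; the loop's ++ moves past it
        }
    }
}
```

This addresses both problems: the old iterator is never used after the insertion, and the loop does not revisit the same 2. Each insertion is still linear time, so for a large vector, building a new vector in one pass may be preferable.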

Unit Overview: What is Next

This completes our discussion of Sequences in full generality.

Next, we look at two restricted versions of Sequences, that is, ADTs that are much like Sequence, but with fewer operations:
- Stack. Covered in Chapter 6 in the text.
- Queue. Covered in Chapter 7 in the text.

For each of these, we look at:
- What it is.
- Implementation.
- Availability in the C++ STL.
- Applications.