CSE 2123 Collections: Sets and Iterators (Hash functions and Trees) Jeremy Morris





Collections - Set
What is a Set?
A Set is an unordered collection of data with no duplicates.
Not like a List, where you know where in the collection you can find a data element.
More like a bag of data elements, where you just know that the element is somewhere in the bag.
Also unlike a List in that no duplicate values are allowed.
The Set collection is an implementation of the mathematical idea of a set.

Collections - Set
The Set collection is an implementation of the mathematical idea of a set:
    { 10, 5, 8, 7, 3, 9 }
Elements are not kept in any particular order.
No duplicate elements.
The same rules apply to the Set collection.

Collections - Set
Sets rely on a few standard methods:
Put an object into the Set: boolean add(E value)
Remove an object from the Set: boolean remove(Object value)
Check to see if the Set contains a value: boolean contains(Object value)

Collections - Set
Set methods (cont.):
Return an iterator over the Set: Iterator<E> iterator()
Get the number of elements in the Set: int size()
Check to see if the Set is empty: boolean isEmpty()
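
The methods listed on the last two slides can be tried out in a few lines. This is a minimal sketch; the class name SetBasicsDemo and the helper buildSet are made up for illustration and are not part of the course code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class: exercises add, remove, contains, size and isEmpty.
class SetBasicsDemo {
    // Builds a small set holding 10, 5 and 8.
    static Set<Integer> buildSet() {
        Set<Integer> numbers = new HashSet<>();
        numbers.add(10);
        numbers.add(5);
        numbers.add(8);
        return numbers;
    }

    public static void main(String[] args) {
        Set<Integer> numbers = buildSet();
        boolean addedAgain = numbers.add(10); // false: 10 is already in the set
        boolean removed = numbers.remove(5);  // true: 5 was in the set
        System.out.println(addedAgain);            // false
        System.out.println(removed);               // true
        System.out.println(numbers.contains(8));   // true
        System.out.println(numbers.size());        // 2
        System.out.println(numbers.isEmpty());     // false
    }
}
```

Note that add and remove report whether the Set actually changed, which is how a duplicate insertion shows up.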

Collections - Set
Like other collections, Java implements a number of different types of Sets.
Each is useful under different circumstances.
We will talk about HashSets and TreeSets.
HashSets use a hash function to ensure uniqueness of elements.
TreeSets use a data structure known as a binary tree to ensure uniqueness of elements.

Collections - Set (HashSets)
HashSet<String> of names:
    Marley, Bob
    Cool, Joe
    Brown, Charles
Elements are stored in no particular order.
No two names can be the same.
The hash function ensures uniqueness.

Collections - Set
Declaring a HashSet
A HashSet of Strings:
    Set<String> mySet = new HashSet<String>();
A HashSet of Integers:
    Set<Integer> mySet = new HashSet<Integer>();

Collections - Set
Typical Set usage:
Fill the Set with values.
Obtain an Iterator over the Set.
Process values from the Set until the Iterator has seen all the values.

Collections - Iterators
An iterator is an object that we can use to examine the elements of a collection.
Anything that implements the Collection interface will provide an iterator.
Use iterators instead of counters in a while loop.
while loop with a counter:
    while (counter < list.size()) {
        ...;
        counter = counter + 1;
    }
Iterator:
    while (listIter.hasNext()) {
        ...
    }

Collections - Iterators
We obtain Iterator objects by calling the appropriate method on our Collection.
Anything that implements the Collection interface must provide a method named iterator() that we can call to get an appropriate Iterator:
    List<Integer> intList = new ArrayList<Integer>();
    Iterator<Integer> intIterator = intList.iterator();
    Set<Double> dblSet = new HashSet<Double>();
    Iterator<Double> dblIterator = dblSet.iterator();

Collections - Iterators
Important methods
Check to see if there are more elements in the iterator: boolean hasNext()
Get the next object from the iterator: E next()
Remove the last object seen by the iterator from the underlying collection: void remove()
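
The three iterator methods can be combined to filter a collection while walking over it. The sketch below is only an illustration; the class IteratorRemoveDemo and the method removeOdds are hypothetical names.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical demo class: uses hasNext, next and remove together.
class IteratorRemoveDemo {
    // Removes every odd value from the set while iterating over it.
    static void removeOdds(Set<Integer> values) {
        Iterator<Integer> iter = values.iterator();
        while (iter.hasNext()) {
            int value = iter.next();
            if (value % 2 != 0) {
                iter.remove(); // removes the element most recently returned by next()
            }
        }
    }

    public static void main(String[] args) {
        Set<Integer> values = new HashSet<>();
        for (int i = 0; i < 10; i++) {
            values.add(i); // 0-9
        }
        removeOdds(values);
        System.out.println(values.size()); // 5 (only 0, 2, 4, 6, 8 remain)
    }
}
```

Calling remove() on the iterator, rather than on the Set itself, is the safe way to delete elements in the middle of an iteration.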

Collections - Iterators
Note that for a Set collection, we cannot use a typical while loop with a counter.
This works for Lists, because elements in Lists are stored sequentially, so the index has a meaning.
Elements in Sets are not stored in any particular order, so we cannot access them with an index.
The only way to see everything in a Set is to use an Iterator (a for-each loop also works, because it uses an Iterator behind the scenes).

Collections - Set Iterator Example
    // needs: import java.util.HashSet; import java.util.Iterator; import java.util.Set;
    public static void main(String[] args) {
        Set<Integer> myInts = new HashSet<Integer>();
        for (int index = 0; index < 20; index++) {
            myInts.add(index); // create a set of numbers 0-19
        }
        Iterator<Integer> myIter = myInts.iterator();
        while (myIter.hasNext()) {
            System.out.println(myIter.next());
        }
    }

Collections - Set Iterator Example 2
    // needs: import java.util.HashSet; import java.util.Iterator; import java.util.Set;
    public static void main(String[] args) {
        Set<String> mySet = new HashSet<String>();
        mySet.add("Marley, Bob");
        mySet.add("Cool, Joe");
        mySet.add("Brown, Charles");
        System.out.println(mySet); // print the whole set at once
        Iterator<String> myIter = mySet.iterator();
        while (myIter.hasNext()) {
            System.out.println(myIter.next()); // print one element per line
        }
    }

Hashing and Hash Tables
Recall: the HashSet class uses a hash function to implement a Set.
How does this work?
Every object to be stored in a HashSet must have an associated hash function.
The hash function is a function that maps each object to a numeric value.
This numeric value provides an index for a bucket where elements with that numeric value are stored.

Collections - HashSet Buckets
[Figure: a HashSet of the names "Marley, Bob", "Cool, Joe" and "Brown, Charles", each placed in a bucket; the bucket index values shown are -2094641181, -57026918 and -505324353.]
Elements are stored in buckets.
Each bucket has an associated index value.
The element's hash function tells us which bucket to use.
When we need to find an element, just apply the hash function again.
The result tells us which bucket to look in.

Collections - HashSet Buckets
[Figure: the strings "variants" and "gelato" both hash to -1249574770, so both land in the same bucket.]
What happens if two elements have the same hash function result?
This is known as a collision.
In these cases, the HashSet first checks to see if they really are equal (using the objects' equals() method).
If they are not equal, then it must maintain both values at the same index.
One method: use a LinkedList of values for the buckets.
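
The bucket idea can be sketched as a small class that keeps a LinkedList per bucket. This is a teaching sketch under stated assumptions, not how java.util.HashSet is actually written: the class ChainedBuckets, its fixed bucket count, and the use of Math.floorMod to turn a hash code into an index are all choices made here for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical teaching class: a tiny set of Strings with chained buckets.
class ChainedBuckets {
    private final List<LinkedList<String>> buckets = new ArrayList<>();

    ChainedBuckets(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Maps a hash code (which may be negative) to a valid bucket index.
    int indexFor(String value) {
        return Math.floorMod(value.hashCode(), buckets.size());
    }

    // Adds the value unless an equal value is already in its bucket.
    boolean add(String value) {
        LinkedList<String> bucket = buckets.get(indexFor(value));
        if (bucket.contains(value)) { // contains() compares with equals()
            return false;
        }
        bucket.add(value); // a collision just makes the bucket's list longer
        return true;
    }

    boolean contains(String value) {
        return buckets.get(indexFor(value)).contains(value);
    }

    public static void main(String[] args) {
        ChainedBuckets set = new ChainedBuckets(16);
        System.out.println("variants".hashCode()); // -1249574770
        System.out.println("gelato".hashCode());   // -1249574770
        System.out.println(set.add("variants"));   // true
        System.out.println(set.add("gelato"));     // true: same bucket, but not equal
        System.out.println(set.add("gelato"));     // false: already present
    }
}
```

Both strings end up in the same bucket, yet the set still holds them as two separate elements because equals() tells them apart.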

Hashing - User Defined Classes
When we create our own classes, we need to provide a hash function for our objects.
The hashCode() method is inherited from the Object class.
Basic functionality: if two objects have different memory locations, hashCode() typically returns different values.
  Two different new expressions, two different locations in memory.
If we want different behavior, we need to override both the hashCode() method and the equals() method.

Hashing - User Defined Classes
How do we override the hashCode() method?
Look at what makes our objects equal.
Come up with a hashCode() method based on that.
Student objects are equal if they have the same studentId.
So we can use the hashCode() of our id member variable as our hashCode() for the Student class:
    @Override
    public int hashCode() {
        return id.hashCode();
    }

Hashing - User Defined Classes
Sometimes we need to do more than just reuse an existing hash function.
We need to come up with our own functions.
As an example, the hash function for a String s is:
    int HASH_MULTIPLIER = 31;
    int h = 0;
    for (int i = 0; i < s.length(); i++) {
        h = HASH_MULTIPLIER * h + s.charAt(i);
    }
The hashCode value is a mathematical function of each character in the String s.
Similar operations can be written for our own classes.
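
The loop on this slide can be wrapped in a method and compared against the built-in String.hashCode(). The class name StringHashDemo is made up for this sketch.

```java
// Hypothetical demo class: the slide's multiplier-31 loop as a method.
class StringHashDemo {
    // Computes the hash of s with the loop from the slide; int overflow wraps around.
    static int hash(String s) {
        final int HASH_MULTIPLIER = 31;
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = HASH_MULTIPLIER * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hash("gelato"));         // -1249574770
        System.out.println("gelato".hashCode());    // -1249574770
        System.out.println(hash("variants"));       // -1249574770
    }
}
```

The negative result comes from integer overflow: the running value exceeds the range of int and wraps around, which is expected and harmless for a hash code.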

Hashing - User Defined Classes
equals() must also be properly implemented.
Two objects that are equal according to equals() must return the same value from hashCode().
Two objects that are not equal according to equals() can have the same result from hashCode(), but it should be avoided.
Strings, Integers, etc. all have hashCode() and equals() set up properly.
We need to ensure that our own classes have them as well.
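
Putting the last few slides together, a class that overrides both methods consistently might look like the sketch below. The slides only show the hashCode() body, so the fields (a String id and a name), the constructor, and the demo class StudentSetDemo are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class: shows that a HashSet treats equal-id students as duplicates.
class StudentSetDemo {
    public static void main(String[] args) {
        Set<Student> roster = new HashSet<>();
        System.out.println(roster.add(new Student("s1", "Ann")));      // true
        System.out.println(roster.add(new Student("s1", "Ann B.")));   // false: same id
        System.out.println(roster.add(new Student("s2", "Ann")));      // true: different id
        System.out.println(roster.size());                             // 2
    }
}

// Assumed shape of the Student class; only the id decides equality.
class Student {
    private final String id;
    private final String name;

    Student(String id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Student)) {
            return false;
        }
        return id.equals(((Student) other).id);
    }

    // Based on the same field as equals(), so equal students share a hash code.
    @Override
    public int hashCode() {
        return id.hashCode();
    }
}
```

If hashCode() were left at the inherited default, the second add would most likely succeed and the Set would hold two students with the same id.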

Collections - Set (TreeSets)
TreeSet<String> of names:
    Brown, Charles
    Cool, Joe
    Marley, Bob
Elements are stored in sorted order.
No two names can be the same.
Stored in a binary tree to ensure uniqueness.

Collections - Set (TreeSets)
Declaring a TreeSet
A TreeSet of Strings:
    Set<String> mySet = new TreeSet<String>();
A TreeSet of Integers:
    Set<Integer> mySet = new TreeSet<Integer>();

Collections - TreeSet
TreeSets use a binary tree to store elements.
A binary tree is a data structure with many applications in Computer Science.
Elements in a binary tree are stored in nodes.
Each node holds an element of data and references to two child nodes: a right child and a left child.
In a binary search tree, nodes are kept in sorted order by always putting elements less than the current node in the left child, and elements larger than the current node in the right child.

Collections - TreeSet (Binary Tree)
An example: suppose we have the elements {10, 3, 7, 19, 22, 1} to store in a binary search tree.
We add 10 first; it becomes the head of the tree.
Its children are currently empty (null references).

    10

Collections - TreeSet (Binary Tree)
{10, 3, 7, 19, 22, 1}
Next we add the 3.
It's less than 10, so it becomes the left child of the head node.

    10
   /
  3

Collections - TreeSet (Binary Tree)
{10, 3, 7, 19, 22, 1}
Next we add the 7.
It's less than 10, so we go down the left path.
It's more than 3, so it becomes the right child of 3.

    10
   /
  3
   \
    7

Collections - TreeSet (Binary Tree)
{10, 3, 7, 19, 22, 1}
19 is greater than 10.
It becomes the right child of 10.

      10
     /  \
    3    19
     \
      7

Collections - TreeSet (Binary Tree)
{10, 3, 7, 19, 22, 1}
We add 22.
22 is greater than 10, so we go down the right path.
It is greater than 19, so it becomes the right child of 19.

      10
     /  \
    3    19
     \     \
      7     22

Collections - TreeSet (Binary Tree)
{10, 3, 7, 19, 22, 1}
Finally we add 1.
1 is less than 10, so we go down the left path.
It is less than 3, so it becomes the left child of 3.

      10
     /  \
    3    19
   / \     \
  1   7     22

Collections - TreeSet (Binary Tree)
In the end, all of our elements are stored in the tree in sorted order.

      10
     /  \
    3    19
   / \     \
  1   7     22

Suppose we want to know if 6 is in our tree.
Follow the left branch to 3, then the right branch to 7, and we see that 7 has no children, so 6 cannot be in the tree.
Searching a binary tree is fast, if the tree is balanced.
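
The insertion and search steps walked through on the preceding slides can be sketched as a small class. This is a plain, non-balancing binary search tree for ints, written only to mirror the slides; the class name SimpleBst is hypothetical, and it is not how TreeSet is implemented.

```java
// Hypothetical teaching class: a plain binary search tree of ints.
class SimpleBst {
    private static final class Node {
        final int value;
        Node left;
        Node right;

        Node(int value) {
            this.value = value;
        }
    }

    private Node root;

    // Inserts value; returns false if it is already in the tree.
    boolean add(int value) {
        if (root == null) {
            root = new Node(value);
            return true;
        }
        Node current = root;
        while (true) {
            if (value == current.value) {
                return false; // duplicates are not stored
            } else if (value < current.value) {
                if (current.left == null) {
                    current.left = new Node(value);
                    return true;
                }
                current = current.left; // go down the left path
            } else {
                if (current.right == null) {
                    current.right = new Node(value);
                    return true;
                }
                current = current.right; // go down the right path
            }
        }
    }

    // Follows left/right branches until the value is found or a null child is reached.
    boolean contains(int value) {
        Node current = root;
        while (current != null) {
            if (value == current.value) {
                return true;
            }
            current = (value < current.value) ? current.left : current.right;
        }
        return false;
    }

    public static void main(String[] args) {
        SimpleBst tree = new SimpleBst();
        for (int v : new int[] {10, 3, 7, 19, 22, 1}) {
            tree.add(v);
        }
        System.out.println(tree.contains(7)); // true
        System.out.println(tree.contains(6)); // false: 7 has no children
    }
}
```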

Collections - TreeSet (Binary Tree)
What do we mean by balanced?
Roughly, this means that each side of the tree contains the same number of nodes.
Another way to think about it: keeping the height of the tree small.

      10
     /  \
    3    19
   / \     \
  1   7     22

Unbalanced trees can cause problems.

Collections - TreeSet (Binary Tree)
Unbalanced trees
Suppose we inserted elements in a different order: {1, 3, 7, 10, 19, 22}
What would our tree look like then?
Each new element is larger than everything already in the tree, so it always becomes the right child of the previous element:

  1
   \
    3
     \
      7
       \
        10
          \
           19
             \
              22

One giant chain of nodes.

Collections - TreeSet (Binary Trees)
Why is an unbalanced tree a problem?
Consider the two trees built so far: the balanced tree with 10 at the head, and the chain that starts at 1.
How many steps does it take to find the number 22 in each of them?
Which one is faster?
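
One way to answer the question is to count how many nodes a search has to visit for each insertion order. The sketch below uses a plain, non-balancing binary search tree; the class BstStepCounter and its methods of and stepsToFind are hypothetical names introduced here.

```java
// Hypothetical teaching class: counts nodes visited while searching a plain BST.
class BstStepCounter {
    private static final class Node {
        final int value;
        Node left;
        Node right;

        Node(int value) {
            this.value = value;
        }
    }

    private Node root;

    // Builds a tree by inserting the values in the given order.
    static BstStepCounter of(int... values) {
        BstStepCounter tree = new BstStepCounter();
        for (int v : values) {
            tree.root = insert(tree.root, v);
        }
        return tree;
    }

    private static Node insert(Node node, int value) {
        if (node == null) {
            return new Node(value);
        }
        if (value < node.value) {
            node.left = insert(node.left, value);
        } else if (value > node.value) {
            node.right = insert(node.right, value);
        }
        return node; // equal values are ignored
    }

    // Returns how many nodes are examined while searching for target.
    int stepsToFind(int target) {
        int steps = 0;
        Node current = root;
        while (current != null) {
            steps++;
            if (target == current.value) {
                break;
            }
            current = (target < current.value) ? current.left : current.right;
        }
        return steps;
    }

    public static void main(String[] args) {
        BstStepCounter balanced = BstStepCounter.of(10, 3, 7, 19, 22, 1);
        BstStepCounter chain = BstStepCounter.of(1, 3, 7, 10, 19, 22);
        System.out.println(balanced.stepsToFind(22)); // 3 (visits 10, 19, 22)
        System.out.println(chain.stepsToFind(22));    // 6 (visits every node)
    }
}
```

With only six elements the difference is small, but in the chain the number of steps grows with the number of elements, while in a balanced tree it grows with the height.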

Collections - TreeSet
Because balanced trees are so important for efficiency, TreeSet is actually built on a self-balancing binary search tree: a red-black binary tree structure.
Finding elements in the tree works the same as in a binary search tree.
Adding elements to the tree is slightly more complex: it must ensure that the tree remains balanced.
Worth it, because the algorithm gives us balanced trees regardless of the order we insert elements in.

Your Turn - Trees
Provide the binary search tree that would result if the following integers were added to the tree in the given order:
    [ 8, 3, 23, 15, 1, 5, 42, 64, 36 ]
Provide the binary search tree that would result if the following characters were added in the given order:
    [ r, q, z, a, c, l, x, w, b ]

Collections - TreeSet Example
    // needs: import java.util.Iterator; import java.util.Set; import java.util.TreeSet;
    public static void main(String[] args) {
        Set<Integer> myInts = new TreeSet<Integer>();
        for (int index = 0; index < 20; index++) {
            myInts.add(index); // create a set of numbers 0-19
        }
        Iterator<Integer> myIter = myInts.iterator();
        while (myIter.hasNext()) {
            System.out.println(myIter.next());
        }
    }

Collections - TreeSet Example 2
    // needs: import java.util.Iterator; import java.util.Set; import java.util.TreeSet;
    public static void main(String[] args) {
        Set<String> mySet = new TreeSet<String>();
        mySet.add("Marley, Bob");
        mySet.add("Cool, Joe");
        mySet.add("Brown, Charles");
        System.out.println(mySet); // print the whole set at once
        Iterator<String> myIter = mySet.iterator();
        while (myIter.hasNext()) {
            System.out.println(myIter.next()); // print one element per line
        }
    }

TreeSet - User Defined Classes
Binary trees depend on the ordering of the elements added to them.
So if we want to use a TreeSet, the elements we add to it must have an ordering.
How do we ensure this?
Any object we put in a TreeSet must have some way to compare it with other objects in the Set.
So for any class that we want to use in a TreeSet, we should implement the Comparable interface.
Strings, Characters, Integers, etc. all have this built in.
We need to ensure our own classes have this functionality if we need it.
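
A class that implements Comparable so it can be stored in a TreeSet might look like the sketch below. The class OrderedStudent, its fields, and the choice to order by id are assumptions made for this example; the slides do not show this code.

```java
import java.util.TreeSet;

// Hypothetical demo class: stores comparable students in a TreeSet.
class OrderedStudentDemo {
    public static void main(String[] args) {
        TreeSet<OrderedStudent> roster = new TreeSet<>();
        roster.add(new OrderedStudent("s3", "Cy"));
        roster.add(new OrderedStudent("s1", "Ann"));
        roster.add(new OrderedStudent("s2", "Bo"));
        roster.add(new OrderedStudent("s1", "Ann again")); // ignored: compares equal to s1
        System.out.println(roster.size());          // 3
        System.out.println(roster.first().getId()); // s1
    }
}

// Assumed student class, ordered by its id.
class OrderedStudent implements Comparable<OrderedStudent> {
    private final String id;
    private final String name;

    OrderedStudent(String id, String name) {
        this.id = id;
        this.name = name;
    }

    String getId() {
        return id;
    }

    // Negative if this comes first, zero if equal, positive if this comes after other.
    @Override
    public int compareTo(OrderedStudent other) {
        return id.compareTo(other.id);
    }
}
```

A TreeSet decides whether two elements are duplicates by checking whether compareTo returns zero, so the ordering should agree with what the class considers equal.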

HashSet vs. TreeSet
If we need a Set, what kind of Set should we use?
If we need the Set to iterate in sorted order, we should use a TreeSet.
Otherwise, we should use a HashSet.
HashSet allows us to add and remove elements from the Set in constant time on average: the time it takes to add and remove elements is independent of the number of elements in the Set.
TreeSet uses O(log n) time to add and remove elements from the Set. Fast, but dependent on the number of elements in our Collection.
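
The difference in iteration order can be seen by filling both kinds of Set with the same names. This is a small sketch; the class SetOrderDemo and its helper methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class: compares iteration order of HashSet and TreeSet.
class SetOrderDemo {
    // Adds the three names from the earlier slides to the given set.
    static void fill(Set<String> set) {
        set.add("Marley, Bob");
        set.add("Cool, Joe");
        set.add("Brown, Charles");
    }

    // Copies a set's elements into a list in the set's own iteration order.
    static List<String> iterationOrder(Set<String> set) {
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        Set<String> hashSet = new HashSet<>();
        Set<String> treeSet = new TreeSet<>();
        fill(hashSet);
        fill(treeSet);
        System.out.println(iterationOrder(hashSet)); // some order; not guaranteed
        System.out.println(iterationOrder(treeSet)); // [Brown, Charles, Cool, Joe, Marley, Bob]
        System.out.println(hashSet.equals(treeSet)); // true: same elements
    }
}
```

Both Sets hold exactly the same elements; only the TreeSet promises to hand them back in sorted order.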