M-way Trees and B-Trees



Similar documents
Outline BST Operations Worst case Average case Balancing AVL Red-black B-trees. Binary Search Trees. Lecturer: Georgy Gimel farb

B-Trees. Algorithms and data structures for external memory as opposed to the main memory B-Trees. B -trees

Previous Lectures. B-Trees. External storage. Two types of memory. B-trees. Main principles

From Last Time: Remove (Delete) Operation

Physical Data Organization

Symbol Tables. Introduction

CSE 326: Data Structures B-Trees and B+ Trees

Binary Search Trees CMPSC 122

Ordered Lists and Binary Trees

Big Data and Scripting. Part 4: Memory Hierarchies

Heaps & Priority Queues in the C++ STL 2-3 Trees

Binary Search Trees. Data in each node. Larger than the data in its left child Smaller than the data in its right child

Binary Search Trees. A Generic Tree. Binary Trees. Nodes in a binary search tree ( B-S-T) are of the form. P parent. Key. Satellite data L R

Binary Search Trees (BST)

Data Structures and Algorithms

Databases and Information Systems 1 Part 3: Storage Structures and Indices

Krishna Institute of Engineering & Technology, Ghaziabad Department of Computer Application MCA-213 : DATA STRUCTURES USING C

Technology Update White Paper. High Speed RAID 6. Powered by Custom ASIC Parity Chips

S. Muthusundari. Research Scholar, Dept of CSE, Sathyabama University Chennai, India Dr. R. M.

Full and Complete Binary Trees

DATABASE DESIGN - 1DL400

TREE BASIC TERMINOLOGIES

5. A full binary tree with n leaves contains [A] n nodes. [B] log n 2 nodes. [C] 2n 1 nodes. [D] n 2 nodes.

Converting a Number from Decimal to Binary

Chapter 8: Structures for Files. Truong Quynh Chi Spring- 2013

A Comparison of Dictionary Implementations

How To Create A Tree From A Tree In Runtime (For A Tree)

Why Use Binary Trees?

Binary search tree with SIMD bandwidth optimization using SSE

Fundamental Algorithms

How an electronic shutter works in a CMOS camera. First, let s review how shutters work in film cameras.

Recommended hardware system configurations for ANSYS users

Unit Storage Structures 1. Storage Structures. Unit 4.3

CS104: Data Structures and Object-Oriented Design (Fall 2013) October 24, 2013: Priority Queues Scribes: CS 104 Teaching Team

Chapter 13: Query Processing. Basic Steps in Query Processing

Name: 1. CS372H: Spring 2009 Final Exam

Fast Sequential Summation Algorithms Using Augmented Data Structures

Module 9. User Interface Design. Version 2 CSE IIT, Kharagpur

IMPLEMENTING CLASSIFICATION FOR INDIAN STOCK MARKET USING CART ALGORITHM WITH B+ TREE

1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) Total 92.

Laboratory Module 6 Red-Black Trees

International Journal of Software and Web Sciences (IJSWS)

Binary Heap Algorithms

ECE 250 Data Structures and Algorithms MIDTERM EXAMINATION /5:15-6:45 REC-200, EVI-350, RCH-106, HH-139

Structure for String Keys

Test Automation Architectures: Planning for Test Automation

Learning Outcomes. COMP202 Complexity of Algorithms. Binary Search Trees and Other Search Trees

- Easy to insert & delete in O(1) time - Don t need to estimate total memory needed. - Hard to search in less than O(n) time

root node level: internal node edge leaf node Data Structures & Algorithms McQuain

A binary search tree or BST is a binary tree that is either empty or in which the data element of each node has a key, and:

10CS35: Data Structures Using C

Module 5. Broadcast Communication Networks. Version 2 CSE IIT, Kharagpur

Lecture Notes on Binary Search Trees

Trace-Based and Sample-Based Profiling in Rational Application Developer

Memory Allocation. Static Allocation. Dynamic Allocation. Memory Management. Dynamic Allocation. Dynamic Storage Allocation

PES Institute of Technology-BSC QUESTION BANK

Parallelization: Binary Tree Traversal

Introduction Advantages and Disadvantages Algorithm TIME COMPLEXITY. Splay Tree. Cheruku Ravi Teja. November 14, 2011

Motivation Suppose we have a database of people We want to gure out who is related to whom Initially, we only have a list of people, and information a

GOD S BIG STORY Week 1: Creation God Saw That It Was Good 1. LEADER PREPARATION

Lecture 18: Applications of Dynamic Programming Steven Skiena. Department of Computer Science State University of New York Stony Brook, NY

Computer Game Programming Intelligence I: Basic Decision-Making Mechanisms

Persistent Binary Search Trees

Data Warehousing und Data Mining

ER E P M A S S I CONSTRUCTING A BINARY TREE EFFICIENTLYFROM ITS TRAVERSALS DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF TAMPERE REPORT A

B+ Tree Properties B+ Tree Searching B+ Tree Insertion B+ Tree Deletion Static Hashing Extendable Hashing Questions in pass papers

A binary search tree is a binary tree with a special property called the BST-property, which is given as follows:

Analysis of Algorithms I: Binary Search Trees

Count the Dots Binary Numbers

Algorithms and Data Structures

Data Structures. Jaehyun Park. CS 97SI Stanford University. June 29, 2015

Addressing The problem. When & Where do we encounter Data? The concept of addressing data' in computations. The implications for our machine design(s)

Dimension: Data Handling Module: Organization and Representation of data Unit: Construction and Interpretation of Simple Diagrams and Graphs

Decimal Notations for Fractions Number and Operations Fractions /4.NF

The ADT Binary Search Tree

COS 318: Operating Systems. File Layout and Directories. Topics. File System Components. Steps to Open A File

Data Structures, Practice Homework 3, with Solutions (not to be handed in)

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 13-1

Lecture Notes on Binary Search Trees

Introduction to data structures

1. The memory address of the first element of an array is called A. floor address B. foundation addressc. first address D.

Binary Heaps * * * * * * * / / \ / \ / \ / \ / \ * * * * * * * * * * * / / \ / \ / / \ / \ * * * * * * * * * *

Binary Search Trees 3/20/14

Access Tutorial 2: Tables

Algorithms and Methods for Distributed Storage Networks 7 File Systems Christian Schindelhauer

Computers. Hardware. The Central Processing Unit (CPU) CMPT 125: Lecture 1: Understanding the Computer

Data Structures. Topic #12

Chapter 13 Disk Storage, Basic File Structures, and Hashing.

Class Notes CS Creating and Using a Huffman Code. Ref: Weiss, page 433

AC : A PROCESSOR DESIGN PROJECT FOR A FIRST COURSE IN COMPUTER ORGANIZATION

Materials: Student-paper, pencil, circle manipulatives, white boards, markers Teacher- paper, pencil, circle manipulatives, worksheet

Chapter 8: Binary Trees

Operations: search;; min;; max;; predecessor;; successor. Time O(h) with h height of the tree (more on later).

Outline. Introduction Linear Search. Transpose sequential search Interpolation search Binary search Fibonacci search Other search techniques

Sorting revisited. Build the binary search tree: O(n^2) Traverse the binary tree: O(n) Total: O(n^2) + O(n) = O(n^2)

Transcription:

Carlos Moreno cmoreno @ uwaterloo.ca EIT-4103 https://ece.uwaterloo.ca/~cmoreno/ece250

Standard reminder to set phones to silent/vibrate mode, please!

Once upon a time... in a course that we all like to call ECE-250... We talked about trees (hierarchical data structures) In particular, binary search trees (non-hierarchical; just take advantage of the tree structure) Including balanced binary search trees (we talked about AVL trees)

Today, we'll discuss: M-Way trees In-order traversal of an M-way tree B-Trees Only basic concepts and rationale The details will be optional material meaning that it will be the topic for bonus marks questions on the assignments and on the final.

M-Way Trees Not to be confused with N-ary trees, which are trees where the nodes have a fixed number of children (binary trees being a particular case, with N=2) M-Way Trees are trees where the nodes store multiple values. They are search trees (just not binary) They also have multiple children (a fixed number of them, unlike with general trees)

M-Way Trees In particular, a node in an M-Way tree has: M-1 data values M children Notice: A Binary search tree is a particular case of an M-Way tree (M =?)

M-Way Trees The constraint that makes them search trees is that for each node in the tree, the values in the noide's sub-trees are related to the values in the node: If the values are {v 1, v 2,, v M-2 } and the children (sub trees) are {T1, T 2,, T M-1 }, then: Every value in the tree T k (1 < k < M-1) is between v k-1 and v k For T 1, every value in the tree is less than v 1 For T M-1, every value in the tree is greater than v M-2

M-Way Trees Example for a 4-Way tree: 7 20 45 2 5 6 9 12 17 26 31 39 46 53 71

M-Way Trees Example for a 4-Way tree: 7 20 45 2 5 6 9 12 17 26 31 39 46 53 71 For in-order traversal, we extend the idea from binary trees: first, visit child T k, then process value v k, then visit child T k+1.

M-Way Trees Example for a 4-Way tree: 7 20 45 2 5 6 9 12 17 26 31 39 46 53 71 BTW... Do you notice anything interesting about this tree? (Hint: it contains 15 values)

M-Way Trees The main point with M-Way trees is to reduce the height! Of course, in terms of Landau symbols, there's no improvement we're still Ω(log n), with Θ(log n) being reached if the tree is balanced. But we have an improvement with respect to binary trees (a reduction of the height) by a constant factor of... (you guys tell me?)

M-Way Trees The main point with M-Way trees is to reduce the height! Of course, in terms of Landau symbols, there's no improvement we're still Ω(log n), with Θ(log n) being reached if the tree is balanced. But we have an improvement with respect to binary trees (a reduction of the height) by a constant factor of... Right, lg(m):

M-Way Trees The number of nodes for an M-way tree of height h grows with M h thus, the height goes with log M n, and we have: n = M log M n log 2 n = log 2 M log M n = log 2 ( M ) log M n

M-Way Trees Advantage: More efficient (though only by a constant speedup factor). Disadvantage: More complex structure

M-Way Trees Advantage: More efficient (though only by a constant speedup factor). Disadvantage: More complex structure When is this extra complexity justified?

M-Way Trees Advantage: More efficient (though only by a constant speedup factor). Disadvantage: More complex structure When is this extra complexity justified? Generally speaking, when the extra efficiency is necessary.

M-Way Trees Advantage: More efficient (though only by a constant speedup factor). Disadvantage: More complex structure When is this extra complexity justified? Generally speaking, when the extra efficiency is necessary. One particular situation is when moving from one node to another is particularly expensive.

M-Way Trees Advantage: More efficient (though only by a constant speedup factor). Disadvantage: More complex structure When is this extra complexity justified? Generally speaking, when the extra efficiency is necessary. One particular situation is when moving from one node to another is particularly expensive. What would be an example where that would be the case?

M-Way Trees Advantage: More efficient (though only by a constant speedup factor). Disadvantage: More complex structure When is this extra complexity justified? Generally speaking, when the extra efficiency is necessary. One particular situation is when moving from one node to another is particularly expensive. What would be an example where that would be the case? Hint: What in a computer can be particularly slow?

B-Trees use this approach (among several other aspects) to store data on permanent storage (hard disk). Typically used for database systems.

B-Trees use this approach (among several other aspects) to store data on permanent storage (hard disk). Typically used for database systems.

B-Trees Two key aspects that affect the design of this data structure:

B-Trees Two key aspects that affect the design of this data structure: Hard disks are slow

B-Trees Two key aspects that affect the design of this data structure: Hard disks are slow scratch that... Hard disks are slooooooooowwww... They are mechanical beasts living in an electronic circuits world!

B-Trees Two key aspects that affect the design of this data structure: Hard disks are slow scratch that... Hard disks are slooooooooowwww... They are mechanical beasts living in an electronic circuits world! The other aspect being: access is by blocks (typical unit is 4kbytes reading 1 byte or reading 4kbytes takes essentially the same amount of time)

About hard disks speed... Two key parameters: Access time Transfer speed Transfer speed is not that bad once we start reading data, typically things move rather fast (hundreds of megabytes per second) However, access time results from the head having to move (as in, mechanical movement) to the right place to read the data

About hard disks speed... Two key parameters: Access time Transfer speed Transfer speed is not that bad once we start reading data, typically things move rather fast (hundreds of megabytes per second) However, access time results from the head having to move (as in, mechanical movement) to the right place to read the data typical figures in the order of 10 ms!! (that's an incredibly slow 100 Hz!!!!!)

B-Trees take these aspects into consideration by: Storing internal nodes as 512-Way trees (why 512? A disk block is 4k, and typical key+pointer combinations are 8 bytes per item) Nice side-effect: we load entire nodes into memory with a single disk access.

B-Trees take these aspects into consideration by: Storing internal nodes as 512-Way trees (why 512? A disk block is 4k, and typical key+pointer combinations are 8 bytes per item) Nice side-effect: we load entire nodes into memory with a single disk access. But more importantly: we dramatically reduce the amount of disk accesses (notice that descending to each child node requires a new disk access i.e., another 10 ms!!) Why do we reduce the amount of accesses?

B-Trees take these aspects into consideration by: Amount of disk accesses goes with the height of the tree (we're following the nodes, until reaching the appropriate leaf node, which is where the data is), and there are h = log M (n) of them.

B-Trees take these aspects into consideration by: Amount of disk accesses goes with the height of the tree (we're following the nodes, until reaching the appropriate leaf node, which is where the data is), and there are h = log M (n) of them. M = 512 = 2 9, meaning 9 times fewer disk accesses than with a binary search tree!

B-Trees are in a sense a hybrid structure: internal nodes are M-Way trees, and leaf nodes are just a block of records (essentially, a plain node containing an array structure)

B-Trees are in a sense a hybrid structure: internal nodes are M-Way trees, and leaf nodes are just a block of records (essentially, a plain node containing an array structure) We make leaf nodes also the size of a block (4k), to make the most out of each disk access.

Additionally, the balancing is done in a way that we ensure that all leaf nodes are at the same depth. Access time is uniform for all data in the database.

B-Trees take these aspects into consideration by: Additionally, the balancing is done in a way that we ensure that all leaf nodes are at the same depth. Access time is uniform for all data in the database. We observe that we need to do array searches and various operations in memory the point being, that time is negligible compared to disk access time; so really, the issues of asymptotic analysis when talking about data structure for disk storage become a secondary issue!

Summary During today's lesson: Looked into the notion of M-Way trees Discussed advantages, disadvantages, and when they are justified Discussed the basic notions and rationale for B Trees. Search structure for disk storage of data (e.g., database systems) Slow access time + block-based access: Minimize number of accesses by widening the tree, thus reducing the height Nodes are just wide enough that they fit within one 4k disk block