Big Data and Scripting. Part 4: Memory Hierarchies

2, Model and Definitions
  - memory size: M machine words
  - total storage (on disk) of N elements (N is very large)
  - disk size: unlimited (for our considerations)
  - block: B machine words
  - I/O operation: reading/writing one block
  topic:
  - provide data structures using external memory
  - minimize I/O operations

3, B-Trees: basic idea
  - store elements identified by keys
  - keys are sortable (e.g. from ℕ)
  - construct tree with two types of nodes:
    leaves
      - list of elements and keys
      - keys within some interval [k_1, k_n]
    inner nodes (including root)
      - list of (sorted) keys k_1 < ... < k_n, n ≤ B
      - list of children c_0, ..., c_n
      - elements with key k_i ≤ k < k_{i+1} are in a leaf below c_i

4, B-Trees: example
  [Figure: example B-tree. Inner nodes with key lists (4,7,11), (11,21,30), (14,18,20), (24,27,30); leaves labeled with the key intervals 1,4 / 4,7 / 7,11 / 11,14 / 14,18 / 18,20 / 20,24 / 24,27 / 27,30; below them the stored content with keys between 1 and 28.]
  - use O(B) keys, addresses in inner nodes
  - use B/(size of content) in leaf nodes
  - each node fits in one block

5, B-Trees: applications
  - fast storage and retrieval of (key, value) pairs
  - keep dynamically sorted list, e.g. priority queue
  - range reporting: elements with keys in range [k_1, k_2] can be extracted as subtree
  usage in external memory scenario:
  - priority assessment: which data blocks are most likely needed in the near future
  - insert into B-Tree using priority as key
  - keep only top of the B-Tree in memory

6, B-Trees: retrieval of an element
  retrieve element with key k:

    current = root;  // root of the tree (is inner node)
    while (current is inner node) {
      choose i such that k_i ≤ k < k_{i+1}
      current = c_i;  // switch to corresponding child
    }
    return (element with key k in current);

  - choose i with binary search
  - find element in leaf also by binary search
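The retrieval loop above can be sketched in Python as follows. This is a minimal in-memory illustration, not the lecture's implementation: the class names `Leaf` and `Inner`, the function `search`, and the small example tree are assumptions made for this sketch, and a real external-memory version would read one block per visited node.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class Leaf:
    keys: list      # sorted keys stored in this leaf
    values: list    # values[i] belongs to keys[i]


@dataclass
class Inner:
    keys: list      # sorted separators k_1 < ... < k_n
    children: list  # c_0, ..., c_n (one more child than separators)


def search(root, k):
    """Return the value stored under key k, or None if k is absent."""
    current = root
    while isinstance(current, Inner):
        # Number of separators <= k is the index i with k_i <= k < k_{i+1}.
        i = bisect_right(current.keys, k)
        current = current.children[i]
    # Binary search inside the leaf as well.
    j = bisect_left(current.keys, k)
    if j < len(current.keys) and current.keys[j] == k:
        return current.values[j]
    return None


# Hypothetical example tree: separators 4 and 7 above three leaves.
root = Inner(
    keys=[4, 7],
    children=[
        Leaf([1, 2, 3], ["a", "b", "c"]),
        Leaf([4, 5, 6], ["d", "e", "f"]),
        Leaf([9, 10], ["g", "h"]),
    ],
)
```

Using `bisect_right` on the separators places a key equal to a separator into the child to its right, which matches the condition k_i ≤ k < k_{i+1}.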

7, B-Trees: logarithmic access
  - all leaves have the same distance to the root node
  - level of a node: distance to leaves
  - weight of a node: number of leaves in its sub-tree
  - balance invariant: every node has at least B/2 and at most B children
  with the invariant:
  - descending one level reduces the number of leaves below the current node by a factor of at least B/2
  - at most O(log_B N) descents to reach a leaf
  - B is constant → O(1) time in each node
  - note: larger B → smaller height of the tree
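To get a feeling for the bound, the number of descents can be computed directly from the balance invariant. The function name `max_descents` and the example sizes are assumptions for illustration. It takes the slide's invariant literally, so every node, including the root, is assumed to have at least B/2 children.

```python
def max_descents(num_leaves, B):
    """Upper bound on the number of levels above the leaves when every
    inner node has at least B // 2 children."""
    fanout = B // 2
    if fanout < 2:
        raise ValueError("B must be at least 4 for a branching tree")
    levels = 0
    reach = 1  # leaves reachable from one node after `levels` descents
    while reach < num_leaves:
        reach *= fanout
        levels += 1
    return levels


few_children = max_descents(10**9, 4)      # minimal fan-out 2
many_children = max_descents(10**9, 1024)  # minimal fan-out 512
```

With these inputs a billion leaves need 30 descents for B = 4 but only 4 for B = 1024, which is the effect the last note on the slide refers to.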

8, B-Trees: storage in external memory
  - store each node (inner nodes and leaves) in a block on disk
  - inner nodes:
    - 1/2 block size for key intervals
    - 1/2 block size for pointers to children
    - each inner node has (up to) 1/2 (block size) children
  - leaves: e.g. list of keys pointing to the position of the values in the block
  - disk usage: depends on N and the size of the values, assume k blocks for storage
  - height: O(log_B k), on level i: B^i blocks → O(k) blocks for indexing
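The O(k) bound for the index follows from a geometric series. The derivation below is a sketch under an idealizing assumption that is not stated on the slide: every inner node has exactly B children, the root is level 0, and the k data blocks form level h, so that B^h = k.

```latex
\[
  \underbrace{\sum_{i=0}^{h-1} B^{i}}_{\text{inner (index) blocks}}
  \;=\; \frac{B^{h}-1}{B-1}
  \;=\; \frac{k-1}{B-1}
  \;<\; k
  \qquad (B \ge 2),
\]
\[
  \text{so the index needs } O(k) \text{ blocks, and } h = \log_B k .
\]
```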

9, B-Trees: inserting elements
  - B-Trees administer dynamic data structures
  - consequently: data insertion and deletion
  - problem: balance invariant
  insert element:

    current = leaf for element
    insert element in current
    while (size(current) > B) {
      current = split(current);
    }

10, B-Trees: splitting a node
  - keep balance invariant for insertions by splitting large nodes
  - idea: split node, adjust addressing in parent
  split(node):

    find median m
    create two new nodes for keys ≤ m and > m
    insert new interval border in parent
    return parent (for recursion)

  - node is larger than B → split results in two nodes of size ≥ B/2
  - new interval border in parent may lead to overrun → recursion
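Insertion and splitting can be sketched together in Python. This is a minimal in-memory illustration under several assumptions that go beyond the slides: keys are distinct, only keys (no values) are stored, B = 4, nodes keep a parent pointer, and the names `Node`, `BTree`, `find_leaf` and `leaves` are hypothetical. The separator pushed into the parent is the smallest key of the right half, which keeps the condition k_i ≤ k < k_{i+1} intact.

```python
from bisect import bisect_right, insort

B = 4  # assumed maximum node size (keys per leaf, children per inner node)


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys          # leaf: stored keys; inner node: separators
        self.children = children  # None for a leaf
        self.parent = None
        for child in children or []:
            child.parent = self

    def size(self):
        return len(self.keys) if self.children is None else len(self.children)


class BTree:
    def __init__(self):
        self.root = Node([])

    def find_leaf(self, k):
        current = self.root
        while current.children is not None:
            current = current.children[bisect_right(current.keys, k)]
        return current

    def insert(self, k):
        current = self.find_leaf(k)
        insort(current.keys, k)
        while current.size() > B:
            current = self.split(current)

    def split(self, node):
        if node.children is None:
            mid = len(node.keys) // 2
            sep = node.keys[mid]  # smallest key of the right half
            left, right = Node(node.keys[:mid]), Node(node.keys[mid:])
        else:
            mid = len(node.children) // 2
            sep = node.keys[mid - 1]  # this separator moves up to the parent
            left = Node(node.keys[:mid - 1], node.children[:mid])
            right = Node(node.keys[mid:], node.children[mid:])
        parent = node.parent
        if parent is None:  # splitting the root adds a new level
            self.root = Node([sep], [left, right])
            return self.root
        i = parent.children.index(node)
        parent.children[i:i + 1] = [left, right]
        parent.keys.insert(i, sep)
        left.parent = right.parent = parent
        return parent  # the parent may now be too large: caller repeats


def leaves(node, depth=0):
    """List (depth, keys) for every leaf, from left to right."""
    if node.children is None:
        return [(depth, node.keys)]
    return [item for c in node.children for item in leaves(c, depth + 1)]


tree = BTree()
for key in [5, 1, 9, 3, 7, 2, 8, 6, 4, 10, 15, 12, 11, 14, 13]:
    tree.insert(key)
```

After these insertions all leaves sit at the same depth, because the tree only grows in height when the root itself is split.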

11, B-Trees: deleting elements
  - analogous to insertion
  - problem: nodes can shrink below size B/2
  - repair by merging into parent node
  - if overrun in parent node: repair analogous to insertion

12, Buffer Trees
  - so far: every update causes read/write operations and possible reordering of the tree
  - buffer trees avoid frequent updates by buffering operations
  - every node has a buffer of pending operations
  - when a buffer overruns, it is flushed:
    - load content into memory
    - sort and execute operations on sub-trees
    - operations on sub-trees are written to the corresponding buffers
  - new updates are placed in the root buffer
  - before balancing operations, the buffers of the involved nodes are flushed
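The buffering idea can be illustrated with a small Python sketch. It is deliberately simplified and rests on assumptions not taken from the slides: the tree shape is fixed (no splitting or merging), leaves apply operations immediately, the buffer capacity is 4, and the names `BufferedNode`, `add`, `apply` and `flush` are hypothetical.

```python
from bisect import bisect_left, bisect_right

BUFFER_CAPACITY = 4  # assumed number of pending operations a node may hold


class BufferedNode:
    def __init__(self, keys, children=None):
        self.keys = keys          # leaf: sorted stored keys; inner: separators
        self.children = children  # None for a leaf
        self.buffer = []          # pending ("insert" | "delete", key) pairs

    def add(self, op):
        """Accept an operation; inner nodes only buffer it."""
        if self.children is None:
            self.apply(op)
            return
        self.buffer.append(op)
        if len(self.buffer) > BUFFER_CAPACITY:
            self.flush()

    def apply(self, op):
        """Execute one operation on a leaf."""
        kind, key = op
        i = bisect_left(self.keys, key)
        present = i < len(self.keys) and self.keys[i] == key
        if kind == "insert" and not present:
            self.keys.insert(i, key)
        elif kind == "delete" and present:
            self.keys.pop(i)

    def flush(self):
        """Sort the pending operations and hand them to the sub-trees."""
        ops, self.buffer = self.buffer, []
        ops.sort(key=lambda op: op[1])  # stable: same-key order is preserved
        for op in ops:
            self.children[bisect_right(self.keys, op[1])].add(op)


left, right = BufferedNode([1, 5]), BufferedNode([10, 20])
root = BufferedNode([10], [left, right])
for op in [("insert", 3), ("insert", 15), ("delete", 5), ("insert", 7)]:
    root.add(op)            # four operations fit into the root buffer
untouched = (list(left.keys), list(right.keys))
root.add(("insert", 12))    # fifth operation overruns the buffer and flushes it
```

The leaves stay unchanged until the fifth operation arrives, so five updates touch each leaf once instead of once per update.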

13, Implementing external priority queues
  - priority queue:
    - insert elements with keys
    - extract element with lowest key (i.e. highest priority)
  - can be implemented with a dynamic structure that ensures the order of the elements
  - B-Trees are an example

14, Implementing external priority queues
  Implementation with Buffer Trees:
  - keep root buffer in memory
  - keep leftmost leaves in memory
  - all buffers from the root to the leftmost leaves are kept empty
  - only the top of the queue is accessed in retrieval
  - top of queue equals leftmost leaves
  - corresponding buffers empty → top of queue is sorted
  - leftmost leaves in memory → top of queue in memory
  - rest of the queue is sorted on demand