R-trees. R-Trees: A Dynamic Index Structure For Spatial Searching. R-Tree. Invariants
- Brendan O’Connor’
- 10 years ago
R-Trees: A Dynamic Index Structure For Spatial Searching
A. Guttman

R-trees
- Generalization of B+-trees to higher dimensions
- Disk-based index structure
- Occupancy guarantee
- Multiple search paths
- Insertions and splits can be complex
- Accommodates non-point data easily

R-Tree
- Balanced (similar to a B+-tree)
- An index rectangle I is an n-dimensional rectangle of the form (I_0, I_1, ..., I_{n-1}), where each I_i is a closed interval [a, b]
- Leaf node entry: (I, tuple_id)
- Non-leaf node entry: (I, child_ptr)
- M is the maximum number of entries per node.
- m <= M/2 is a parameter specifying the minimum number of entries per node.

Invariants
1. Every leaf (non-leaf) node has between m and M records (children), unless it is the root.
2. The root has at least two children unless it is a leaf.
3. Every index entry is the smallest rectangle that contains its children (MBR = Minimum Bounding Rectangle).
4. All leaves appear at the same level.
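
The entry layout and the four invariants above can be checked mechanically. The following is a minimal sketch, not Guttman's code: the names `Node`, `mbr`, and `check_invariants` are hypothetical, and a rectangle is assumed to be a tuple of (low, high) intervals, one per dimension.

```python
from dataclasses import dataclass, field


def mbr(rects):
    """Smallest rectangle that contains every rectangle in rects."""
    return tuple(
        (min(r[d][0] for r in rects), max(r[d][1] for r in rects))
        for d in range(len(rects[0]))
    )


@dataclass
class Node:
    is_leaf: bool
    # Leaf entries are (rect, tuple_id); non-leaf entries are (rect, child Node).
    entries: list = field(default_factory=list)


def check_invariants(node, m, M, is_root=True):
    """Assert invariants 1-4 and return the height of the subtree."""
    n = len(node.entries)
    assert n <= M
    if is_root:
        assert node.is_leaf or n >= 2      # invariant 2
    else:
        assert n >= m                      # invariant 1
    if node.is_leaf:
        return 1
    heights = set()
    for rect, child in node.entries:
        # invariant 3: the stored rectangle is exactly the child's MBR
        assert rect == mbr([r for r, _ in child.entries])
        heights.add(check_invariants(child, m, M, is_root=False))
    assert len(heights) == 1               # invariant 4
    return 1 + heights.pop()
```

A checker like this is mostly useful in tests: run it after every insert or delete to catch a stale bounding rectangle or an under-full node early.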
Example
[Figure: example R-tree]

Searching
Given a search rectangle S:
1. Start at the root and locate all child nodes whose rectangles intersect S (via linear search).
2. Search the subtrees of those child nodes.
3. When you reach the leaves, return the entries whose rectangles intersect S.
- Searches may require inspecting several paths.
- Worst-case running time is poor: because bounding rectangles may overlap, a query can end up visiting a large part of the tree.

[Figure: Searching for R]
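
The three search steps above translate into a short recursive function. This is a sketch under assumptions: `Node`, `intersects`, and `search` are illustrative names, rectangles are tuples of closed (low, high) intervals, and touching boundaries count as intersecting.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    is_leaf: bool
    # Leaf entries are (rect, tuple_id); non-leaf entries are (rect, child Node).
    entries: list = field(default_factory=list)


def intersects(a, b):
    """True if the rectangles overlap in every dimension (closed intervals)."""
    return all(a_lo <= b_hi and b_lo <= a_hi
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))


def search(node, s):
    """Return the tuple_ids of all leaf entries whose rectangle intersects s."""
    if node.is_leaf:
        return [tid for rect, tid in node.entries if intersects(rect, s)]
    hits = []
    for rect, child in node.entries:
        # Several children may intersect s, so several paths may be followed.
        if intersects(rect, s):
            hits.extend(search(child, s))
    return hits
```

Note that the loop does not stop at the first matching child; that is the "multiple search paths" property that distinguishes R-tree search from B+-tree search.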
Insertion
Insertion is done at the leaves. Where should a new entry with rectangle R go?
1. Start at the root.
2. Go down the tree, at each level choosing the child whose rectangle needs the least enlargement to include R. In case of a tie, choose the child with the smallest area.
3. If there is room in the chosen leaf node, insert the entry. Otherwise split the node (see Splitting Nodes).
4. Adjust the tree. If the root was split into nodes N1 and N2, create a new root with N1 and N2 as children.

[Figure: insertion example with index rectangles I1, I2, I3 and data rectangles R1, R2, R3, R8, R9, R10, R11]
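
Step 2 of the insertion procedure (least enlargement, ties broken by smallest area) is the part that most benefits from a concrete example. A minimal sketch, assuming rectangles are tuples of (low, high) intervals; `area`, `cover`, and `choose_child` are hypothetical helper names.

```python
def area(r):
    """Product of the side lengths of rectangle r."""
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result


def cover(a, b):
    """Smallest rectangle containing both a and b."""
    return tuple((min(a_lo, b_lo), max(a_hi, b_hi))
                 for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))


def choose_child(child_rects, new_rect):
    """Index of the child needing the least enlargement to include new_rect.

    Ties are resolved in favour of the child with the smallest area.
    """
    def cost(i):
        r = child_rects[i]
        enlargement = area(cover(r, new_rect)) - area(r)
        return (enlargement, area(r))
    return min(range(len(child_rects)), key=cost)
```

Calling `choose_child` once per level, from the root down to a leaf, yields the leaf in which the new entry is placed.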
[Figure: insertion example, continued]

Splitting Nodes
Problem: Divide M+1 entries between two nodes so that it is unlikely that both nodes are needlessly examined during a search.
Solution: Minimize the probability of accessing the nodes during a query.
- Exhaustive algorithm
- Quadratic algorithm
- Linear-time algorithm

Exhaustive Search
- Minimize the sum of the probabilities that a query accesses the two pages. This requires knowledge of the query range or of the number of nearest neighbors.
- Try all possible combinations.
- Optimal results.
- Bad running time: the number of candidate splits grows exponentially with M.
Quadratic Algorithm
1. Find the pair of entries E1 and E2 that maximizes area(J) - area(E1) - area(E2), where J is the rectangle covering E1 and E2. That is, pick the pair with the least affinity: the pair that would waste the most space if placed together.
2. Put E1 in one group and E2 in the other.
3. If one group has M-m+1 entries, put the remaining entries into the other group and stop. If all entries have been distributed, stop.
4. For each remaining entry E, calculate d1 and d2, where di is the area increase of the covering rectangle of group i when E is added.
5. Find the entry E with maximum |d1 - d2| and add it to the group whose area will increase the least. Repeat from step 3.

Time complexity
- The algorithm is quadratic in M.
- Linear in the number of dimensions.

[Figure: worked example of seed selection; most labels were lost in extraction (R6 and R7 survive). Caption fragment: "Minimum occupancy guarantee may force ... and ... to be assigned to ..."]
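
The five steps of the quadratic algorithm can be sketched as follows. This is an illustration rather than a reference implementation: the function and helper names are hypothetical, rectangles are tuples of (low, high) intervals, and the rule for breaking ties in step 5 (smaller group area, then fewer entries) is an assumption borrowed from common practice, since the slide does not state one.

```python
from itertools import combinations


def area(r):
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result


def cover(a, b):
    """Smallest rectangle containing both a and b."""
    return tuple((min(a_lo, b_lo), max(a_hi, b_hi))
                 for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))


def quadratic_split(rects, m):
    """Split M+1 rectangles into two index groups of at least m entries each."""
    n = len(rects)  # n = M + 1, so n - m = M - m + 1

    # Step 1: the seed pair is the one that wastes the most area together.
    def waste(pair):
        i, j = pair
        return area(cover(rects[i], rects[j])) - area(rects[i]) - area(rects[j])

    s1, s2 = max(combinations(range(n), 2), key=waste)
    groups = [[s1], [s2]]                    # step 2
    boxes = [rects[s1], rects[s2]]
    rest = [i for i in range(n) if i not in (s1, s2)]

    def growth(i, g):
        return area(cover(boxes[g], rects[i])) - area(boxes[g])

    while rest:
        # Step 3: a group with M-m+1 entries forces the rest into the other.
        full = next((g for g in (0, 1) if len(groups[g]) == n - m), None)
        if full is not None:
            groups[1 - full].extend(rest)
            break
        # Steps 4-5: pick the entry with the strongest preference.
        pick = max(rest, key=lambda i: abs(growth(i, 0) - growth(i, 1)))
        # Assumed tie-break: smaller area, then fewer entries.
        g = min((0, 1), key=lambda k: (growth(pick, k), area(boxes[k]), len(groups[k])))
        groups[g].append(pick)
        boxes[g] = cover(boxes[g], rects[pick])
        rest.remove(pick)
    return groups[0], groups[1]
```

The seed search examines every pair of entries, which is where the quadratic cost in M comes from; each area computation is linear in the number of dimensions.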
Linear Algorithm
- For each dimension, choose the pair of entries with the largest separation (the entry with the highest low value and the entry with the lowest high value).
- Normalize the separation by dividing by the width of the entire set along that dimension.
- Choose the two entries (and dimension) with the largest normalized separation as the initial seeds.
- Randomly, but evenly, divide the rest of the entries between the two groups.
- The algorithm is linear in M (the node capacity); it makes almost no attempt at optimality.

Deletion
1. Find the entry to delete and remove it from the appropriate leaf L.
2. Set N = L and Q = {} (Q holds the entries to be reinserted).
3. If N is the root, go to step 5. Otherwise, let P be N's parent and E_N the entry in P that points to N.
   1. If N has fewer than m entries, delete E_N from P and add the contents of N to Q.
   2. If N has at least m entries, set the rectangle of E_N to tightly enclose N.
4. Set N = P and repeat from step 3.
5. Reinsert leaf entries from Q. Reinsert non-leaf entries from Q higher up so that all leaves are at the same level.
6. If the root has 1 child, make the child the new root.

Space requirements
- (2kd + p) bytes per index entry for d dimensions, with k bytes per coordinate value (a low and a high value per dimension) and p bytes for the pointer.
- The same holds for data entries with spatial extent.
- (kd + p) bytes per data entry for point data.
- Trees tend to be very wide and shallow.

Performance Tests
- Data: CENTRAL circuit cell (1057 rectangles).
- Insertion test: performance on the last 10% of inserts.
- Search test: randomly generated rectangles that each retrieve about 5% of the data.
- Deletion test: delete every 10th entry.
- Page size varies from 128 bytes to 2K.
- M varies from 6 to
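
The seed selection of the linear algorithm described above is compact enough to sketch directly. The name `linear_pick_seeds` is hypothetical, and rectangles are assumed to be tuples of (low, high) intervals. One detail is an assumption of this sketch: if the same entry has both the highest low value and the lowest high value, the second seed is taken from the remaining entries so that two distinct seeds are always returned.

```python
def linear_pick_seeds(rects):
    """Return indices of two seed entries using normalized separation."""
    n = len(rects)
    best = None  # (normalized separation, seed_a, seed_b)
    for d in range(len(rects[0])):
        # Entry with the highest low side along dimension d.
        hi_low = max(range(n), key=lambda i: rects[i][d][0])
        # Entry with the lowest high side, excluding hi_low (assumption).
        lo_high = min((i for i in range(n) if i != hi_low),
                      key=lambda i: rects[i][d][1])
        width = max(r[d][1] for r in rects) - min(r[d][0] for r in rects)
        separation = rects[hi_low][d][0] - rects[lo_high][d][1]
        normalized = separation / width if width > 0 else 0.0
        if best is None or normalized > best[0]:
            best = (normalized, lo_high, hi_low)
    return best[1], best[2]
```

Each dimension needs only a single pass over the entries, so the cost is linear in M, in contrast to the pairwise seed search of the quadratic algorithm.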
Insertion performance
- With linear-time splitting, inserts spend very little time doing splits.
- Cost grows with page size, as expected.
- Increasing m reduces insertion cost because the minimum occupancy requirement gets used earlier in the insertion algorithm.

Deletion performance
- Deletion cost is affected by m. For large m:
  - More nodes become under-full (occupancy < m).
  - More reinserts take place.
  - More splits are possible.
- Running time is pretty bad for m = M/

Search performance
- Search is relatively insensitive to the splitting algorithm.
- Less I/O with larger pages.
- More CPU cost with larger pages.
- Smaller values of m reduce the average number of entries per node, so less time is spent searching within a node.

Space Efficiency
- A stricter node-fill criterion leads to a smaller index.
Conclusions
- The linear-time splitting algorithm is almost as good as the others.
- A low node-fill requirement reduces space utilization, but is not significantly worse than stricter node-fill requirements.
- The R-tree can be added to relational databases. Took more than 10 years!

The R*-tree: An Efficient and Robust Access Method for Points and Rectangles
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger

R*-tree
- An optimization of the R-tree.
- Minimize area, overlap, and margin (the sum of the sides of a rectangle).

Insertion
- At levels above leaf-1: as before.
- At level leaf-1: choose the subtree with minimum overlap, where overlap(e, node) = sum of area(e ∩ entry) over all other entries in node.
- Only marginally better than the R-tree.

Split strategy
- M = max capacity, m = min capacity (example: M = 7, m = 3).
- For each dimension, sort the M+1 entries by their lower value (use the upper value to break ties).
- Consider the groups containing the first m-1+k entries and the remaining M+2-m-k entries, with k in [1, M-2m+2].
- Evaluate the area-value, margin-value, and overlap-value for each split point.
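
The overlap measure used when choosing a subtree at level leaf-1 can be written out as a short function. A minimal sketch, assuming rectangles are tuples of (low, high) intervals; `intersection_area` and `overlap` are illustrative names, and the entry itself is excluded from the sum.

```python
def intersection_area(a, b):
    """Area of the common region of rectangles a and b (0.0 if disjoint)."""
    result = 1.0
    for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b):
        side = min(a_hi, b_hi) - max(a_lo, b_lo)
        if side <= 0:
            return 0.0
        result *= side
    return result


def overlap(k, rects):
    """Sum of the areas that entry k shares with every other entry in the node."""
    return sum(intersection_area(rects[k], r)
               for i, r in enumerate(rects) if i != k)
```

Evaluating this for every candidate subtree compares each entry against all of its siblings, which is one reason the slides restrict the overlap criterion to the level just above the leaves.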
Split strategy
- Area-value(split) = area(first group) + area(second group). Smaller area reduces the probability of access.
- Margin-value(split) = margin(first group) + margin(second group). A small margin produces better packing and less overlap.
- Overlap-value(split) = common area of the two groups. Minimizes the common search area.
- Choose as the split axis the one with the smallest margin-value.
- Along the split axis, choose the split point that gives the minimum overlap-value. Use the area-value to resolve ties.

Forced reinserts
- When a split occurs at level k, sort the entries of the overflowing node in descending order of the distance between their centroid and the node centroid.
- Remove the first p entries and adjust the bounding rectangle of the overflowing node.
- Reinsert the p removed entries (data or index).
- Empirical value: p = 30% of M.
- This reduces overlap and leads to a better structure.

Test Data
- (F1) Uniform: 100,000 rectangles.
- (F2) Cluster: centers are distributed into 640 clusters of about 1600 objects each.
- (F3) Parcel: decompose the unit square into 100,000 disjoint rectangles and increase the area of each rectangle by a factor of 2.5.
- (F4) Real-Data: 120,576 rectangles from elevation lines in cartography data.
- (F5) Gaussian: centers follow a 2-dimensional independent Gaussian distribution.
- (F6) Mixed-Uniform: 99,000 uniformly distributed small rectangles and 1,000 uniformly distributed large rectangles.

Performance
- Rectangle intersection query: all data rectangles intersecting the query rectangle.
- Point enclosure query: all data rectangles containing the query point.
- Rectangle enclosure query: all data rectangles containing the query rectangle.
- Spatial joins (intersection).
- 1K page size, M =
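
The R*-tree split criteria above (margin-value for the axis, then overlap-value and area-value for the split point) can be combined into one routine. This is a sketch under stated assumptions, not the authors' code: all names are hypothetical, rectangles are tuples of (low, high) intervals, and margin is taken as the plain sum of side lengths. For the axis choice the sketch sums the margin-values of all candidate distributions along each axis, which is how Beckmann et al. describe it; the slide wording could also be read as taking the single smallest margin-value.

```python
def mbr(rects):
    return tuple((min(r[d][0] for r in rects), max(r[d][1] for r in rects))
                 for d in range(len(rects[0])))


def area(r):
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result


def margin(r):
    return sum(hi - lo for lo, hi in r)


def common_area(a, b):
    result = 1.0
    for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b):
        side = min(a_hi, b_hi) - max(a_lo, b_lo)
        if side <= 0:
            return 0.0
        result *= side
    return result


def rstar_split(rects, m):
    """Split M+1 rectangles; returns two lists of indices."""
    n = len(rects)
    sizes = range(m, n - m + 1)   # first-group sizes m-1+k, k in [1, M-2m+2]

    def boxes(order, size):
        return (mbr([rects[i] for i in order[:size]]),
                mbr([rects[i] for i in order[size:]]))

    # Choose the split axis: smallest total margin-value (assumed: summed).
    best_order, best_total = None, None
    for d in range(len(rects[0])):
        order = sorted(range(n), key=lambda i: (rects[i][d][0], rects[i][d][1]))
        total = sum(margin(a) + margin(b)
                    for a, b in (boxes(order, s) for s in sizes))
        if best_total is None or total < best_total:
            best_order, best_total = order, total

    # Choose the split point: minimum overlap-value, ties by area-value.
    def score(size):
        a, b = boxes(best_order, size)
        return (common_area(a, b), area(a) + area(b))

    size = min(sizes, key=score)
    return best_order[:size], best_order[size:]
```

Unlike the quadratic algorithm, this only ever considers splits that are contiguous in some sorted order, so the number of candidate distributions per axis is M-2m+2 rather than quadratic in M.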
[Figure: Typical Performance Data]

[Figure: Storage utilization]

Spatial Join
Test files:
- (SJ1) 1000 random rectangles from (F3) joined with (F4).
- (SJ2) 7500 random rectangles from (F3) joined with 7,536 rectangles from elevation lines.
- (SJ3) Self-join of 20,000 random rectangles from (F3).

[Figures: Relative performance; Disk accesses; Point dataset and range queries]

Summary of experiments
- Significant improvement over the R-tree.
- No test data for more than two dimensions.
- The R*-tree is robust even for bad data distributions.
- The R*-tree reduces the number of splits and is more space efficient than other R-tree variants.
- The R*-tree outperforms all other R-tree variants in page I/O.

Problems
- CPU cost not calculated.
- Comparison with linear scan performance?
