R-trees. R-trees: A Dynamic Index Structure for Spatial Searching. R-tree. Invariants


 Brendan O’Connor’
R-trees: A Dynamic Index Structure for Spatial Searching
A. Guttman

R-trees
Generalization of B+-trees to higher dimensions.
Disk-based index structure.
Occupancy guarantee.
Multiple search paths.
Insertions and splits can be complex.
Accommodates non-point data easily.

R-tree
Balanced (similar to a B+-tree).
Index node I is an n-dimensional rectangle of the form (I_0, I_1, ..., I_{n-1}), where each interval I_i is a range [a, b].
Leaf node entry: (I, tuple_id)
Non-leaf node entry: (I, child_ptr)
M is the maximum number of entries per node.
m ≤ M/2 is a parameter specifying the minimum number of entries per node.

Invariants
1. Every leaf (non-leaf) node has between m and M records (children), except for the root.
2. The root has at least two children unless it is a leaf.
3. Every index entry is the smallest rectangle that contains its children (MBR = Minimum Bounding Rectangle).
4. All leaves appear at the same level.
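The entry layout and invariants above can be sketched in a few lines of Python. This is a minimal illustration, not Guttman's implementation: the names `Rect`, `Node`, `area`, `mbr`, and `occupancy_ok` are hypothetical, and a rectangle is assumed to be a tuple of (low, high) intervals, one per dimension.

```python
from dataclasses import dataclass, field
from typing import Any, List, Sequence, Tuple

Interval = Tuple[float, float]
Rect = Tuple[Interval, ...]  # one [a, b] interval per dimension


def area(r: Rect) -> float:
    """Product of the interval widths (area in 2-D, volume in general)."""
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result


def mbr(rects: Sequence[Rect]) -> Rect:
    """Minimum bounding rectangle of a non-empty collection (invariant 3)."""
    dims = len(rects[0])
    return tuple(
        (min(r[d][0] for r in rects), max(r[d][1] for r in rects))
        for d in range(dims)
    )


@dataclass
class Node:
    leaf: bool
    # Leaf entries hold (I, tuple_id); non-leaf entries hold (I, child Node).
    entries: List[Tuple[Rect, Any]] = field(default_factory=list)


def occupancy_ok(node: Node, m: int, M: int, is_root: bool = False) -> bool:
    """Check invariants 1 and 2 for a single node."""
    n = len(node.entries)
    if is_root:
        # The root may hold fewer than m entries, but a non-leaf root
        # needs at least two children.
        return n <= M and (node.leaf or n >= 2)
    return m <= n <= M
```

For example, `mbr([((0, 1), (0, 1)), ((2, 3), (1, 4))])` yields `((0, 3), (0, 4))`, whose area is 12.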
Example
[Figure: example R-tree and the rectangles it indexes.]

Searching
Given a search rectangle S:
1. Start at the root and locate all child nodes whose rectangles intersect S (via linear search).
2. Search the subtrees of those child nodes.
3. When you get to the leaves, return the entries whose rectangles intersect S.
Searches may require inspecting several paths.
Worst-case running time is not so good...
[Figure: searching for R]
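The search steps above can be sketched as a short recursive function. This is an illustrative in-memory version under assumed names (`Node`, `intersects`, `search`); a disk-based R-tree would read a page where this sketch follows a child reference.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

Rect = Tuple[Tuple[float, float], ...]  # one (low, high) pair per dimension


@dataclass
class Node:
    leaf: bool
    # (rectangle, tuple_id) in leaves; (rectangle, child Node) otherwise.
    entries: List[Tuple[Rect, Any]] = field(default_factory=list)


def intersects(a: Rect, b: Rect) -> bool:
    """Two rectangles intersect iff their intervals overlap in every dimension."""
    return all(alo <= bhi and blo <= ahi
               for (alo, ahi), (blo, bhi) in zip(a, b))


def search(node: Node, s: Rect) -> List[Any]:
    """Return the tuple_ids of all leaf entries whose rectangle intersects s."""
    hits: List[Any] = []
    for rect, payload in node.entries:  # linear scan within the node
        if not intersects(rect, s):
            continue
        if node.leaf:
            hits.append(payload)
        else:
            # Several children may qualify, so several paths are followed.
            hits.extend(search(payload, s))
    return hits
```

Because sibling rectangles may overlap, a query can descend into more than one child at each level, which is why the worst case degrades toward visiting every node.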
Insertion
Insertion is done at the leaves. Where to put a new entry with rectangle R?
1. Start at the root.
2. Go down the tree by choosing the child whose rectangle needs the least enlargement to include R. In case of a tie, choose the child with the smallest area.
3. If there is room in the chosen leaf node, insert the entry. Otherwise split the node (to be continued...).
4. Adjust the tree. If the root was split into nodes N_1 and N_2, create a new root with N_1 and N_2 as children.
[Figure: insertion example with index entries I1, I2, I3 and data rectangles R1, R2, R3, R8, R9, R10, R11.]
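Step 2 of the insertion procedure (least enlargement, ties broken by smallest area) can be sketched as follows. The helper names `area`, `union`, `enlargement`, and `choose_child` are hypothetical, and rectangles are assumed to be tuples of (low, high) pairs.

```python
from typing import Sequence, Tuple

Rect = Tuple[Tuple[float, float], ...]


def area(r: Rect) -> float:
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle covering both a and b."""
    return tuple((min(alo, blo), max(ahi, bhi))
                 for (alo, ahi), (blo, bhi) in zip(a, b))


def enlargement(child: Rect, new: Rect) -> float:
    """Extra area the child rectangle needs in order to include `new`."""
    return area(union(child, new)) - area(child)


def choose_child(child_rects: Sequence[Rect], new: Rect) -> int:
    """Index of the child needing the least enlargement; ties go to the
    child with the smallest area."""
    return min(range(len(child_rects)),
               key=lambda i: (enlargement(child_rects[i], new),
                              area(child_rects[i])))
```

Applying `choose_child` at every level from the root down to a leaf gives the leaf that receives the new entry.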
[Figure: insertion example, continued (index entries I1, I2, I3).]

Splitting Nodes
Problem: divide M+1 entries among two nodes so that it is unlikely that the nodes are needlessly examined during a search.
Solution: minimize the probability of accessing the nodes during a query.
Exhaustive algorithm.
Quadratic algorithm.
Linear-time algorithm.

Exhaustive Search
Minimize the sum of the probabilities of accessing the two pages by a query (requires knowledge of the range or the number of nearest neighbors).
Try all possible combinations.
Optimal results.
Bad running time!
Quadratic Algorithm
1. Find the pair of entries E_1 and E_2 that maximizes area(J) − area(E_1) − area(E_2), where J is the covering rectangle of the pair. This picks the pair with the least affinity, i.e., the pair that wastes the most space.
2. Put E_1 in one group and E_2 in the other.
3. If one group has M−m+1 entries, put the remaining entries into the other group and stop. If all entries have been distributed, stop.
4. For each remaining entry E, calculate d_1 and d_2, where d_i is the area increase in the covering rectangle of group i when E is added.
5. Find the E with the maximum |d_1 − d_2| and add it to the group whose area will increase the least. Repeat from step 3.

Time complexity
The algorithm is quadratic in M and linear in the number of dimensions.
[Figure: quadratic split example showing the choice of seeds; only the labels R6 and R7 survive. The minimum occupancy guarantee may force some entries to be assigned to a particular group.]
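The five steps of the quadratic algorithm can be sketched as below. This is a simplified illustration under assumed names (`pick_seeds`, `quadratic_split`); it works on a plain list of M+1 rectangles, returns index groups, and resolves an exact tie in area increase in favor of the first group, which is an arbitrary choice of this sketch.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Rect = Tuple[Tuple[float, float], ...]


def area(r: Rect) -> float:
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result


def union(a: Rect, b: Rect) -> Rect:
    return tuple((min(alo, blo), max(ahi, bhi))
                 for (alo, ahi), (blo, bhi) in zip(a, b))


def pick_seeds(rects: Sequence[Rect]) -> Tuple[int, int]:
    """Step 1: the pair (E_1, E_2) maximizing area(J) - area(E_1) - area(E_2)."""
    return max(combinations(range(len(rects)), 2),
               key=lambda p: area(union(rects[p[0]], rects[p[1]]))
               - area(rects[p[0]]) - area(rects[p[1]]))


def quadratic_split(rects: Sequence[Rect], m: int) -> Tuple[List[int], List[int]]:
    """Split M+1 rectangles into two index groups of at least m entries each."""
    M = len(rects) - 1
    i, j = pick_seeds(rects)
    groups: Tuple[List[int], List[int]] = ([i], [j])  # step 2
    covers = [rects[i], rects[j]]
    remaining = [k for k in range(len(rects)) if k != i and k != j]

    def growth(k: int, g: int) -> float:
        """d_g: area increase of group g's cover if entry k is added."""
        return area(union(covers[g], rects[k])) - area(covers[g])

    while remaining:
        # Step 3: once a group holds M-m+1 entries, the rest go to the other.
        full = [g for g in (0, 1) if len(groups[g]) == M - m + 1]
        if full:
            groups[1 - full[0]].extend(remaining)
            break
        # Steps 4 and 5: take the entry with the strongest preference.
        k = max(remaining, key=lambda e: abs(growth(e, 0) - growth(e, 1)))
        g = 0 if growth(k, 0) <= growth(k, 1) else 1
        groups[g].append(k)
        covers[g] = union(covers[g], rects[k])
        remaining.remove(k)
    return groups
```

Seed selection inspects every pair of entries and each of the remaining assignment rounds rescans the unassigned entries, which is where the quadratic cost in M comes from.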
Linear Algorithm
For each dimension, choose the pair of entries with the largest separation (highest low value and lowest high value).
Normalize by dividing by the width of the entire set along that dimension.
Choose the two entries (and dimension) with the largest normalized separation as the initial seeds.
Randomly, but evenly, divide the rest of the entries between the two groups.
The algorithm is linear in M (capacity); it makes almost no attempt at optimality.

Deletion
1. Find the entry to delete and remove it from the appropriate leaf L.
2. Set N = L and Q = ∅. (Q contains the to-be-inserted entries.)
3. If N is the root, go to step 6. Otherwise, let P be N's parent and E_N be the entry in P that points to N.
   1. If N has fewer than m entries, delete E_N from P and add the contents of N to Q.
   2. If N has at least m entries, set the rectangle of E_N to tightly enclose N.
4. Set N = P and repeat from step 3.
5. Reinsert leaf entries from Q. Reinsert non-leaf entries from Q higher up so that all leaves are at the same level.
6. If the root has one child, make the child the new root.

Space requirements
(2kd + p) bytes per index entry for d dimensions: k bytes per coordinate value (a low and a high value in each dimension) and p bytes for the pointer.
Same for data entries with spatial extent.
(kd + p) bytes per data entry for point data.
Trees tend to be very wide and shallow.

Performance Tests
Data: CENTRAL circuit cell (1057 rectangles).
Insertion test: performance on the last 10% of inserts.
Search test: randomly generated rectangles that each retrieve about 5% of the data.
Deletion test: delete every 10th entry.
Page size varies from 128 bytes to 2K.
M varies from 6 to
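The space formula ties page size to node capacity. As a rough worked example, assume 2-dimensional data, 4-byte coordinates (k = 4), and 4-byte pointers (p = 4); these sizes are assumptions of this sketch, not values stated in the slides, and page headers are ignored. The helper names are hypothetical.

```python
def index_entry_bytes(d: int, k: int, p: int) -> int:
    """(2kd + p): a low and a high coordinate per dimension plus a pointer."""
    return 2 * k * d + p


def point_entry_bytes(d: int, k: int, p: int) -> int:
    """(kd + p): a single coordinate per dimension plus a pointer."""
    return k * d + p


def max_entries(page_bytes: int, d: int, k: int, p: int) -> int:
    """Upper bound on M for one page, ignoring any page header."""
    return page_bytes // index_entry_bytes(d, k, p)


if __name__ == "__main__":
    # Assumed sizes: d = 2, k = 4 bytes, p = 4 bytes, so 20 bytes per entry.
    for page in (128, 256, 512, 1024, 2048):
        print(page, max_entries(page, d=2, k=4, p=4))
```

Under these assumed sizes an index entry takes 20 bytes, so a 128-byte page holds 6 entries, which is consistent with the smallest M mentioned above. Even a 1K page holds dozens of entries per node, which is why the trees are wide and shallow.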
Insertion performance
With linear-time splitting, inserts spend very little time doing splits.
Growth with page size is as expected.
Increasing m reduces insertion cost because the minimum occupancy requirement gets used earlier in the insertion algorithm.

Deletion performance
Deletion cost is affected by m. For large m:
More nodes become underfull (occupancy < m).
More reinserts take place.
More possible splits.
Running time is pretty bad for m = M/

Search performance
Search is relatively insensitive to the splitting algorithm.
Less I/O with larger pages.
More CPU cost with larger pages.
Smaller values of m reduce the average number of entries per node, so less time is spent on search in the node.

Space Efficiency
A stricter node fill criterion leads to a smaller index.
Conclusions
The linear-time splitting algorithm is almost as good as the others.
A low node-fill requirement reduces space utilization but is not significantly worse than stricter node-fill requirements.
The R-tree can be added to relational databases. Took more than 10 years!

The R*-tree: An Efficient and Robust Access Method for Points and Rectangles
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger

R*-tree
Optimization of the R-tree.
Minimize area, overlap, and margin (sum of the sides of a rectangle).
Insertion:
At levels above leaf−1: as before.
At the leaf−1 level: choose the subtree with minimum overlap, where overlap(e, node) = sum of area(e ∩ entry) over all other entries in the node.
Only marginally better than the R-tree.

Split strategy
M = maximum capacity, m = minimum capacity.
For each dimension, sort the M+1 values by the lower value (use the upper value to break ties).
Consider groups containing the first m−1+k and the remaining M+2−m−k entries, with k in [1, M−2m+2] (figure example: M = 7, m = 3).
Evaluate the area-value, margin-value, and overlap-value for each split point.
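The candidate distributions along one axis can be enumerated directly from the formula above. A minimal sketch, with hypothetical helper names `sort_along_axis` and `candidate_splits`, assuming rectangles are tuples of (low, high) pairs:

```python
from typing import Iterator, List, Sequence, Tuple

Rect = Tuple[Tuple[float, float], ...]


def sort_along_axis(rects: Sequence[Rect], axis: int) -> List[Rect]:
    """Sort by lower value on the axis, using the upper value to break ties."""
    return sorted(rects, key=lambda r: (r[axis][0], r[axis][1]))


def candidate_splits(entries: Sequence, m: int) -> Iterator[Tuple[list, list]]:
    """Yield (first group, second group) for k = 1 .. M-2m+2, where the
    first group holds m-1+k entries and the second the remaining M+2-m-k."""
    M = len(entries) - 1
    for k in range(1, M - 2 * m + 3):
        cut = m - 1 + k
        yield list(entries[:cut]), list(entries[cut:])
```

With M = 7 and m = 3 there are 8 entries and three candidates per axis, with group sizes (3, 5), (4, 4), and (5, 3), so every candidate respects the minimum capacity m.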
Split strategy
Area-value(split) = area(first group) + area(second group). A smaller area reduces the probability of access.
Margin-value(split) = margin(first group) + margin(second group). A small margin produces better packing and less overlap.
Overlap-value(split) = common area of the two groups. Minimize the common search area.
Choose as the split axis the one containing the split with the smallest margin-value.
Along the split axis, choose the split point that gives the minimum overlap-value. Use the area-value to resolve ties.

Forced reinserts
When a split occurs at level k, sort the entries of the overflowing node in descending order of the distance of their centroid from the node centroid.
Remove the first p entries and adjust the bounding rectangle of the overflowing node.
Reinsert the p removed entries (data or index).
Empirical value for p = 30%.
This reduces overlap and leads to a better structure.

Test Data
(F1) Uniform: 100,000 rectangles.
(F2) Cluster: centers are distributed into 640 clusters of about 1600 objects each.
(F3) Parcel: decompose the unit square into 100,000 disjoint rectangles and increase the area of each rectangle by a factor of 2.5.
(F4) Real-Data: 120,576 rectangles from elevation lines from cartography data.
(F5) Gaussian: centers follow a 2-dimensional independent Gaussian distribution.
(F6) Mixed-Uniform: 99,000 uniformly distributed small rectangles and 1,000 uniformly distributed large rectangles.

Performance
Rectangle intersection query: all data rectangles intersecting the query rectangle.
Point enclosure query: all data rectangles containing the query point.
Rectangle enclosure query: all data rectangles containing the query rectangle.
Spatial joins (intersection).
1K page size, M =
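The selection step of the forced reinsert described earlier on this slide can be sketched as follows. This is only an illustration with hypothetical names (`center`, `mbr`, `pick_reinsert`); it assumes the node centroid is the center of the node's bounding rectangle and rounds 30% of the entries to the nearest whole number, neither of which is specified in the slides.

```python
import math
from typing import List, Sequence, Tuple

Rect = Tuple[Tuple[float, float], ...]


def center(r: Rect) -> Tuple[float, ...]:
    return tuple((lo + hi) / 2 for lo, hi in r)


def mbr(rects: Sequence[Rect]) -> Rect:
    dims = len(rects[0])
    return tuple((min(r[d][0] for r in rects), max(r[d][1] for r in rects))
                 for d in range(dims))


def pick_reinsert(rects: Sequence[Rect],
                  fraction: float = 0.3) -> Tuple[List[int], List[int]]:
    """Return (indices to remove and reinsert, indices that stay).

    Entries are ordered by descending distance between their centroid and
    the centroid of the node's bounding rectangle; the first p are removed.
    """
    node_center = center(mbr(rects))
    order = sorted(range(len(rects)),
                   key=lambda i: math.dist(center(rects[i]), node_center),
                   reverse=True)
    p = max(1, round(fraction * len(rects)))
    return order[:p], order[p:]
```

After removal, the node's bounding rectangle is recomputed from the entries that stay, and the removed entries go through the normal insertion path again.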
Typical Performance Data
[Figure: typical performance data.]

Storage utilization
[Figure: storage utilization.]

Spatial Join
Test files:
(SJ1) 1000 random rectangles from (F3) joined with (F4).
(SJ2) 7500 random rectangles from (F3) joined with 7,536 rectangles from elevation lines.
(SJ3) Self-join of 20,000 random rectangles from (F3).
[Figures: relative performance; disk accesses; point dataset and range queries.]

Summary of experiments
Significant improvement over the R-tree.
No test data for more than two dimensions.
The R*-tree is robust even for bad data distributions.
The R*-tree reduces the number of splits and is more space efficient than other R-tree variants.
The R*-tree outperforms all other R-tree variants in page I/O.

Problems
CPU cost not calculated.
Comparison with linear scan performance?
Data Warehousing und Data Mining
Data Warehousing und Data Mining Multidimensionale Indexstrukturen Ulf Leser Wissensmanagement in der Bioinformatik Content of this Lecture Multidimensional Indexing GridFiles Kdtrees Ulf Leser: Data
More informationSurvey On: Nearest Neighbour Search With Keywords In Spatial Databases
Survey On: Nearest Neighbour Search With Keywords In Spatial Databases SayaliBorse 1, Prof. P. M. Chawan 2, Prof. VishwanathChikaraddi 3, Prof. Manish Jansari 4 P.G. Student, Dept. of Computer Engineering&
More informationQuery Processing, optimization, and indexing techniques
Query Processing, optimization, and indexing techniques What s s this tutorial about? From here: SELECT C.name AS Course, count(s.students) AS Cnt FROM courses C, subscription S WHERE C.lecturer = Calders
More informationBig Data and Scripting. Part 4: Memory Hierarchies
1, Big Data and Scripting Part 4: Memory Hierarchies 2, Model and Definitions memory size: M machine words total storage (on disk) of N elements (N is very large) disk size unlimited (for our considerations)
More informationChapter 7. Multiway Trees. Data Structures and Algorithms in Java
Chapter 7 Multiway Trees Data Structures and Algorithms in Java Objectives Discuss the following topics: The Family of BTrees Tries Case Study: Spell Checker Data Structures and Algorithms in Java 2 Multiway
More informationMultiway Search Tree (MST)
Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n: Each node has n or fewer subtrees S1 S2. Sm, m n Each node has n1 or fewer keys K1 Κ2 Κm1 : m1 keys in ascending
More informationMultiWay Search Trees (B Trees)
MultiWay Search Trees (B Trees) Multiway Search Trees An mway search tree is a tree in which, for some integer m called the order of the tree, each node has at most m children. If n
More informationB+ Tree Properties B+ Tree Searching B+ Tree Insertion B+ Tree Deletion Static Hashing Extendable Hashing Questions in pass papers
B+ Tree and Hashing B+ Tree Properties B+ Tree Searching B+ Tree Insertion B+ Tree Deletion Static Hashing Extendable Hashing Questions in pass papers B+ Tree Properties Balanced Tree Same height for paths
More informationVector storage and access; algorithms in GIS. This is lecture 6
Vector storage and access; algorithms in GIS This is lecture 6 Vector data storage and access Vectors are built from points, line and areas. (x,y) Surface: (x,y,z) Vector data access Access to vector
More informationData Mining. Cluster Analysis: Advanced Concepts and Algorithms
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototypebased clustering Densitybased clustering Graphbased
More informationPrevious Lectures. BTrees. External storage. Two types of memory. Btrees. Main principles
BTrees Algorithms and data structures for external memory as opposed to the main memory BTrees Previous Lectures Height balanced binary search trees: AVL trees, redblack trees. Multiway search trees:
More informationSpatial Partitioning and Indexing Responsible persons: Claudia Dolci, Dante Salvini, Michael Schrattner, Robert Weibel
Geographic Information Technology Training Alliance (GITTA) presents: Spatial Partitioning and Indexing Responsible persons: Claudia Dolci, Dante Salvini, Michael Schrattner, Robert Weibel Content 1.
More informationProblem. Indexing with Btrees. Indexing. Primary Key Indexing. Btrees. Btrees: Example. primary key indexing
Problem Given a large collection of records, Indexing with Btrees find similar/interesting things, i.e., allow fast, approximate queries Anastassia Ailamaki http://www.cs.cmu.edu/~natassa 2 Indexing Primary
More informationTreeBased Indexes for Image Data
TreeBased Indexes for Image Data Leonard Brown Le Gruenwald The University of Oklahoma School of Computer Science Norman, OK, 73019 lbrown@cs.ou.edu gruenwal@cs.ou.edu Abstract As in conventional DataBase
More informationCSE 326: Data Structures BTrees and B+ Trees
Announcements (4//08) CSE 26: Data Structures BTrees and B+ Trees Brian Curless Spring 2008 Midterm on Friday Special office hour: 4:5: Thursday in Jaech Gallery (6 th floor of CSE building) This is
More informationPhysical Data Organization
Physical Data Organization Database design using logical model of the database  appropriate level for users to focus on  user independence from implementation details Performance  other major factor
More informationSuppose you are accessing elements of an array: ... or suppose you are dereferencing pointers:
CSE 100: BTREE Memory accesses Suppose you are accessing elements of an array: if ( a[i] < a[j] ) {... or suppose you are dereferencing pointers: temp>next>next = elem>prev>prev;... or in general
More informationAnalysis of Algorithms I: Binary Search Trees
Analysis of Algorithms I: Binary Search Trees Xi Chen Columbia University Hash table: A data structure that maintains a subset of keys from a universe set U = {0, 1,..., p 1} and supports all three dictionary
More informationChapter 18 Indexing Structures for Files. Indexes as Access Paths
Chapter 18 Indexing Structures for Files Indexes as Access Paths A singlelevel index is an auxiliary file that makes it more efficient to search for a record in the data file. The index is usually specified
More informationIndexing Spatiotemporal Archives
The VLDB Journal manuscript No. (will be inserted by the editor) Indexing Spatiotemporal Archives Marios Hadjieleftheriou 1, George Kollios 2, Vassilis J. Tsotras 1, Dimitrios Gunopulos 1 1 Computer Science
More informationBIRCH: An Efficient Data Clustering Method For Very Large Databases
BIRCH: An Efficient Data Clustering Method For Very Large Databases Tian Zhang, Raghu Ramakrishnan, Miron Livny CPSC 504 Presenter: Discussion Leader: Sophia (Xueyao) Liang HelenJr, Birches. Online Image.
More informationMODULE 15 Clustering Large Datasets LESSON 34
MODULE 15 Clustering Large Datasets LESSON 34 Incremental Clustering Keywords: Single Database Scan, Leader, BIRCH, Tree 1 Clustering Large Datasets Pattern matrix It is convenient to view the input data
More informationBTrees. Algorithms and data structures for external memory as opposed to the main memory BTrees. B trees
BTrees Algorithms and data structures for external memory as opposed to the main memory BTrees Previous Lectures Height balanced binary search trees: AVL trees, redblack trees. Multiway search trees:
More informationEfficient visual search of local features. Cordelia Schmid
Efficient visual search of local features Cordelia Schmid Visual search change in viewing angle Matches 22 correct matches Image search system for large datasets Large image dataset (one million images
More informationDATABASE DESIGN  1DL400
DATABASE DESIGN  1DL400 Spring 2015 A course on modern database systems!! http://www.it.uu.se/research/group/udbl/kurser/dbii_vt15/ Kjell Orsborn! Uppsala Database Laboratory! Department of Information
More informationTHE concept of Big Data refers to systems conveying
EDIC RESEARCH PROPOSAL 1 High Dimensional Nearest Neighbors Techniques for Data Cleaning AncaElena Alexandrescu I&C, EPFL Abstract Organisations from all domains have been searching for increasingly more
More informationCluster Analysis for Optimal Indexing
Proceedings of the TwentySixth International Florida Artificial Intelligence Research Society Conference Cluster Analysis for Optimal Indexing Tim Wylie, Michael A. Schuh, John Sheppard, and Rafal A.
More informationBalanced search trees
Lecture 8 Balanced search trees 8.1 Overview In this lecture we discuss search trees as a method for storing data in a way that supports fast insert, lookup, and delete operations. (Data structures handling
More informationIndexing and Retrieval of Historical Aggregate Information about Moving Objects
Indexing and Retrieval of Historical Aggregate Information about Moving Objects Dimitris Papadias, Yufei Tao, Jun Zhang, Nikos Mamoulis, Qiongmao Shen, and Jimeng Sun Department of Computer Science Hong
More informationMultidimensional index structures Part I: motivation
Multidimensional index structures Part I: motivation 144 Motivation: Data Warehouse A definition A data warehouse is a repository of integrated enterprise data. A data warehouse is used specifically for
More informationData Mining Cluster Analysis: Advanced Concepts and Algorithms. ref. Chapter 9. Introduction to Data Mining
Data Mining Cluster Analysis: Advanced Concepts and Algorithms ref. Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar 1 Outline Prototypebased Fuzzy cmeans Mixture Model Clustering Densitybased
More informationI/OEfficient Spatial Data Structures for Range Queries
I/OEfficient Spatial Data Structures for Range Queries Lars Arge Kasper Green Larsen MADALGO, Department of Computer Science, Aarhus University, Denmark Email: large@madalgo.au.dk,larsen@madalgo.au.dk
More informationCUBE INDEXING IMPLEMENTATION USING INTEGRATION OF SIDERA AND BERKELEY DB
CUBE INDEXING IMPLEMENTATION USING INTEGRATION OF SIDERA AND BERKELEY DB Badal K. Kothari 1, Prof. Ashok R. Patel 2 1 Research Scholar, Mewar University, Chittorgadh, Rajasthan, India 2 Department of Computer
More informationQuickDB Yet YetAnother Database Management System?
QuickDB Yet YetAnother Database Management System? Radim Bača, Peter Chovanec, Michal Krátký, and Petr Lukáš Radim Bača, Peter Chovanec, Michal Krátký, and Petr Lukáš Department of Computer Science, FEECS,
More informationRichard G. Newell, Mark Easterfield & David G. Theriault Smallworld Systems Ltd, 89 Bridge Street, Cambridge, England CB2 1UA
INTEGRATION OF SPATIAL OBJECTS IN A GIS Richard G. Newell, Mark Easterfield & David G. Theriault Smallworld Systems Ltd, 89 Bridge Street, Cambridge, England CB2 1UA ABSTRACT A GIS is distinguished from
More informationAnnouncements. CSE332: Data Abstractions. Lecture 9: B Trees. Today. Our goal. Mary Search Tree. Mary Search Tree. Ruth Anderson Winter 2011
Announcements CSE2: Data Abstractions Project 2 posted! Partner selection due by 11pm Tues 1/25 at the latest. Homework due Friday Jan 28 st at the BEGINNING of lecture Lecture 9: B Trees Ruth Anderson
More informationAg + tree: an Index Structure for Rangeaggregation Queries in Data Warehouse Environments
Ag + tree: an Index Structure for Rangeaggregation Queries in Data Warehouse Environments Yaokai Feng a, Akifumi Makinouchi b a Faculty of Information Science and Electrical Engineering, Kyushu University,
More informationRedundant Bit Vectors for the Audio Fingerprinting Server. John Platt Jonathan Goldstein Chris Burges
Redundant Bit Vectors for the Audio Fingerprinting Server John Platt Jonathan Goldstein Chris Burges Structure of Talk. Problem Statement 2. Problems with Existing Techniques 3. Bit Vectors 4. Partitioning
More informationPERFORMANCE COMPARISON OF SPATIAL INDEXING STRUCTURES FOR DIFFERENT QUERY TYPES NEELABH PANT. Presented to the Faculty of the Graduate School of
PERFORMANCE COMPARISON OF SPATIAL INDEXING STRUCTURES FOR DIFFERENT QUERY TYPES by NEELABH PANT Presented to the Faculty of the Graduate School of The University of Texas at Arlington in Partial Fulfillment
More informationFile Management. Chapter 12
Chapter 12 File Management File is the basic element of most of the applications, since the input to an application, as well as its output, is usually a file. They also typically outlive the execution
More informationIndexing the Trajectories of Moving Objects in Networks
Indexing the Trajectories of Moving Objects in Networks Victor Teixeira de Almeida Ralf Hartmut Güting Praktische Informatik IV Fernuniversität Hagen, D5884 Hagen, Germany {victor.almeida, rhg}@fernunihagen.de
More information7. Indexing. Contents: SingleLevel Ordered Indexes MultiLevel Indexes B + Tree based Indexes Index Definition in SQL.
ECS165A WQ 11 123 Contents: SingleLevel Ordered Indexes MultiLevel Indexes B + Tree based Indexes Index Definition in SQL 7. Indexing Basic Concepts Indexing mechanisms are used to optimize certain
More informationDISTANCEBASED INDEXING FOR HIGHDIMENSIONAL METRIC SPACES *
DISTANCEBASED INDEXING FOR HIGHDIMENSIONAL METRIC SPACES * Tolga Bozkaya Department of Computer Engineering & Science Case Western Reserve University email: bozkaya@alpha.ces.cwru.edu Meral Ozsoyoglu
More informationFrom Last Time: Remove (Delete) Operation
CSE 32 Lecture : More on Search Trees Today s Topics: Lazy Operations Run Time Analysis of Binary Search Tree Operations Balanced Search Trees AVL Trees and Rotations Covered in Chapter of the text From
More informationBINARY SEARCH TREE PERFORMANCE
BINARY SEARCH TREE PERFORMANCE Operation Best Time Average Time Worst Time (on a tree of n nodes) Find Insert Delete O(lg n)?? O(lg n)?? O(n) Fastest Running Time The find, insert and delete algorithms
More informationAn Adaptive Index Structure for HighDimensional Similarity Search
An Adaptive Index Structure for HighDimensional Similarity Search Abstract A practical method for creating a high dimensional index structure that adapts to the data distribution and scales well with
More informationChapter 23: Advanced Data Types and New Applications. Overview
Chapter 23: Advanced Data Types and New Applications Copyright: Silberschatz, Korth and Sudarshan 1 Overview! Temporal Data! Spatial and Geographic Databases! Multimedia Databases! Mobility and Personal
More informationChapter 7. Indexes. Objectives. Table of Contents
Chapter 7. Indexes Table of Contents Objectives... 1 Introduction... 2 Context... 2 Review Questions... 3 Singlelevel Ordered Indexes... 4 Primary Indexes... 4 Clustering Indexes... 8 Secondary Indexes...
More informationEffective Complex Data Retrieval Mechanism for Mobile Applications
, 2325 October, 2013, San Francisco, USA Effective Complex Data Retrieval Mechanism for Mobile Applications Haeng Kon Kim Abstract While mobile devices own limited storages and low computational resources,
More informationCSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) 3 4 4 7 5 9 6 16 7 8 8 4 9 8 10 4 Total 92.
Name: Email ID: CSE 326, Data Structures Section: Sample Final Exam Instructions: The exam is closed book, closed notes. Unless otherwise stated, N denotes the number of elements in the data structure
More informationGiST. Amol Deshpande. March 8, 2012. University of Maryland, College Park. CMSC724: Access Methods; Indexes; GiST. Amol Deshpande.
CMSC724: ; Indexes; : Generalized University of Maryland, College Park March 8, 2012 Outline : Generalized : Generalized : Why? Most queries have predicates in them Accessing only the needed records key
More informationDatabases and Information Systems 1 Part 3: Storage Structures and Indices
bases and Information Systems 1 Part 3: Storage Structures and Indices Prof. Dr. Stefan Böttcher Fakultät EIM, Institut für Informatik Universität Paderborn WS 2009 / 2010 Contents:  database buffer 
More informationClassifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang
Classifying Large Data Sets Using SVMs with Hierarchical Clusters Presented by :Limou Wang Overview SVM Overview Motivation Hierarchical microclustering algorithm ClusteringBased SVM (CBSVM) Experimental
More informationData Structures for Moving Objects
Data Structures for Moving Objects Pankaj K. Agarwal Department of Computer Science Duke University Geometric Data Structures S: Set of geometric objects Points, segments, polygons Ask several queries
More informationA. V. Gerbessiotis CS Spring 2014 PS 3 Mar 24, 2014 No points
A. V. Gerbessiotis CS 610102 Spring 2014 PS 3 Mar 24, 2014 No points Problem 1. Suppose that we insert n keys into a hash table of size m using open addressing and uniform hashing. Let p(n, m) be the
More informationSPEEDING UP BULKLOADING OF QUADTREES
l0 SPEEDING UP BULKLOADING OF QUADTREES GÍSLI R. HJALTASON HANAN SAMET YORAM J. SUSSMANN COMPUTER SCIENCE DEPARTMENT AND CENTER FOR AUTOMATION RESEARCH AND INSTITUTE FOR ADVANCED COMPUTER STUDIES UNIVERSITY
More informationClustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016
Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with
More information! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions
Chapter 13: Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions Basic Steps in Query
More informationGeneralizing Database Access Methods
Generalizing Database Access Methods by Ming Zhou A thesis presented to the University of Waterloo in fulfilment of the thesis requirement for the degree of Master of Mathematics in Computer Science Waterloo,
More informationDatabase 2 Lecture II. Alessandro Artale
Free University of Bolzano Database 2. Lecture II, 2003/2004 A.Artale (1) Database 2 Lecture II Alessandro Artale Faculty of Computer Science Free University of Bolzano Room: 221 artale@inf.unibz.it http://www.inf.unibz.it/
More informationApproximation Algorithms
Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NPCompleteness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms
More informationChapter 13: Query Processing. Basic Steps in Query Processing
Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing
More informationData Management for Data Science
Data Management for Data Science Database Management Systems: Access file manager and query evaluation Maurizio Lenzerini, Riccardo Rosati Dipartimento di Ingegneria informatica automatica e gestionale
More information15.1 Introduction The Diskbased Environment Btree Definition Btree Query Btree Insertion Btree Deletion
15 B Trees Donghui Zhang Northeastern University 15.1 Introduction............................................ 151 15.2 The Diskbased Environment........................ 152 15.3 The Btree..............................................
More informationLarge Databases. mjf@inescid.pt, jorgej@acm.org. Abstract. Many indexing approaches for high dimensional data points have evolved into very complex
NBTree: An Indexing Structure for ContentBased Retrieval in Large Databases Manuel J. Fonseca, Joaquim A. Jorge Department of Information Systems and Computer Science INESCID/IST/Technical University
More informationMATHEMATICAL ENGINEERING TECHNICAL REPORTS. The Bestfit Heuristic for the Rectangular Strip Packing Problem: An Efficient Implementation
MATHEMATICAL ENGINEERING TECHNICAL REPORTS The Bestfit Heuristic for the Rectangular Strip Packing Problem: An Efficient Implementation Shinji IMAHORI, Mutsunori YAGIURA METR 2007 53 September 2007 DEPARTMENT
More informationClustering UE 141 Spring 2013
Clustering UE 141 Spring 013 Jing Gao SUNY Buffalo 1 Definition of Clustering Finding groups of obects such that the obects in a group will be similar (or related) to one another and different from (or
More informationSmartSample: An Efficient Algorithm for Clustering Large HighDimensional Datasets
SmartSample: An Efficient Algorithm for Clustering Large HighDimensional Datasets Dudu Lazarov, Gil David, Amir Averbuch School of Computer Science, TelAviv University TelAviv 69978, Israel Abstract
More informationEfficient Updates for OLAP Range Queries on Flash Memory
Efficient Updates for OLAP Range Queries on Flash Memory Mitzi McCarthy and Zhen He Department of Computer Science and Computer Engineering, La Trobe University, VIC 3086, Australia Email: m.mccarthy@latrobe.edu.au;
More informationICS 434 Advanced Database Systems
ICS 434 Advanced Database Systems Dr. Abdallah AlSukairi sukairi@kfupm.edu.sa Second Semester 20032004 (032) King Fahd University of Petroleum & Minerals Information & Computer Science Department Outline
More informationData storage Tree indexes
Data storage Tree indexes Rasmus Pagh February 7 lecture 1 Access paths For many database queries and updates, only a small fraction of the data needs to be accessed. Extreme examples are looking or updating
More informationroot node level: internal node edge leaf node CS@VT Data Structures & Algorithms 20002009 McQuain
inary Trees 1 A binary tree is either empty, or it consists of a node called the root together with two binary trees called the left subtree and the right subtree of the root, which are disjoint from each
More informationMIDAS: MultiAttribute Indexing for Distributed Architecture Systems