The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree


Lars Arge
Department of Computer Science, Duke University, Box Durham, NC USA

Mark de Berg
Department of Computer Science, TU Eindhoven, P.O. Box M Eindhoven, The Netherlands

Herman J. Haverkort
Institute of Information and Computing Sciences, Utrecht University, PO Box T Utrecht, The Netherlands

Ke Yi
Department of Computer Science, Duke University, Box Durham, NC, USA

ABSTRACT

We present the Priority R-tree, or PR-tree, which is the first R-tree variant that always answers a window query using $O((N/B)^{1-1/d} + T/B)$ I/Os, where N is the number of d-dimensional (hyper-)rectangles stored in the R-tree, B is the disk block size, and T is the output size. This is provably asymptotically optimal and significantly better than other R-tree variants, where a query may visit all N/B leaves in the tree even when T = 0. We also present an extensive experimental study of the practical performance of the PR-tree using both real-life and synthetic data. This study shows that the PR-tree performs similarly to the best known R-tree variants on real-life and relatively nicely distributed data, but outperforms them significantly on more extreme data.

Supported in part by the National Science Foundation through RI grant EIA, CAREER grant CCR, ITR grant EIA, and U.S.-Germany Cooperative Research Program grant INT.
Supported by the Netherlands Organization for Scientific Research (NWO).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGMOD 2004 June 13-18, 2004, Paris, France. Copyright 2004 ACM /04/06... $5.00.

1. INTRODUCTION

Spatial data naturally arise in numerous applications, including geographical information systems, computer-aided design, computer vision and robotics. Therefore spatial database systems designed to store, manage, and manipulate spatial data have received considerable attention over the years. Since these databases often involve massive datasets, disk-based index structures for spatial data have been researched extensively (see e.g. the survey by Gaede and Günther [11]). Especially the R-tree [13] and its numerous variants (see e.g. the recent survey by Manolopoulos et al. [19]) have emerged as practically efficient indexing methods. In this paper we present the Priority R-tree, or PR-tree, which is the first R-tree variant that is not only practically efficient but also provably asymptotically optimal.

1.1 Background and previous results

Since objects stored in a spatial database can be rather complex they are often approximated by simpler objects, and spatial indexes are then built on these approximations. The most commonly used approximation is the minimal bounding box: the smallest axis-parallel (hyper-)rectangle that contains the object. The R-tree, originally proposed by Guttman [13], is an index for such rectangles. It is a height-balanced multi-way tree similar to a B-tree [5, 9], where each node (except for the root) has degree $\Theta(B)$. Each leaf contains $\Theta(B)$ data rectangles (each possibly with a pointer to the original data) and all leaves are on the same level of the tree; each internal node v contains pointers to its $\Theta(B)$ children, as well as for each child a minimal bounding box covering all rectangles in the leaves of the subtree rooted in that child. If B is the number of rectangles that fits in a disk block, an R-tree on N rectangles occupies $\Theta(N/B)$ disk blocks and has height $\Theta(\log_B N)$. Many types of queries can be answered efficiently using an R-tree, including the common query called a window query: Given a query rectangle Q, retrieve all rectangles that intersect Q.
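The two notions just introduced, the minimal bounding box and the window query, can be pinned down with a minimal sketch. The tuple layout `(xmin, ymin, xmax, ymax)`, the function names, and the choice to treat rectangles as closed (so touching rectangles intersect) are assumptions of this illustration, not something the paper prescribes. The linear scan only specifies what a window query must return; it is not an index.

```python
from typing import Iterable, List, Tuple

# A rectangle is (xmin, ymin, xmax, ymax); this layout is an assumption of the sketch.
Rect = Tuple[float, float, float, float]

def bounding_box(rects: Iterable[Rect]) -> Rect:
    """Smallest axis-parallel rectangle containing all given rectangles."""
    rects = list(rects)
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def intersects(a: Rect, b: Rect) -> bool:
    """True if the closed rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def window_query_scan(rects: Iterable[Rect], q: Rect) -> List[Rect]:
    """Reference answer to a window query: every rectangle intersecting q."""
    return [r for r in rects if intersects(r, q)]
```

An index such as the R-tree must return the same set as `window_query_scan` while reading far fewer than all N/B blocks.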
To answer such a query we simply start at the root of the R-tree and recursively visit all nodes with minimal bounding boxes intersecting Q; when encountering a leaf l we report all data rectangles in l intersecting Q.

Guttman gave several algorithms for updating an R-tree in $O(\log_B N)$ I/Os using B-tree-like algorithms [13]. Since there is no unique R-tree for a given dataset, and because the window query performance intuitively depends on the amount of overlap between minimal bounding boxes in the nodes of the tree, it is natural to try to minimize bounding box overlap during updates. This has led to the development of many heuristic update algorithms; see for example [6, 16,
23] or refer to the surveys in [11, 19].

Several specialized algorithms for bulk-loading an R-tree have also been developed [7, 10, 12, 15, 18, 22]. Most of these algorithms use $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os (the number of I/Os needed to sort N elements), where M is the number of rectangles that fits in main memory, which is much less than the $O(N\log_B N)$ I/Os needed to build the index by repeated insertion. Furthermore, they typically produce R-trees with better space utilization and query performance than R-trees built using repeated insertion. For example, while experimental results have shown that the average space utilization of dynamically maintained R-trees is between 50% and 70% [6], most bulk-loading algorithms are capable of obtaining over 95% space utilization. After bulk-loading an R-tree it can of course be updated using the standard R-tree updating algorithms. However, in that case its query efficiency and space utilization may degenerate over time.

One common class of R-tree bulk-loading algorithms works by sorting the rectangles according to some global one-dimensional criterion, placing them in the leaves in that order, and then building the rest of the index bottom-up level-by-level [10, 15, 18]. In two dimensions, the so-called packed Hilbert R-tree of Kamel and Faloutsos [15], which sorts the rectangles according to the Hilbert values of their centers, has been shown to be especially query-efficient in practice. The Hilbert value of a point p is the length of the fractal Hilbert space-filling curve from the origin to p. The Hilbert curve is very good at clustering spatially close rectangles together, leading to a good index.
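The leaf-packing step of a packed Hilbert R-tree can be sketched as follows. This is only an illustration under stated assumptions: `hilbert_index` is the widely used iterative conversion from grid cell to curve position, centers are truncated to integer cells of an `n x n` grid with `n` a power of two, and the names `hilbert_index` and `pack_hilbert_leaves` are hypothetical. The paper does not say how Hilbert values are computed in [15].

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed layout

def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of grid cell (x, y) along the Hilbert curve on an n x n grid.

    n must be a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level sees a canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def pack_hilbert_leaves(rects: List[Rect], B: int, n: int = 1 << 16) -> List[List[Rect]]:
    """Sort rectangles by the Hilbert value of their centers and cut into leaves of B."""
    def key(r: Rect) -> int:
        cx = int((r[0] + r[2]) / 2)  # center truncated to a grid cell (assumption)
        cy = int((r[1] + r[3]) / 2)
        return hilbert_index(n, cx, cy)
    ordered = sorted(rects, key=key)
    return [ordered[i:i + B] for i in range(0, len(ordered), B)]
```

The upper levels would be built the same way, bottom-up, from the bounding boxes of consecutive groups of B nodes.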
A variant of the packed Hilbert R-tree, which also takes the extent of the rectangles into account (rather than just the center), is the four-dimensional Hilbert R-tree [15]; in this structure each rectangle $((x_{\min}, y_{\min}), (x_{\max}, y_{\max}))$ is first mapped to the four-dimensional point $(x_{\min}, y_{\min}, x_{\max}, y_{\max})$ and then the rectangles are sorted by the positions of these points on the four-dimensional Hilbert curve. Experimentally the four-dimensional Hilbert R-tree has been shown to behave slightly worse than the packed Hilbert R-tree for nicely distributed realistic data [15]. However, intuitively, it is less vulnerable to more extreme datasets because it also takes the extent of the input rectangles into account.

Algorithms that bulk-load R-trees in a top-down manner have also been developed. These algorithms work by recursively trying to find a good partition of the data [7, 12]. The so-called Top-down Greedy Split (TGS) algorithm of García, López and Leutenegger [12] has been shown to result in especially query-efficient R-trees (TGS R-trees). To build the root of (a subtree of) an R-tree on a given set of rectangles, this algorithm repeatedly partitions the rectangles into two sets, until they are divided into B subsets of (approximately) equal size. Each subset's bounding box is stored in the root, and subtrees are constructed recursively on each of the subsets. Each of the binary partitions takes a set of rectangles and splits it into two subsets based on one of several one-dimensional orderings; in two dimensions, the orderings considered are those by $x_{\min}$, $y_{\min}$, $x_{\max}$ and $y_{\max}$. For each such ordering, the algorithm calculates, for each of O(B) possible partitioning possibilities, the sum of the areas of the bounding boxes of the two subsets that would result from the partition. Then it applies the binary partition that minimizes that sum.¹

¹ García et al. describe several variants of the top-down greedy method. They found the one described here to be the most efficient in practice [12]. In order to achieve close to 100% space utilization, the size of the subsets created is actually rounded up to the nearest power of B (except for one remainder set). As a result, one node on each level, including the root, may have less than B children.

While the TGS R-tree has been shown to have slightly better query performance than other R-tree variants, the construction algorithm uses many more I/Os since it needs to scan all the rectangles in order to make a binary partition. In fact, in the worst case the algorithm may take $O(N\log_B N)$ I/Os. However, in practice, the fact that each partition decision is binary effectively means that the algorithm uses $O(\frac{N}{B}\log_2 N)$ I/Os.

While much work has been done on evaluating the practical query performance of the R-tree variants mentioned above, very little is known about their theoretical worst-case performance. Most theoretical work on R-trees is concerned with estimating the expected cost of queries under assumptions such as uniform distribution of the input and/or the queries, or assuming that the input are points rather than rectangles. See the recent survey by Manolopoulos et al. [19]. The first bulk-loading algorithm with a non-trivial guarantee on the resulting worst-case query performance was given only recently by Agarwal et al. [2]. In d dimensions their algorithm constructs an R-tree that answers a window query in $O((N/B)^{1-1/d} + T\log_B N)$ I/Os, where T is the number of reported rectangles. However, this still leaves a gap to the $\Omega((N/B)^{1-1/d} + T/B)$ lower bound on the number of I/Os needed to answer a window query [2, 17]. If the input consists of points rather than rectangles, then worst-case optimal query performance can be achieved with e.g. a kdB-tree [21] or an O-tree [17]. Unfortunately, it seems hard to modify these structures to work for rectangles. Finally, Agarwal et al. [2], as well as Haverkort et al. [14], also developed a number of R-trees that have good worst-case query performance under certain conditions on the input.

1.2 Our results

In Section 2 we present a new R-tree variant, which we call a Priority R-tree or PR-tree for short. We call our structure the Priority R-tree because our bulk-loading algorithm utilizes so-called priority rectangles in a way similar to the recent structure by Agarwal et al. [2]. Window queries can be answered in $O((N/B)^{1-1/d} + T/B)$ I/Os on a PR-tree, and the index is thus the first R-tree variant that answers queries with an asymptotically optimal number of I/Os in the worst case. To contrast this to previous R-tree bulk-loading algorithms, we also construct a set of rectangles and a query with zero output, such that all $\Theta(N/B)$ leaves of a packed Hilbert R-tree, a four-dimensional Hilbert R-tree, or a TGS R-tree need to be visited to answer the query. We also show how to bulk-load the PR-tree efficiently, using only $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os. After bulk-loading, a PR-tree can be updated in $O(\log_B N)$ I/Os using the standard R-tree updating algorithms, but without maintaining its query efficiency. Alternatively, the external logarithmic method [4, 20] can be used to develop a structure that supports insertions and deletions in $O(\log_B\frac{N}{M} + \frac{1}{B}(\log_{M/B}\frac{N}{B})(\log_2\frac{N}{M}))$ and $O(\log_B\frac{N}{M})$ I/Os amortized, respectively, while maintaining the optimal query performance.

In Section 3 we present an extensive experimental study of the practical performance of the PR-tree using both real-life and synthetic data. We compare the performance of our
index on two-dimensional rectangles to the packed Hilbert R-tree, the four-dimensional Hilbert R-tree, and the TGS R-tree. Overall, our experiments show that all these R-trees answer queries in more or less the same number of I/Os on relatively square and uniformly distributed rectangles. However, on more extreme data (large rectangles, rectangles with high aspect ratios, or non-uniformly distributed rectangles) the PR-tree (and sometimes also the four-dimensional Hilbert R-tree) outperforms the others significantly. On a special worst-case dataset the PR-tree outperforms all of them by well over an order of magnitude.

2. THE PRIORITY R-TREE

In this section we describe the PR-tree. For simplicity, we first describe a two-dimensional pseudo-PR-tree in Section 2.1. The pseudo-PR-tree answers window queries efficiently but is not a real R-tree, since it does not have all leaves on the same level. In Section 2.2 we show how to obtain a real two-dimensional PR-tree from the pseudo-PR-tree, and in Section 2.3 we discuss how to extend the PR-tree to d dimensions. Finally, in Section 2.4 we show that a query on the packed Hilbert R-tree, the four-dimensional Hilbert R-tree, as well as the TGS R-tree can be forced to visit all leaves even if T = 0.

2.1 Two-dimensional pseudo-PR-trees

In this section we describe the two-dimensional pseudo-PR-tree. Like an R-tree, a pseudo-PR-tree has the input rectangles in the leaves and each internal node $\nu$ contains a minimal bounding box for each of its children $\nu_c$. However, unlike an R-tree, not all the leaves are on the same level of the tree and internal nodes only have degree six (rather than $\Theta(B)$).

The basic idea in the pseudo-PR-tree is (similar to the four-dimensional Hilbert R-tree) to view an input rectangle $((x_{\min}, y_{\min}), (x_{\max}, y_{\max}))$ as a four-dimensional point $(x_{\min}, y_{\min}, x_{\max}, y_{\max})$. The pseudo-PR-tree is then basically just a kd-tree on the N points corresponding to the N input rectangles, except that four extra leaves are added below each internal node.
Intuitively, these so-called priority leaves contain the extreme B points (rectangles) in each of the four dimensions. Note that the four-dimensional kd-tree can easily be mapped back to an R-tree-like structure, simply by replacing the split value in each kd-tree node $\nu$ with the minimal bounding box of the input rectangles stored in the subtree rooted in $\nu$. The idea of using priority leaves was introduced in a recent structure by Agarwal et al. [2]; they used priority leaves of size one rather than B.

In Section 2.1.1 below we give a precise definition of the pseudo-PR-tree, and in Section 2.1.2 we show that it can be used to answer a window query in $O(\sqrt{N/B} + T/B)$ I/Os. In Section 2.1.3 we describe how to construct the structure I/O-efficiently.

2.1.1 The Structure

Let $S = \{R_1, \ldots, R_N\}$ be a set of N rectangles in the plane and assume for simplicity that no two of the coordinates defining the rectangles are equal. We define $R_i^* = (x_{\min}(R_i), y_{\min}(R_i), x_{\max}(R_i), y_{\max}(R_i))$ to be the mapping of $R_i = ((x_{\min}(R_i), y_{\min}(R_i)), (x_{\max}(R_i), y_{\max}(R_i)))$ to a point in four dimensions, and define $S^*$ to be the N points corresponding to S.

[Figure 1: The construction of an internal node in a pseudo-PR-tree.]

A pseudo-PR-tree $T_S$ on S is defined recursively: If S contains less than B rectangles, $T_S$ consists of a single leaf; otherwise, $T_S$ consists of a node $\nu$ with six children, namely four priority leaves and two recursive pseudo-PR-trees. For each child $\nu_c$, we let $\nu$ store the minimal bounding box of all input rectangles stored in the subtree rooted in $\nu_c$.
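A minimal in-memory sketch of this recursive definition, together with the R-tree-style window query, may help to fix ideas. It follows the construction described in this section: the priority leaves are filled in the order minimal $x_{\min}$, minimal $y_{\min}$, maximal $x_{\max}$, maximal $y_{\max}$, and the remaining rectangles are split round-robin. It ignores I/O entirely. The names `Node`, `build_pseudo_pr_tree` and `window_query` are hypothetical, and treating a set of exactly B rectangles as a single leaf is a simplification made here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed layout

def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def bbox(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

@dataclass
class Node:
    box: Rect
    rects: List[Rect] = field(default_factory=list)       # non-empty only in leaves
    children: List["Node"] = field(default_factory=list)  # non-empty only in internal nodes

def build_pseudo_pr_tree(rects: List[Rect], B: int, depth: int = 0) -> Node:
    """Build a pseudo-PR-tree on a non-empty list of rectangles (in-memory sketch)."""
    if len(rects) <= B:  # simplification: a set that fits in one leaf becomes a leaf
        return Node(bbox(rects), rects=list(rects))
    rest = list(rects)
    children = []
    # Priority leaves: min xmin, min ymin, max xmax, max ymax, each taken from what remains.
    for dim, largest in ((0, False), (1, False), (2, True), (3, True)):
        if not rest:
            break
        rest.sort(key=lambda r: r[dim], reverse=largest)
        children.append(Node(bbox(rest[:B]), rects=rest[:B]))
        rest = rest[B:]
    if rest:
        dim = depth % 4  # round-robin split dimension, as in a 4-d kd-tree
        rest.sort(key=lambda r: r[dim])
        mid = len(rest) // 2
        for part in (rest[:mid], rest[mid:]):
            if part:
                children.append(build_pseudo_pr_tree(part, B, depth + 1))
    return Node(bbox(rects), children=children)

def window_query(node: Node, q: Rect, out: List[Rect]) -> int:
    """Append rectangles intersecting q to out; return the number of nodes visited."""
    visited = 1
    out.extend(r for r in node.rects if intersects(r, q))
    for child in node.children:
        if intersects(child.box, q):
            visited += window_query(child, q, out)
    return visited
```

The visit count returned by `window_query` is a rough in-memory stand-in for the I/O cost that the analysis bounds.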
The node $\nu$ and the priority leaves below it are constructed as follows: The first priority leaf $\nu_p^{x_{\min}}$ contains the B rectangles in S with minimal $x_{\min}$-coordinates, the second $\nu_p^{y_{\min}}$ the B rectangles among the remaining rectangles with minimal $y_{\min}$-coordinates, the third $\nu_p^{x_{\max}}$ the B rectangles among the remaining rectangles with maximal $x_{\max}$-coordinates, and finally the fourth $\nu_p^{y_{\max}}$ the B rectangles among the remaining rectangles with maximal $y_{\max}$-coordinates. Thus the priority leaves contain the extreme rectangles in S, namely the ones with leftmost left edges, bottommost bottom edges, rightmost right edges, and topmost top edges.²

² S may not contain enough rectangles to put B rectangles in each of the four priority leaves. In that case, we may assume that we can still put at least B/4 in each of them, since otherwise we could just construct a single leaf.

After constructing the priority leaves, we divide the set $S_r$ of
remaining rectangles (if any) into two subsets, $S_<$ and $S_>$, of approximately the same size and recursively construct pseudo-PR-trees $T_{S_<}$ and $T_{S_>}$. The division is performed using the $x_{\min}$-, $y_{\min}$-, $x_{\max}$-, or $y_{\max}$-coordinate in a round-robin fashion, as if we were building a four-dimensional kd-tree on $S_r^*$, that is, when constructing the root of $T_S$ we divide based on the $x_{\min}$-values, the next level of recursion based on the $y_{\min}$-values, then based on the $x_{\max}$-values, on the $y_{\max}$-values, on the $x_{\min}$-values, and so on. Refer to Figure 1 for an example. Note that dividing according to, say, $x_{\min}$ corresponds to dividing based on a vertical line l such that half of the rectangles in $S_r$ have their left edge to the left of l and half of them have their left edge to the right of l.

We store each node or leaf of $T_S$ in O(1) disk blocks, and since at least four out of every six leaves contain $\Theta(B)$ rectangles we obtain the following (in Section 2.1.3 we discuss how to guarantee that almost every leaf is full).

Lemma 1. A pseudo-PR-tree on a set of N rectangles in the plane occupies O(N/B) disk blocks.

2.1.2 Query complexity

We answer a window query Q on a pseudo-PR-tree exactly as on an R-tree by recursively visiting all nodes with minimal bounding boxes intersecting Q. However, unlike for known R-tree variants, for the pseudo-PR-tree we can prove a non-trivial (in fact, optimal) bound on the number of I/Os performed by this procedure.

Lemma 2. A window query on a pseudo-PR-tree on N rectangles in the plane uses $O(\sqrt{N/B} + T/B)$ I/Os in the worst case.

Proof. Let $T_S$ be a pseudo-PR-tree on a set S of N rectangles in the plane. To prove the query bound, we bound the number of nodes in $T_S$ that are kd-nodes, i.e. not priority leaves, and are visited in order to answer a query with a rectangular range Q; the total number of leaves visited is at most a factor of four larger.

We first note that O(T/B) is a bound on the number of nodes $\nu$ visited where all rectangles in at least one of the priority leaves below $\nu$'s parent are reported.
Thus we just need to bound the number of visited kd-nodes where this is not the case.

Let $\mu$ be the parent of a node $\nu$ such that none of the priority leaves of $\mu$ are reported completely, that is, each priority leaf $\mu_p$ of $\mu$ contains at least one rectangle not intersecting Q. Each such rectangle E can be separated from Q by a line containing one of the sides of Q (refer to Figure 2). Assume without loss of generality that this is the vertical line $x = x_{\min}(Q)$ through the left edge of Q, that is, E's right edge lies to the left of Q's left edge, so that $x_{\max}(E) \le x_{\min}(Q)$. This means that the point $E^*$ in four-dimensional space corresponding to E lies to the left of the axis-parallel hyperplane H that intersects the $x_{\max}$-axis at $x_{\min}(Q)$. Now recall that $T_S$ is basically a four-dimensional kd-tree on $S^*$ (with priority leaves added), and thus that a four-dimensional region $R^4_\mu$ can be associated with $\mu$. Since the query Q visits $\mu$, there must also be at least one rectangle F in the subtree rooted at $\mu$ that has $x_{\max}(F) > x_{\min}(Q)$, so that $F^*$ lies to the right of H. It follows that $R^4_\mu$ contains points on both sides of H and therefore H must intersect $R^4_\mu$.

[Figure 2: The proof of Lemma 2, with $\mu$ in the plane (upper figure), and $\mu$ in four-dimensional space (lower figure; the $x_{\min}$ and $y_{\max}$ dimensions are not shown). Note that $X = H \cap H'$ is a two-dimensional hyperplane in four-dimensional space. It contains a two-dimensional facet of the transformation of the query range into four dimensions.]

Now observe that the rectangles in the priority leaf $\mu_p^{x_{\max}}$ cannot be separated from Q by the line $x = x_{\min}(Q)$ through the left edge of Q: Rectangles in $\mu_p^{x_{\max}}$ are extreme in the positive x-direction, so if one of them lies completely to the left of Q, then all rectangles in $\mu$'s children, including $\nu$, would lie to the left of Q; in that case $\nu$ would not be visited.
Since (by definition of $\nu$) not all rectangles in $\mu_p^{x_{\max}}$ intersect Q, there must be a line through one of Q's other sides, say the horizontal line $y = y_{\max}(Q)$, that separates Q from a rectangle G in $\mu_p^{x_{\max}}$. Hence, the hyperplane $H'$ that cuts the $y_{\min}$-axis at $y_{\max}(Q)$ also intersects $R^4_\mu$.

By the above arguments, at least two of the three-dimensional hyperplanes defined by $x_{\min}(Q)$, $x_{\max}(Q)$, $y_{\min}(Q)$ and $y_{\max}(Q)$ intersect the region $R^4_\mu$ associated with $\mu$ when viewing $T_S$ as a four-dimensional kd-tree. Hence, the intersection X of these two hyperplanes, which is a two-dimensional plane in four-dimensional space, also intersects $R^4_\mu$. With the priority leaves removed, $T_S$ becomes a four-dimensional kd-tree with O(N/B) leaves; from a straightforward
generalization of the standard analysis of kd-trees we know that any axis-parallel two-dimensional plane intersects at most $O(\sqrt{N/B})$ of the regions associated with the nodes in such a tree [2]. All that remains is to observe that Q defines O(1) such planes, namely one for each pair of sides. Thus $O(\sqrt{N/B})$ is a bound on the number of nodes $\nu$ that are not priority leaves and are visited by the query procedure, where not all rectangles in any of the priority leaves below $\nu$'s parent are reported.

2.1.3 Efficient construction algorithm

Note that it is easy to bulk-load a pseudo-PR-tree $T_S$ on a set S of N rectangles in $O(\frac{N}{B}\log N)$ I/Os by simply constructing one node at a time following the definition in Section 2.1.1. We will now describe how, under the reasonable assumption that the amount M of available main memory is $\Omega(B^{4/3})$, we can bulk-load $T_S$ using $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os.

Our algorithm is a modified version of the kdB-tree construction algorithm described in [1, 20]; it is easiest described as constructing a four-dimensional kd-tree $T_S$ on the points $S^*$. In the construction algorithm we first construct, in a preprocessing step, four sorted lists $L_{x_{\min}}$, $L_{y_{\min}}$, $L_{x_{\max}}$, $L_{y_{\max}}$ containing the points in $S^*$ sorted by their $x_{\min}$-, $y_{\min}$-, $x_{\max}$-, and $y_{\max}$-coordinate, respectively. Then we construct $\Theta(\log M)$ levels of the tree, and recursively construct the rest of the tree.

To construct $\Theta(\log M)$ levels of $T_S$ efficiently we proceed as follows. We first choose a parameter z (which will be explained below) and use the four sorted lists to find the (kN/z)-th coordinate of the points $S^*$ in each dimension, for all $k \in \{1, 2, \ldots, z-1\}$. These coordinates define a four-dimensional grid of size $z^4$; we then scan $S^*$ and count the number of points in each grid cell. We choose z to be $\Theta(M^{1/4})$, so that we can keep these counts in main memory.
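The grid-counting step just described can be sketched in memory as follows. This is an illustration only: the function name `grid_counts` is hypothetical, the sorted lists are recomputed with `sorted` instead of being read from disk, and ties are broken by placing a point whose coordinate equals a boundary in the higher cell.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, List, Sequence, Tuple

def grid_counts(points: Sequence[Tuple[float, ...]], z: int
                ) -> Tuple[List[List[float]], Dict[Tuple[int, ...], int]]:
    """Return per-dimension grid boundaries and the number of points in each grid cell.

    boundaries[d][k-1] is the (k*n/z)-th smallest coordinate in dimension d,
    for k = 1, ..., z-1, so every slice of the grid holds about n/z points.
    """
    n = len(points)
    dims = len(points[0])
    boundaries = []
    for d in range(dims):
        coords = sorted(p[d] for p in points)  # stands in for the sorted list L_d
        boundaries.append([coords[(k * n) // z] for k in range(1, z)])
    counts: Counter = Counter()
    for p in points:  # one scan over the point set
        cell = tuple(bisect_right(boundaries[d], p[d]) for d in range(dims))
        counts[cell] += 1
    return boundaries, dict(counts)
```

With four dimensions there are at most $z^4$ non-empty cells, which is why z is tied to the memory size M.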
Next we build the $\Theta(\log M)$ levels of $T_S$ without worrying about the priority leaves: To construct the root $\nu$ of $T_S$, we first find the slice of $z^3$ grid cells with common $x_{\min}$-coordinate such that there is a hyperplane orthogonal to the $x_{\min}$-axis that passes through these cells and has at most half of the points in $S^*$ on one side and at most half of the points on the other side. By scanning the O(N/(zB)) blocks from $L_{x_{\min}}$ that contain the O(N/z) points in these grid cells, we can determine the exact $x_{\min}$-value x to use in $\nu$ such that the hyperplane H, defined by $x_{\min} = x$, divides the points in $S^*$ into two subsets with at most half of the points each. After constructing $\nu$, we subdivide the $z^3$ grid cells intersected by H, that is, we divide each of the $z^3$ cells in two at x and compute their counts by rescanning the O(N/(zB)) blocks from $L_{x_{\min}}$ that contain the O(N/z) points in these grid cells. Then we construct a kd-tree on each side of the hyperplane defined by x recursively (cycling through all four possible cutting directions). Since we create $O(z^3)$ new cells every time we create a node, we can ensure that the grid still fits in main memory after constructing z nodes, that is, $\log z = \Theta(\log M)$ levels of $T_S$.

After constructing the $\Theta(\log M)$ kd-tree levels, we construct the four priority leaves for each of the z nodes. To do so we reserve main memory space for the B points in each of the priority leaves; we have enough main memory to hold all priority leaves, since by the assumption that M is $\Omega(B^{4/3})$ we have $4 \cdot \Theta(B) \cdot \Theta(z) = O(M)$. Then we fill the priority leaves by scanning $S^*$ and filtering each point $R_i^*$ through the kd-tree, one by one, as follows: We start at the root $\nu$ of $T_S$, and check its priority leaves $\nu_p^{x_{\min}}$, $\nu_p^{y_{\min}}$, $\nu_p^{x_{\max}}$, and $\nu_p^{y_{\max}}$ one by one in that order.
If we encounter a non-full leaf we simply place $R_i^*$ there; if we encounter a full leaf $\nu_p$ and $R_i^*$ is more extreme in the relevant direction than the least extreme point $R_j^*$ in $\nu_p$, we replace $R_j^*$ with $R_i^*$ and continue the filtering process with $R_j^*$. After checking $\nu_p^{y_{\max}}$ we continue to check the priority leaves of the child of $\nu$ in $T_S$ whose region contains the point we are processing; if $\nu$ does not have such a child (because we arrived at leaf level in the kd-tree) we simply continue with the next point in $S^*$.

It is easy to see that the above process correctly constructs the top $\Theta(\log M)$ levels of the pseudo-PR-tree $T_S$ on S, except that the kd-tree divisions are slightly different than the ones defined in Section 2.1.1, since the points in the priority leaves are not removed before the divisions are computed. However, the bound of Lemma 2 still holds: The O(T/B) term does not depend on the choice of the divisions, and the kd-tree analysis that brought the $O(\sqrt{N/B})$ term only depends on the fact that each child gets at most half of the points of its parent.

After constructing the $\Theta(\log M)$ levels and their priority leaves, we scan through the four sorted lists $L_{x_{\min}}$, $L_{y_{\min}}$, $L_{x_{\max}}$, $L_{y_{\max}}$ and divide them into four sorted lists for each of the $\Theta(z)$ leaves of the constructed kd-tree, while omitting the points already stored in priority leaves. These lists contain O(N/z) points each; after writing the constructed kd-tree and priority leaves to disk we use them to construct the rest of $T_S$ recursively.

Note that once the number of points in a recursive call gets smaller than M, we can simply construct the rest of the tree in internal memory one node at a time. This way we can make slightly unbalanced divisions, so that we have a multiple of B points on one side of each dividing hyperplane. Thus we can guarantee that we get at most one non-full leaf per subtree of size $\Theta(M)$, and obtain almost 100% space utilization.
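The filtering of a single point through the kd-tree levels, as described above, can be sketched as follows. This is a hedged in-memory illustration: the class `KdNode`, the function `filter_point`, and the use of one binary heap per priority leaf (so that the least extreme point is found quickly) are choices made for this sketch and are not taken from the paper.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) as a 4-d point

# (coordinate index, True if larger values are more extreme), in the order
# xmin, ymin, xmax, ymax used when checking the priority leaves.
DIRECTIONS = ((0, False), (1, False), (2, True), (3, True))

@dataclass
class KdNode:
    dim: int = 0                      # splitting dimension
    split: float = 0.0                # splitting value
    left: Optional["KdNode"] = None   # None below the last constructed level
    right: Optional["KdNode"] = None
    # One heap per priority leaf; the heap top is always the least extreme point.
    heaps: List[list] = field(default_factory=lambda: [[] for _ in DIRECTIONS])

def filter_point(root: KdNode, p: Point, B: int) -> Optional[Point]:
    """Filter p down from root; return the point left over for the recursion, if any."""
    node = root
    while node is not None:
        for heap, (d, larger) in zip(node.heaps, DIRECTIONS):
            key = p[d] if larger else -p[d]  # bigger key means more extreme
            if len(heap) < B:                # non-full leaf: place the point here
                heapq.heappush(heap, (key, p))
                return None
            if key > heap[0][0]:             # evict the least extreme point and
                _, p = heapq.heapreplace(heap, (key, p))  # continue with it
        node = node.left if p[node.dim] < node.split else node.right
    return p
```

Calling `filter_point` once per point of $S^*$ fills all priority leaves in a single scan; the returned leftovers are the points that would be passed on to the recursive calls.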
To avoid having an underfull leaf that may violate assumptions made by update algorithms, we may make the priority leaves under its parent slightly smaller so that all leaves contain $\Theta(B)$ rectangles. This also implies that the bound of Lemma 1 still holds.

Lemma 3. A pseudo-PR-tree can be bulk-loaded with N rectangles in the plane in $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os.

Proof. The initial construction of the sorted lists takes $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os. To construct $\Theta(\log M)$ levels of $T_S$ we use O(N/B) I/Os to construct the initial grid, as well as O(N/(zB)) to construct each of the z nodes for a total of O(N/B) I/Os. Constructing the priority leaves by filtering also takes O(N/B) I/Os, and so does the distribution of the remaining points in $S^*$ to the recursive calls. Thus each recursive step takes O(N/B) I/Os in total. The lemma follows since there are $O(\log\frac{N}{B} / \log M) = O(\log_M\frac{N}{B})$ levels of recursion.

2.2 Two-dimensional PR-tree

In this section we describe how to obtain a PR-tree (with degree $\Theta(B)$ and all leaves on the same level) from a pseudo-PR-tree (with degree six and leaves on all levels), while maintaining the $O(\sqrt{N/B} + T/B)$ I/O window query bound.

The PR-tree is built in stages bottom-up: In stage 0 we construct the leaves $V_0$ of the tree from the set $S_0 = S$ of N input rectangles; in stage $i \ge 1$ we construct the nodes $V_i$ on level i of the tree from a set $S_i$ of $O(N/B^i)$ rectangles,
consisting of the minimal bounding boxes of all nodes in $V_{i-1}$ (on level $i-1$). Stage i consists of constructing a pseudo-PR-tree $T_{S_i}$ on $S_i$; $V_i$ then simply consists of the (priority as well as normal) leaves of $T_{S_i}$; the internal nodes are discarded.³ The bottom-up construction ends when the set $S_i$ is small enough so that the rectangles in $S_i$ and the pointers to the corresponding subtrees fit into one block, which is then the root of the PR-tree.

Theorem 1. A PR-tree on a set S of N rectangles in the plane can be bulk-loaded in $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os, such that a window query can be answered in $O(\sqrt{N/B} + T/B)$ I/Os.

Proof. By Lemma 3, stage i of the PR-tree bulk-loading algorithm uses $O((|S_i|/B)\log_{M/B}(|S_i|/B))$ I/Os, which is $O((N/B^{i+1})\log_{M/B}\frac{N}{B})$. Thus the complete PR-tree is constructed in

$$\sum_{i=0}^{O(\log_B N)} O\left(\frac{N}{B^{i+1}}\log_{M/B}\frac{N}{B}\right) = O\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right)$$

I/Os.

To analyze the number of I/Os used to answer a window query Q, we will analyze the number of nodes visited on each level of the tree. Let $T_i$ ($i \ge 0$) be the number of nodes visited on level i. Since the nodes on level 0 (the leaves) correspond to the leaves of a pseudo-PR-tree on the N input rectangles S, it follows from Lemma 2 that $T_0 = O(\sqrt{N/B} + T/B)$; in particular, for big enough N and B, there exists a constant c such that $T_0 \le c\sqrt{N/B} + c(T/B)$. There must be $T_{i-1}$ rectangles in nodes of level $i \ge 1$ of the PR-tree that intersect Q, since these nodes contain the bounding boxes of nodes on level $i-1$. Since nodes on level i correspond to the leaves of a pseudo-PR-tree on the $N/B^i$ rectangles in $S_i$, it follows from Lemma 2 that for big enough N and B, we have $T_i \le (c/\sqrt{B^i})\sqrt{N/B} + c(T_{i-1}/B)$. Summing over all $O(\log_B N)$ levels and solving the recurrence reveals that $O(\sqrt{N/B} + T/B)$ nodes are visited in total.

2.3 Multidimensional PR-tree

In this section we briefly sketch how our PR-tree generalizes to dimensions greater than two.
We focus on how to generalize pseudo-PR-trees, since a d-dimensional PR-tree can be obtained using d-dimensional pseudo-PR-trees in exactly the same way as in the two-dimensional case; that the d-dimensional PR-tree has the same asymptotic performance as the d-dimensional pseudo-PR-tree is also proved exactly as in the two-dimensional case.

³ There is a subtle difference between the pseudo-PR-tree algorithm used in stage 0 and the algorithm used in stages i > 0. In stage 0, we construct leaves with input rectangles. In stages i > 0, we construct nodes with pointers to children and bounding boxes of their subtrees. The number of children that fits in a node might differ by a constant factor from the number of rectangles that fits in a leaf, so the number of children might be $\Theta(B)$ rather than B. For our analysis the difference does not matter and is therefore ignored for simplicity.

Recall that a two-dimensional pseudo-PR-tree is basically a four-dimensional kd-tree, where four priority leaves containing extreme rectangles in each of the four directions have been added below each internal node. Similarly, a d-dimensional pseudo-PR-tree is basically a 2d-dimensional kd-tree, where each node has 2d priority leaves with extreme rectangles in each of the 2d standard directions. For constant d, the structure can be constructed in $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os using the same grid method as in the two-dimensional case (Section 2.1.3); the only difference is that in order to fit the 2d-dimensional grid in main memory we have to decrease z (the number of nodes produced in one recursive stage) to $\Theta(M^{1/(2d)})$.

To analyze the number of I/Os used to answer a window query on a d-dimensional pseudo-PR-tree, we analyze the number of visited internal nodes as in the two-dimensional case (Section 2.1.2); the total number of visited nodes is at most a factor 2d higher, since at most 2d priority leaves can be visited per internal node visited.
As in the two-dimensional case, O(T/B) is a bound on the number of nodes $\nu$ visited where all rectangles in at least one of the priority leaves below $\nu$'s parent are reported. The number of nodes $\nu$ visited such that each priority leaf of $\nu$'s parent contains at least one rectangle not intersecting the query can then be bounded using an argument similar to the one used in two dimensions; it is equal to the number of regions associated with the nodes in a 2d-dimensional kd-tree with O(N/B) leaves that intersect the $(2d-2)$-dimensional intersection of two orthogonal hyperplanes. It follows from a straightforward generalization of the standard kd-tree analysis that this is $O((N/B)^{1-1/d})$ [2].

Theorem 2. A PR-tree on a set of N hyper-rectangles in d dimensions can be bulk-loaded in $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os, such that a window query can be answered in $O((N/B)^{1-1/d} + T/B)$ I/Os.

2.4 Lower bound for heuristic R-trees

The PR-tree is the first R-tree variant that always answers a window query worst-case optimally. In fact, most other R-tree variants can be forced to visit $\Theta(N/B)$ nodes to answer a query even when no rectangles are reported (T = 0). In this section we show how this is the case for the packed Hilbert R-tree, the four-dimensional Hilbert R-tree, and the TGS R-tree.

Theorem 3. There exist a set of rectangles S and a window query Q that does not intersect any rectangles in S, such that all $\Theta(N/B)$ nodes are visited when Q is answered using a packed Hilbert R-tree, a four-dimensional Hilbert R-tree, or a TGS R-tree on S.

Proof. We will construct a set of points S such that all leaves in a packed Hilbert R-tree, a four-dimensional Hilbert R-tree, and a TGS R-tree on S are visited when answering a line query that does not touch any point. The theorem follows since points and lines are all special rectangles.

For convenience we assume that $B \ge 4$, $N = 2^k B$ and $N/B = B^m$, for some positive integers k and m, so that each leaf of the R-tree contains B rectangles, and each internal node has fanout B.
We construct S as a grid of N/B columns and B rows, where each column is shifted up a little, depending on its horizontal position (each row is in fact a Halton-Hammersley point set; see e.g. [8]). More precisely, S has a point $p_{ij} = (x_{ij}, y_{ij})$, for all $i \in \{0, \ldots, N/B - 1\}$ and $j \in \{0, \ldots, B - 1\}$, such that $x_{ij} = i + 1/2$ and $y_{ij} = j/B + h(i)/N$. Here h(i) is the number obtained by reversing, i.e. reading backwards, the k-bit binary representation of i. An example with N = 64, B = 4 is shown in Figure 3. Now, let us examine the structure of each of the three R-tree variants on this dataset.
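The construction can be reproduced for the small example (N = 64, B = 4) with the following sketch. The function names are hypothetical, and exact rational arithmetic via `fractions.Fraction` is a choice made here so that "the line touches no point" can be checked without floating-point doubt. The helper `columns_crossed` treats each column as one leaf, which is the leaf layout the proof derives for these R-trees, and counts how many column extents a horizontal line passes through.

```python
from fractions import Fraction

def reverse_bits(i, k):
    """h(i): read the k-bit binary representation of i backwards."""
    out = 0
    for _ in range(k):
        out = (out << 1) | (i & 1)
        i >>= 1
    return out

def worst_case_points(k, b):
    """Points p_ij = (i + 1/2, j/B + h(i)/N), keyed by (i, j), with N = 2^k * B."""
    n = (1 << k) * b
    return {
        (i, j): (Fraction(2 * i + 1, 2),
                 Fraction(j, b) + Fraction(reverse_bits(i, k), n))
        for i in range(1 << k)
        for j in range(b)
    }

def columns_crossed(points, k, y):
    """Number of columns whose vertical extent contains the line at height y."""
    crossed = 0
    for i in range(1 << k):
        ys = [py for (ci, _), (_, py) in points.items() if ci == i]
        if min(ys) <= y <= max(ys):
            crossed += 1
    return crossed

if __name__ == "__main__":
    k, b = 4, 4                                  # N = 64, B = 4
    pts = worst_case_points(k, b)
    n = len(pts)
    y = Fraction(1, 2) + Fraction(1, 2 * n)      # strictly between two point heights
    hits = sum(1 for _, py in pts.values() if py == y)
    print(columns_crossed(pts, k, y), "columns crossed,", hits, "points hit")
```

In this example the y-coordinates, multiplied by N, are exactly the integers 0 to N - 1, so any line at a half-integer multiple of 1/N misses every point.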
Figure 3: Worst-case example.

Figure 4: TGS partitioning the worst-case example. A vertical division creates two bounding boxes with a total area of less than $i_2 - i_1 - 1$. A horizontal division creates two bounding boxes with a total area of more than $(i_2 - i_1)(1 - 2\sigma) > i_2 - i_1 - 1$.

Two- and four-dimensional packed Hilbert R-tree: The Hilbert curve visits the columns in our grid of points one by one; when it visits a column, it visits all points in that column before proceeding to another column (we omit the details of the proof from this abstract). Therefore, the packed Hilbert R-tree makes a leaf for every column, and a horizontal line can be chosen to intersect all these columns while not touching any point.

TGS R-tree: The TGS algorithm will partition S into B subsets of equal size and partition each subset recursively. The partitioning is implemented by choosing a partitioning line that separates the set into two subsets (whose sizes are multiples of N/B), and then applying binary partitions to the subsets recursively until we have partitioned the set into B subsets of size N/B. Observe that on all levels in this recursion, the partitioning line will leave at least a fraction 1/B of the input on each side of the line. Below we prove that TGS will always partition by vertical lines; it follows that TGS will eventually put each column in a leaf. Then a line query can intersect all leaves but report nothing. Suppose TGS is about to partition the subset $S(i_1, i_2)$ of S that consists of columns $i_1$ to $i_2$ inclusive, with $i_2 > i_1$, i.e. $S(i_1, i_2) = \{ p_{ij} \mid i \in \{i_1, \ldots, i_2\}, j \in \{0, \ldots, B - 1\} \}$. When the greedy split algorithm gets to divide such a set into two, it can look for a vertical partitioning line or for a horizontal partitioning line. Intuitively, TGS favors partitioning lines that create a big gap between the bounding boxes of the points on each side of the line.
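The preference for vertical cuts can be checked by brute force on the small example with N = 64 and B = 4. The sketch below is a simplification and not the TGS algorithm itself: it tries every binary cut of the full point set by a vertical or a horizontal line and ignores the TGS constraint that subset sizes are multiples of N/B. The names `make_points`, `bbox_area` and `best_cut` are made up for this illustration.

```python
from fractions import Fraction

def reverse_bits(i, k):
    """Read the k-bit binary representation of i backwards."""
    out = 0
    for _ in range(k):
        out = (out << 1) | (i & 1)
        i >>= 1
    return out

def make_points(k, b):
    """Worst-case grid: x = i + 1/2, y = j/B + h(i)/N, with N = 2^k * B."""
    n = (1 << k) * b
    return [(Fraction(2 * i + 1, 2), Fraction(j, b) + Fraction(reverse_bits(i, k), n))
            for i in range(1 << k) for j in range(b)]

def bbox_area(pts):
    """Area of the bounding box of a non-empty list of points."""
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))

def best_cut(pts, axis):
    """Smallest total bounding-box area over all cuts by a line orthogonal to `axis`.

    axis = 0 separates by x (a vertical line), axis = 1 by y (a horizontal line).
    """
    order = sorted(pts, key=lambda p: p[axis])
    best = None
    for s in range(1, len(order)):
        if order[s - 1][axis] == order[s][axis]:
            continue                      # a line cannot separate equal coordinates
        area = bbox_area(order[:s]) + bbox_area(order[s:])
        best = area if best is None else min(best, area)
    return best

if __name__ == "__main__":
    pts = make_points(4, 4)
    print("best vertical cut:  ", float(best_cut(pts, 0)))
    print("best horizontal cut:", float(best_cut(pts, 1)))
```

For the full set of 16 columns, $i_2 - i_1 - 1 = 14$, so the vertical result should fall below 14 and the horizontal one at or above it, in line with the bounds stated in the Figure 4 caption.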
As we will show below, we have constructed S such that the area of the gap created by a horizontal partitioning line is always roughly the same, as is the area of the gap created by a vertical line, with the latter always being bigger. Partitioning with a vertical line would always leave a gap of roughly a square that fits between two columns; see Figure 4. More precisely, it would partition the set $S(i_1, i_2)$ into two sets $S(i_1, c - 1)$ and $S(c, i_2)$, for some $c \in \{i_1 + 1, \ldots, i_2\}$. The bounding boxes of these two sets would each have height less than 1, and their total width would be $(c - 1 - i_1) + (i_2 - c)$, so their total area $A_v$ would be less than $i_2 - i_1 - 1$. The width of a gap around a horizontal partitioning line depends on the number of columns in $S(i_1, i_2)$. However, the more columns are involved, the bigger the density of the points in those columns when projected on the y-axis, and the lower the gap that can be created; see Figure 4 for an illustration. As a result, partitioning with a horizontal line can lead to gaps that are wide and low, or relatively high but not so wide; in any case, the area of the gap will be roughly the same. More precisely, when we partition this set by a horizontal line, the total area $A_h$ of the resulting bounding boxes must be at least $i_2 - i_1 - 4/B$ (we omit the details from this abstract). Recall that $A_v$ is less than $i_2 - i_1 - 1$. Since $B \ge 4$, we can conclude that $A_h > A_v$, and that partitioning with a vertical line will always result in a smaller total area of bounding boxes than with a horizontal line. As a result, TGS will always cut vertically between the columns.

3. EXPERIMENTS

In this section we describe the results of our experimental study of the performance of the PR-tree. We compared the PR-tree to several other bulk-loading methods known to generate query-efficient R-trees: the packed Hilbert R-tree (denoted H in the rest of this section), the four-dimensional Hilbert R-tree (denoted H4), and the TGS R-tree (denoted TGS).
Among these, TGS has been reported to have the best query performance, but it also takes many I/Os to bulk-load. In contrast, H is simple to bulk-load, but it has worse query performance because it does not take the extent of the input rectangles into account. H4 has been reported to be inferior to H [15], but since it takes the extent into account (like TGS) it should intuitively be less vulnerable to extreme datasets.

3.1 Experimental setup

We implemented the four bulk-loading algorithms in C++ using TPIE [3]. TPIE is a library that provides support for implementing I/O-efficient algorithms and data structures. In our implementation we used 36 bytes to represent each input rectangle: 8 bytes for each coordinate and 4 bytes to be able to hold a pointer to the original object. Each bounding box in the internal nodes also used 36 bytes: 8 bytes for each coordinate and 4 bytes for a pointer to the disk block storing the root of the corresponding subtree. The disk block size was chosen to be 4KB, resulting in a maximum fanout of 113. This is similar to earlier experimental studies, which typically use block sizes ranging from 1KB to 4KB or fix the fanout to a number close to 100. As experimental platform we used a dedicated Dell PowerEdge 2400 workstation with one Pentium III/500MHz processor running FreeBSD 4.3. A local 36GB SCSI disk (IBM Ultrastar 36LZX) was used to store all necessary files: the input data, the R-trees, as well as temporary files. We restricted the main memory to 128MB and further restricted the amount of memory available to TPIE to 64MB; the rest was reserved for operating system daemons.

3.2 Datasets

We used both real-life and synthetic data in our experiments.

Real-life data

As the real-life data we used the TIGER/Line data [24] of geographical features in the United States. This data is
the standard benchmark data used in spatial databases. It is distributed on six CD-ROMs and we chose to experiment with the road line segments from two of the CD-ROMs: disk one containing data for sixteen eastern US states and disk six containing data from five western US states; we use Eastern and Western to refer to these two datasets, respectively. To obtain datasets of varying sizes we divided the Eastern dataset into five regions of roughly equal size, and then put an increasing number of regions together to obtain datasets of increasing sizes. The largest set is just the whole Eastern dataset. For each dataset we used the bounding boxes of the line segments as our input rectangles. As a result, the Eastern dataset had 16.7 million rectangles, for a total size of 574MB, and the Western dataset had 12 million rectangles, for a total size of 411MB. Note that the biggest dataset is much larger than those used in previous works (which only used up to 100,000 rectangles) [15, 12]. Note also that our TIGER data is relatively nicely distributed; it consists of relatively small rectangles (long roads are divided into short segments) that are somewhat (but not too badly) clustered around urban areas.

Synthetic data

To investigate how the different R-trees perform on more extreme datasets than the TIGER data, we generated a number of synthetic datasets. Each of these synthetic datasets consisted of 10 million rectangles (or 360MB) in the unit square.

size(max_side): We designed the first class of synthetic datasets to investigate how well the R-trees handle rectangles of different sizes. In the size(max_side) dataset the rectangle centers were uniformly distributed and the lengths of their sides uniformly and independently distributed between 0 and max_side. When generating the datasets, we discarded rectangles that were not completely inside the unit square (but made sure each dataset had 10 million rectangles). A portion of the dataset size(0.001) is shown in Figure 5.
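The layout numbers quoted in the experimental setup can be reproduced with a few lines of arithmetic. The sketch below derives the fanout from the 36-byte entries and 4KB blocks, the minimum number of leaves for the Eastern dataset, and the $\sqrt{N/B}$ term of the two-dimensional query bound with its hidden constant dropped. The constant and function names are made up for this illustration, and reading "MB" as $2^{20}$ bytes is an assumption.

```python
import math

COORD_BYTES, POINTER_BYTES = 8, 4
ENTRY_BYTES = 4 * COORD_BYTES + POINTER_BYTES   # four coordinates plus one pointer
BLOCK_BYTES = 4 * 1024                          # 4KB disk block

def max_fanout(block_bytes=BLOCK_BYTES, entry_bytes=ENTRY_BYTES):
    """Number of whole entries that fit in one block."""
    return block_bytes // entry_bytes

def num_leaves(num_rects, fanout):
    """Minimum number of leaves when every leaf is completely full."""
    return math.ceil(num_rects / fanout)

def worst_case_term(leaves):
    """sqrt(N/B): first term of the 2-D query bound, constant factor ignored."""
    return math.sqrt(leaves)

def dataset_mib(num_rects, entry_bytes=ENTRY_BYTES):
    """Raw dataset size in units of 2^20 bytes."""
    return num_rects * entry_bytes / 2**20

if __name__ == "__main__":
    eastern = 16_700_000                        # rounded size quoted in the text
    fanout = max_fanout()
    leaves = num_leaves(eastern, fanout)
    print(fanout, leaves, round(worst_case_term(leaves)), round(dataset_mib(eastern)))
```

Because 16.7 million is itself a rounded figure, the computed dataset size only approximately matches the 574MB reported above.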
Figure 5: Synthetic dataset SIZE(0.001).

aspect(a): The second class of synthetic datasets was designed to investigate how the R-trees handle rectangles with different aspect ratios. The areas of the rectangles in all the datasets were fixed to $10^{-6}$, a reasonably small size. In the aspect(a) dataset the rectangle centers were uniformly distributed but their aspect ratios were fixed to a and the longest sides chosen to be vertical or horizontal with equal probability. We also made sure that all rectangles fell completely inside the unit square. A portion of the dataset aspect(10) is shown in Figure 6. Note that if the input rectangles are bounding boxes of line segments that are almost horizontal or vertical, one will indeed get rectangles with very high aspect ratio, even infinite in the case of horizontal or vertical segments.

Figure 6: Synthetic dataset ASPECT(10).

skewed(c): In many real-life multidimensional datasets different dimensions often have different distributions, some of which may be highly skewed compared to the others. We designed the third class of datasets to investigate how this affects R-tree performance. skewed(c) consists of uniformly distributed points that have been squeezed in the y-dimension, that is, each point (x, y) is replaced with $(x, y^c)$. An example of skewed(5) is shown in Figure 7.

Figure 7: Synthetic dataset SKEWED(5).

cluster: Our final dataset was designed to illustrate the worst-case behavior of the H, H4 and TGS R-trees. It is similar to the worst-case example discussed in Section 2. It consists of clusters with centers equally spaced on a horizontal line. Each cluster consists of 1000 points uniformly distributed in a square surrounding its center. Figure 8 shows a part of the cluster dataset.
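The first three synthetic classes are simple enough to sketch as generators. The code below follows the descriptions above, with two assumptions that the text does not spell out: rectangles that stick out of the unit square are handled by rejection sampling in the aspect(a) class as well, and points are stored as degenerate rectangles. All function names are hypothetical, and a rectangle is a tuple `(x0, y0, x1, y1)`.

```python
import math
import random

def inside_unit_square(r):
    x0, y0, x1, y1 = r
    return 0.0 <= x0 and 0.0 <= y0 and x1 <= 1.0 and y1 <= 1.0

def size_dataset(n, max_side, rng):
    """Uniform centers; side lengths uniform and independent in [0, max_side]."""
    out = []
    while len(out) < n:
        cx, cy = rng.random(), rng.random()
        w, h = rng.uniform(0.0, max_side), rng.uniform(0.0, max_side)
        r = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
        if inside_unit_square(r):          # discard rectangles that stick out
            out.append(r)
    return out

def aspect_dataset(n, a, rng, area=1e-6):
    """Fixed area and aspect ratio a; the long side is vertical or horizontal."""
    long_side, short_side = math.sqrt(area * a), math.sqrt(area / a)
    out = []
    while len(out) < n:
        cx, cy = rng.random(), rng.random()
        if rng.random() < 0.5:
            w, h = long_side, short_side
        else:
            w, h = short_side, long_side
        r = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
        if inside_unit_square(r):          # assumption: rejection, as for size()
            out.append(r)
    return out

def skewed_dataset(n, c, rng):
    """Uniform points squeezed in the y-dimension: (x, y) becomes (x, y**c)."""
    out = []
    for _ in range(n):
        x, y = rng.random(), rng.random() ** c
        out.append((x, y, x, y))           # a point as a degenerate rectangle
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    print(size_dataset(2, 0.001, rng))
    print(aspect_dataset(2, 10, rng))
    print(skewed_dataset(2, 5, rng))
```

The cluster dataset is left out because its cluster count and square size are not legible in the description above.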
Figure 8: Synthetic dataset CLUSTER.

Figure 9: Bulk-loading performance on TIGER data: I/O (upper figure) and time (lower figure). I/O in million blocks read or written: Western data: Hilbert (H/H4) 1.2, PR-tree (PR) 3.1, Greedy (TGS) 14.7; Eastern data: Hilbert (H/H4) 1.7, PR-tree (PR) 4.4, Greedy (TGS) 21.1.

3.3 Experimental results

Below we discuss the results of our bulk-loading and query experiments with the four R-tree variants.

Bulk-loading performance

We bulk-loaded each of the R-trees with each of the real-life TIGER datasets, as well as with the synthetic datasets for various parameter values. In all experiments and for all R-trees we achieved a space utilization above 99% (footnote 4). We measured the time spent and counted the number of 4KB blocks read or written when bulk-loading the trees. Note that all algorithms we tested read and write blocks almost exclusively by sequential I/O of large parts of the data; as a result, I/O is much faster than if blocks were read and written in random order. Figure 9 shows the results of our experiments using the Eastern and Western datasets. Both experiments yield the same result: the H and H4 algorithms use the same number of I/Os, and roughly 2.5 times fewer I/Os than PR. This is not surprising since even though the three algorithms have the same $O(\frac{N}{B} \log_{M/B} \frac{N}{B})$ I/O bounds, the PR algorithm is much more complicated than the H and H4 algorithms. The TGS algorithm uses roughly 4.5 times more I/Os than PR, which is also not surprising given that the algorithm makes binary partitions so that the number of levels of recursion is effectively $O(\log_2 N)$. In terms of time, the H and H4 algorithms are still more than 3 times faster than the PR algorithm, but the TGS algorithm is only roughly 3 times slower than PR. This shows that H, H4 and PR are all more CPU-intensive than TGS. Figure 10 shows the results of our experiments with the five Eastern datasets.
These experiments show that the H, H4 and PR algorithms scale relatively linearly with dataset size; this is a result of the $\log_{M/B} \frac{N}{B}$ factor in the bulk-loading bound being the same for all datasets. The cost of the TGS algorithm seems to grow in an only slightly superlinear way with the size of the dataset. This is a result of the $\log_2 N$ factor in the bulk-loading bound being almost the same for all datasets.

Footnote 4: When R-trees are bulk-loaded to subsequently be updated dynamically, near 100% space utilization is often not desirable [10]. However, since we are mainly interested in the query performance of the R-tree constructed with the different bulk-loading methods, and since the methods could be modified in the same way to produce non-full leaves, we only considered the near 100% utilization case.

Figure 10: Bulk-loading performances on Eastern datasets (I/Os).

In our experiments with the synthetic data we found that the performance of the H, H4 and PR bulk-loading algorithms was practically the same for all the datasets, that is, unaffected by the data distribution. This is not surprising, since the performance should only depend on the dataset size (and all the synthetic datasets have the same size). The PR algorithm performance varied slightly, which can be explained by the small effect the data distribution can have on the grid method used in the bulk-loading algorithm (subtrees may have slightly different sizes due to the removal of priority boxes). On average, the H and H4 algorithms spent 381 seconds and 1.0 million I/Os on each of the synthetic datasets, while the PR algorithm spent 1289 seconds and 2.6 million I/Os. On the other hand, as expected, the performance of the TGS algorithm varied significantly over the synthetic datasets we tried; the binary partitions made by the algorithm depend heavily on the input data distribution.
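The remark about the logarithmic factors can be made concrete with the parameters of the experimental setup. The sketch below assumes that M/B is the number of 4KB blocks that fit in the 64MB given to TPIE and that N/B is the number of full leaves at fanout 113; how these quantities enter the actual implementation is not stated in the text, so the numbers are only indicative. The dataset sizes are the four values legible on the axis of Figure 10, and the function names are made up for this illustration.

```python
import math

FANOUT = 113
MEM_BLOCKS = (64 * 2**20) // (4 * 2**10)        # M/B: 64MB of memory in 4KB blocks
SIZES = [5.7e6, 9.2e6, 12.7e6, 16.7e6]          # Eastern dataset sizes (rectangles)

def bulk_load_levels(n, fanout=FANOUT, mem_blocks=MEM_BLOCKS):
    """log_{M/B}(N/B): the factor in the bound for H, H4 and PR."""
    return math.log(n / fanout) / math.log(mem_blocks)

def binary_levels(n):
    """log_2 N: the recursion depth attributed to TGS's binary partitions."""
    return math.log2(n)

if __name__ == "__main__":
    for n in SIZES:
        print(f"{n:>12,.0f}  log_(M/B)(N/B) = {bulk_load_levels(n):.2f}"
              f"  log_2 N = {binary_levels(n):.2f}")
```

Under these assumptions the first factor rounds up to the same number of passes for every dataset size, while the second grows by only a few percent across the range.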
The TGS algorithm was between 4.6 and 16.4 times slower than the PR algorithm in terms of I/O, and between 2.8 and 10.9 times slower in terms of time. Due to lack of space, we only show the performance of the TGS algorithm on the size(max_side) and aspect(a) datasets in Figure 11. The point datasets, skewed(c) and cluster, were all built in between … and … seconds.

Figure 11: Bulk-loading time in seconds of Top-down Greedy Split on synthetic datasets of 10 million rectangles each (panels: SIZE(max_side), ASPECT(a)).

Query performance

After bulk-loading the four R-tree variants we experimented with their query performance; in each of our experiments we performed 100 randomly generated queries and computed their average performance (a more exact description of the queries is given below). Following previous experimental studies, we utilized a cache (or buffer) to store internal R-tree nodes during queries. In fact, in all our experiments we cached all internal nodes since they never occupied more than 6MB. This means that when reporting the number of I/Os needed to answer a query, we are in effect reporting the number of leaves visited in order to answer the query (footnote 5). For several reasons, and following previous experimental studies [6, 12, 15, 16], we did not collect timing data. Two main reasons for this are (1) that I/O is a much more robust measure of performance, since the query time is easily affected by operating system caching and by disk block layout; and (2) that we are interested in heavy load scenarios where not much cache memory is available or where caches are ineffective, that is, where I/O dominates the query time.

Footnote 5: Experiments with the cache disabled showed that in our experiments the cache actually had relatively little effect on the window query performance.

Figure 12: Query performance for queries with squares of varying size on the Western TIGER data. The performance is given as the number of blocks read divided by the output size T/B.

Figure 13: Query performance for queries with squares of varying size on the Eastern TIGER data. The performance is given as the number of blocks read divided by the output size T/B.

Figure 14: Query performance for queries with squares of area 0.01 on Eastern TIGER datasets of varying size. The performance is given as the number of blocks read divided by the output size T/B.

TIGER data: We first performed query experiments using the Eastern and Western datasets. The results are summarized in Figures 12, 13, and 14. In Figures 12 and 13 we show the results of experiments with square window queries with areas that range from 0.25% to 2% of the area of the bounding box of all input rectangles. We used smaller queries than previous experimental studies (for example, the maximum query in [15] occupies 25% of the area) because our datasets are much larger than the datasets used in previous experiments; without reducing the query size the output would be unrealistically large and the reporting cost would thus dominate the overall query performance. In Figure 14 we show the results of experiments on the five Eastern datasets of various sizes with a fixed query size of 1%. The results show that all four R-tree variants perform remarkably well on the TIGER data; their performance is within 10% of each other and they all answer queries in close to T/B, the minimum number of necessary I/Os. Their relative performance generally agrees with earlier results [15, 12], that is, TGS performs better than H, which in turn is better than H4. PR consistently performs slightly better than both H and H4 but slightly worse than TGS.

Synthetic data: Next we performed experiments with our synthetic datasets, designed to investigate how the different R-trees perform on more extreme datasets than the TIGER data. For each of the datasets size, aspect and skewed we performed experiments where we varied the parameter to obtain data ranging from fairly normal to rather extreme. Below we summarize our results.

Figure 15: Query performance for queries with squares of area 0.01 on synthetic datasets (panels: SIZE(max_side), ASPECT(a), SKEWED(c)). The performance is given as the number of blocks read divided by the output size T/B.

The left side of Figure 15 shows the results of our experiments with the dataset size(max_side) when varying max_side from … to 0.2, that is, from relatively small to relatively large rectangles. As queries we used squares with area 0.01. Our results show that for relatively small input rectangles, like the TIGER data, all the R-tree variants perform very close to the minimum number of necessary I/Os. However, as the input rectangles get larger, PR and H4 clearly outperform H and TGS. H performs the worst, which is not surprising since it does not take the extent of the input rectangles into account. TGS performs significantly better than H but still worse than PR and H4. Intuitively, PR and H4 can handle large rectangles better, because they rigorously divide rectangles into groups of rectangles that are similar in all four coordinates. This may enable these algorithms to group likely answers, namely large rectangles, together so that they can be retrieved with few I/Os. It also enables these algorithms to group small rectangles nicely, while TGS, which strives to minimize the total area of bounding boxes, may be indifferent to the distribution of the small rectangles in the presence of large rectangles.

The middle of Figure 15 shows the results of our experiments with the dataset aspect(a), when we vary a from 10 to $10^5$, that is, when we go from rectangles (of constant area) with small to large aspect ratio. As queries we again used squares with area 0.01. The results are very similar to the results of the size dataset experiments, except that as the aspect ratio increases, PR and H4 become significantly better than TGS and especially H.
Unlike with the size dataset, PR performs as well as H4 and they both perform close to the minimum number of necessary I/Os to answer a query. Thus this set of experiments re-emphasizes that both the PR-tree and the H4-tree are able to adapt to varying extent very well.

The right side of Figure 15 shows the result of our experiments with the dataset skewed(c), when we vary c from 1 to 9, that is, when we go from a uniformly distributed point set to a very skewed point set. As queries we used squares with area 0.01 that are skewed in the same way as the dataset (that is, where the corner (x, y) is transformed to $(x, y^c)$) so that the output size remains roughly the same. As expected, the PR performance is unaffected by the transformations, since our bulk-loading algorithm is based only on the relative order of coordinates: x-coordinates are only compared to x-coordinates, and y-coordinates are only compared to y-coordinates; there is no interaction between them. On the other hand, the query performance of the three other R-trees degenerates quickly as the point set gets more skewed.

As a final experiment, we queried the cluster dataset with long skinny horizontal queries (of area …) through the clusters; the y-coordinate of the leftmost bottom corner was chosen randomly such that the query passed through all clusters. The results are shown in Table 1. As anticipated, the query performance of H, H4 and TGS is very bad; the cluster dataset was constructed to illustrate the worst-case behavior of the structures. Even though a query only returns around 0.3% of the input points on average, the query algorithm visits 37%, 94% and 25% of the leaves in H, H4 and TGS, respectively. In comparison, only 1.2% of the leaves are visited in PR. Thus the PR-tree outperforms the other indexes by well over an order of magnitude.

Table 1: Query performances on synthetic dataset CLUSTER.
tree:                       H      H4     PR      TGS
# I/Os:                     …      …      …       …
% of the R-tree visited:    37%    94%    1.2%    25%

3.4 Conclusions of the experiments

The main conclusion of our experimental study is that the PR-tree is not only theoretically efficient but also practically efficient. Our bulk-loading algorithm is slower than the packed Hilbert and four-dimensional Hilbert bulk-loading algorithms but much faster than the TGS R-tree bulk-loading algorithm. Furthermore, unlike for the TGS R-tree, the performance of our bulk-loading algorithm does not depend on the data distribution. The query performance of all four R-trees is excellent on nicely distributed data, including the real-life TIGER data. On extreme data, however, the PR-tree is much more robust than the other R-trees (even though the four-dimensional Hilbert R-tree is also relatively robust).

4. CONCLUDING REMARKS

In this paper we presented the PR-tree, which is the first R-tree variant that can answer any window query in the optimal $O(\sqrt{N/B} + T/B)$ I/Os. We also performed an extensive experimental study, which showed that the PR-tree is not only optimal in theory, but that it also performs excellently in practice: for normal data, it is quite competitive with the best known heuristics for bulk-loading R-trees, namely the packed Hilbert R-tree [15] and the TGS R-tree [12], while for data with extreme shapes or distributions, it outperforms them significantly. The PR-tree can be updated using any known update heuristic for R-trees, but then its performance can no longer be guaranteed theoretically and its practical performance might suffer as well. Alternatively, we can use the dynamic version of the PR-tree using the logarithmic method, which has the same theoretical worst-case query performance and can be updated efficiently.
In the future we wish to experiment to see what happens to the performance when we apply heuristic update algorithms and when we use the theoretically superior logarithmic method.

5. REFERENCES

[1] P. K. Agarwal, L. Arge, O. Procopiuc, and J. S. Vitter. A framework for index bulk loading and dynamization. In Proc. International Colloquium on Automata, Languages, and Programming.
[2] P. K. Agarwal, M. de Berg, J. Gudmundsson, M. Hammar, and H. J. Haverkort. Box-trees and R-trees with near-optimal query time. Discrete and Computational Geometry, 28(3).
[3] L. Arge, O. Procopiuc, and J. S. Vitter. Implementing I/O-efficient data structures using TPIE. In Proc. European Symposium on Algorithms.
[4] L. Arge and J. Vahrenhold. I/O-efficient dynamic planar point location. International Journal of Computational Geometry & Applications. To appear.
[5] R. Bayer and E. McCreight. Organization and maintenance of large ordered indexes. Acta Informatica, 1.
[6] N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: An efficient and robust access method for points and rectangles. In Proc. SIGMOD International Conference on Management of Data.
[7] S. Berchtold, C. Böhm, and H.-P. Kriegel. Improving the query performance of high-dimensional index structures by bulk load operations. In Proc. Conference on Extending Database Technology, LNCS 1377.
[8] B. Chazelle. The Discrepancy Method: Randomness and Complexity. Cambridge University Press, New York.
[9] D. Comer. The ubiquitous B-tree. ACM Computing Surveys, 11(2).
[10] D. J. DeWitt, N. Kabra, J. Luo, J. M. Patel, and J.-B. Yu. Client-server Paradise. In Proc. International Conference on Very Large Databases.
[11] V. Gaede and O. Günther. Multidimensional access methods. ACM Computing Surveys, 30(2).
[12] Y. J. García, M. A. López, and S. T. Leutenegger. A greedy algorithm for bulk loading R-trees. In Proc. 6th ACM Symposium on Advances in GIS.
[13] A. Guttman. R-trees: A dynamic index structure for spatial searching. In Proc. SIGMOD International Conference on Management of Data, pages 47-57.
[14] H. J. Haverkort, M. de Berg, and J. Gudmundsson. Box-trees for collision checking in industrial installations. In Proc. ACM Symposium on Computational Geometry, pages 53-62.
[15] I. Kamel and C. Faloutsos. On packing R-trees. In Proc. International Conference on Information and Knowledge Management.
[16] I. Kamel and C. Faloutsos. Hilbert R-tree: An improved R-tree using fractals. In Proc. International Conference on Very Large Databases.
[17] K. V. R. Kanth and A. K. Singh. Optimal dynamic range searching in non-replicating index structures. In Proc. International Conference on Database Theory, LNCS 1540.
[18] S. T. Leutenegger, M. A. López, and J. Edgington. STR: A simple and efficient algorithm for R-tree packing. In Proc. IEEE International Conference on Data Engineering.
[19] Y. Manolopoulos, A. Nanopoulos, A. N. Papadopoulos, and Y. Theodoridis. R-trees have grown everywhere. Technical Report available at http://
[20] O. Procopiuc, P. K. Agarwal, L. Arge, and J. S. Vitter. Bkd-tree: A dynamic scalable kd-tree. In Proc. International Symposium on Spatial and Temporal Databases.
[21] J. Robinson. The K-D-B-tree: A search structure for large multidimensional dynamic indexes. In Proc. SIGMOD International Conference on Management of Data, pages 10-18.
[22] N. Roussopoulos and D. Leifker. Direct spatial search on pictorial databases using packed R-trees. In Proc. SIGMOD International Conference on Management of Data, pages 17-31.
[23] T. Sellis, N. Roussopoulos, and C. Faloutsos. The R+-tree: A dynamic index for multi-dimensional objects. In Proc. International Conference on Very Large Databases.
[24] TIGER/Line(TM) Files, 1997 Technical Documentation. Washington, DC, September. http://
External Memory Geometric Data Structures Lars Arge Department of Computer Science University of Aarhus and Duke University Augues 24, 2005 1 Introduction Many modern applications store and process datasets
More informationPOISSON PROCESSES. Chapter 2. 2.1 Introduction. 2.1.1 Arrival processes
Chater 2 POISSON PROCESSES 2.1 Introduction A Poisson rocess is a simle and widely used stochastic rocess for modeling the times at which arrivals enter a system. It is in many ways the continuoustime
More informationAn Analysis of Reliable Classifiers through ROC Isometrics
An Analysis of Reliable Classifiers through ROC Isometrics Stijn Vanderlooy s.vanderlooy@cs.unimaas.nl Ida G. SrinkhuizenKuyer kuyer@cs.unimaas.nl Evgueni N. Smirnov smirnov@cs.unimaas.nl MICCIKAT, Universiteit
More informationwhere a, b, c, and d are constants with a 0, and x is measured in radians. (π radians =
Introduction to Modeling 3.61 3.6 Sine and Cosine Functions The general form of a sine or cosine function is given by: f (x) = asin (bx + c) + d and f(x) = acos(bx + c) + d where a, b, c, and d are constants
More informationButterfly: Privacy Preserving Publishing on Multiple QuasiIdentifiers
Butterfly: Privacy Preserving Publishing on Multile QuasiIdentifiers Technical Reort TR 200818 School of Comuting Science, Simon Fraser University Jian Pei Simon Fraser University jei@cs.sfu.ca Yufei
More informationD.Sailaja, K.Nasaramma, M.Sumender Roy, Venkateswarlu Bondu
Predictive Modeling of Customers in Personalization Alications with Context D.Sailaja, K.Nasaramma, M.Sumender Roy, Venkateswarlu Bondu Nasaramma.K is currently ursuing her M.Tech in Godavari Institute
More informationStatic and Dynamic Properties of Smallworld Connection Topologies Based on Transitstub Networks
Static and Dynamic Proerties of Smallworld Connection Toologies Based on Transitstub Networks Carlos Aguirre Fernando Corbacho Ramón Huerta Comuter Engineering Deartment, Universidad Autónoma de Madrid,
More informationAn important observation in supply chain management, known as the bullwhip effect,
Quantifying the Bullwhi Effect in a Simle Suly Chain: The Imact of Forecasting, Lead Times, and Information Frank Chen Zvi Drezner Jennifer K. Ryan David SimchiLevi Decision Sciences Deartment, National
More informationlecture 25: Gaussian quadrature: nodes, weights; examples; extensions
38 lecture 25: Gaussian quadrature: nodes, weights; examles; extensions 3.5 Comuting Gaussian quadrature nodes and weights When first aroaching Gaussian quadrature, the comlicated characterization of the
More informationRtrees. RTrees: A Dynamic Index Structure For Spatial Searching. RTree. Invariants
RTrees: A Dynamic Index Structure For Spatial Searching A. Guttman Rtrees Generalization of B+trees to higher dimensions Diskbased index structure Occupancy guarantee Multiple search paths Insertions
More informationUniversiteitUtrecht. Department. of Mathematics. Optimal a priori error bounds for the. RayleighRitz method
UniversiteitUtrecht * Deartment of Mathematics Otimal a riori error bounds for the RayleighRitz method by Gerard L.G. Sleijen, Jaser van den Eshof, and Paul Smit Prerint nr. 1160 Setember, 2000 OPTIMAL
More informationA MOST PROBABLE POINTBASED METHOD FOR RELIABILITY ANALYSIS, SENSITIVITY ANALYSIS AND DESIGN OPTIMIZATION
9 th ASCE Secialty Conference on Probabilistic Mechanics and Structural Reliability PMC2004 Abstract A MOST PROBABLE POINTBASED METHOD FOR RELIABILITY ANALYSIS, SENSITIVITY ANALYSIS AND DESIGN OPTIMIZATION
More informationA Brief Introduction to Design of Experiments
J. K. TELFORD D A Brief Introduction to Design of Exeriments Jacqueline K. Telford esign of exeriments is a series of tests in which uroseful changes are made to the inut variables of a system or rocess
More informationThis document is downloaded from DRNTU, Nanyang Technological University Library, Singapore.
This document is downloaded from DRNTU, Nanyang Technological University Library, Singaore. Title Automatic Robot Taing: AutoPath Planning and Maniulation Author(s) Citation Yuan, Qilong; Lembono, Teguh
More informationInt. J. Advanced Networking and Applications Volume: 6 Issue: 4 Pages: 23862392 (2015) ISSN: 09750290
2386 Survey: Biological Insired Comuting in the Network Security V Venkata Ramana Associate Professor, Deartment of CSE, CBIT, Proddatur, Y.S.R (dist), A.P516360 Email: ramanacsecbit@gmail.com Y.Subba
More informationForensic Science International
Forensic Science International 214 (2012) 33 43 Contents lists available at ScienceDirect Forensic Science International jou r nal h o me age: w ww.els evier.co m/lo c ate/fo r sc iin t A robust detection
More informationTworesource stochastic capacity planning employing a Bayesian methodology
Journal of the Oerational Research Society (23) 54, 1198 128 r 23 Oerational Research Society Ltd. All rights reserved. 165682/3 $25. www.algravejournals.com/jors Tworesource stochastic caacity lanning
More informationComparing Dissimilarity Measures for Symbolic Data Analysis
Comaring Dissimilarity Measures for Symbolic Data Analysis Donato MALERBA, Floriana ESPOSITO, Vincenzo GIOVIALE and Valentina TAMMA Diartimento di Informatica, University of Bari Via Orabona 4 76 Bari,
More informationDynamic arrays and linked lists
Unit 11 Dynamic arrays and linked lists Summary Limitations of arrays Dynamic memory management Definitions of lists Tyical oerations on lists: iterative imlementation Tyical oerations on lists: recursive
More informationProvable Ownership of File in Deduplication Cloud Storage
1 Provable Ownershi of File in Dedulication Cloud Storage Chao Yang, Jian Ren and Jianfeng Ma School of CS, Xidian University Xi an, Shaanxi, 710071. Email: {chaoyang, jfma}@mail.xidian.edu.cn Deartment
More informationAssignment 9; Due Friday, March 17
Assignment 9; Due Friday, March 17 24.4b: A icture of this set is shown below. Note that the set only contains oints on the lines; internal oints are missing. Below are choices for U and V. Notice that
More informationSoftmax Model as Generalization upon Logistic Discrimination Suffers from Overfitting
Journal of Data Science 12(2014),563574 Softmax Model as Generalization uon Logistic Discrimination Suffers from Overfitting F. Mohammadi Basatini 1 and Rahim Chiniardaz 2 1 Deartment of Statistics, Shoushtar
More informationThe Graphical Method. Lecture 1
References: Anderson, Sweeney, Williams: An Introduction to Management Science  quantitative aroaches to decision maing 7 th ed Hamdy A Taha: Oerations Research, An Introduction 5 th ed Daellenbach, George,
More informationTheoretical and Experimental Study for Queueing System with Walking Distance
Theoretical and Exerimental Study for Queueing System with Walking Distance 371 22 1 Theoretical and Exerimental Study for Queueing System with Walking Distance Daichi Yanagisawa 1,2, Yushi Suma 1, Akiyasu
More information4 Perceptron Learning Rule
Percetron Learning Rule Objectives Objectives  Theory and Examles  Learning Rules  Percetron Architecture 3 SingleNeuron Percetron 5 MultileNeuron Percetron 8 Percetron Learning Rule 8 Test Problem
More informationThe MagnusDerek Game
The MagnusDerek Game Z. Nedev S. Muthukrishnan Abstract We introduce a new combinatorial game between two layers: Magnus and Derek. Initially, a token is laced at osition 0 on a round table with n ositions.
More informationThe impact of metadata implementation on webpage visibility in search engine results (Part II) q
Information Processing and Management 41 (2005) 691 715 www.elsevier.com/locate/inforoman The imact of metadata imlementation on webage visibility in search engine results (Part II) q Jin Zhang *, Alexandra
More informationCABRS CELLULAR AUTOMATON BASED MRI BRAIN SEGMENTATION
XI Conference "Medical Informatics & Technologies"  2006 Rafał Henryk KARTASZYŃSKI *, Paweł MIKOŁAJCZAK ** MRI brain segmentation, CT tissue segmentation, Cellular Automaton, image rocessing, medical
More informationFDA CFR PART 11 ELECTRONIC RECORDS, ELECTRONIC SIGNATURES
Document: MRM1004GAPCFR11 (0005) Page: 1 / 18 FDA CFR PART 11 ELECTRONIC RECORDS, ELECTRONIC SIGNATURES AUDIT TRAIL ECO # Version Change Descrition MATRIX 449 A Ga Analysis after adding controlled documents
More informationMultiAircraft Routing and Traffic Flow Management under Uncertainty
MultiAircraft Routing and Traffic Flow Management under Uncertainty Arnab Nilim Laurent El Ghaoui Vu Duong Abstract 1 Introduction A major ortion of the delay in the Air Traffic Management Systems (ATMS)
More informationService Network Design with Asset Management: Formulations and Comparative Analyzes
Service Network Design with Asset Management: Formulations and Comarative Analyzes Jardar Andersen Teodor Gabriel Crainic Marielle Christiansen October 2007 CIRRELT200740 Service Network Design with
More informationMonitoring Frequency of Change By Li Qin
Monitoring Frequency of Change By Li Qin Abstract Control charts are widely used in rocess monitoring roblems. This aer gives a brief review of control charts for monitoring a roortion and some initial
More informationAn inventory control system for spare parts at a refinery: An empirical comparison of different reorder point methods
An inventory control system for sare arts at a refinery: An emirical comarison of different reorder oint methods Eric Porras a*, Rommert Dekker b a Instituto Tecnológico y de Estudios Sueriores de Monterrey,
More informationAn Efficient Method for Improving Backfill Job Scheduling Algorithm in Cluster Computing Systems
The International ournal of Soft Comuting and Software Engineering [SCSE], Vol., No., Secial Issue: The Proceeding of International Conference on Soft Comuting and Software Engineering 0 [SCSE ], San Francisco
More informationOn Software Piracy when Piracy is Costly
Deartment of Economics Working aer No. 0309 htt://nt.fas.nus.edu.sg/ecs/ub/w/w0309.df n Software iracy when iracy is Costly Sougata oddar August 003 Abstract: The ervasiveness of the illegal coying of
More informationENFORCING SAFETY PROPERTIES IN WEB APPLICATIONS USING PETRI NETS
ENFORCING SAFETY PROPERTIES IN WEB APPLICATIONS USING PETRI NETS Liviu Grigore Comuter Science Deartment University of Illinois at Chicago Chicago, IL, 60607 lgrigore@cs.uic.edu Ugo Buy Comuter Science
More informationCBus Voltage Calculation
D E S I G N E R N O T E S CBus Voltage Calculation Designer note number: 3121256 Designer: Darren Snodgrass Contact Person: Darren Snodgrass Aroved: Date: Synosis: The guidelines used by installers
More informationA Virtual Machine Dynamic Migration Scheduling Model Based on MBFD Algorithm
International Journal of Comuter Theory and Engineering, Vol. 7, No. 4, August 2015 A Virtual Machine Dynamic Migration Scheduling Model Based on MBFD Algorithm Xin Lu and Zhuanzhuan Zhang Abstract This
More informationAlpha Channel Estimation in High Resolution Images and Image Sequences
In IEEE Comuter Society Conference on Comuter Vision and Pattern Recognition (CVPR 2001), Volume I, ages 1063 68, auai Hawaii, 11th 13th Dec 2001 Alha Channel Estimation in High Resolution Images and Image
More informationStorage Basics Architecting the Storage Supplemental Handout
Storage Basics Architecting the Storage Sulemental Handout INTRODUCTION With digital data growing at an exonential rate it has become a requirement for the modern business to store data and analyze it
More informationFailure Behavior Analysis for Reliable Distributed Embedded Systems
Failure Behavior Analysis for Reliable Distributed Embedded Systems Mario Tra, Bernd Schürmann, Torsten Tetteroo {tra schuerma tetteroo}@informatik.unikl.de Deartment of Comuter Science, University of
More informationDesign of A Knowledge Based Trouble Call System with Colored Petri Net Models
2005 IEEE/PES Transmission and Distribution Conference & Exhibition: Asia and Pacific Dalian, China Design of A Knowledge Based Trouble Call System with Colored Petri Net Models HuiJen Chuang, ChiaHung
More informationRisk in Revenue Management and Dynamic Pricing
OPERATIONS RESEARCH Vol. 56, No. 2, March Aril 2008,. 326 343 issn 0030364X eissn 15265463 08 5602 0326 informs doi 10.1287/ore.1070.0438 2008 INFORMS Risk in Revenue Management and Dynamic Pricing Yuri
More informationMachine Learning with Operational Costs
Journal of Machine Learning Research 14 (2013) 19892028 Submitted 12/11; Revised 8/12; Published 7/13 Machine Learning with Oerational Costs Theja Tulabandhula Deartment of Electrical Engineering and
More informationA Multivariate Statistical Analysis of Stock Trends. Abstract
A Multivariate Statistical Analysis of Stock Trends Aril Kerby Alma College Alma, MI James Lawrence Miami University Oxford, OH Abstract Is there a method to redict the stock market? What factors determine
More informationMultiperiod Portfolio Optimization with General Transaction Costs
Multieriod Portfolio Otimization with General Transaction Costs Victor DeMiguel Deartment of Management Science and Oerations, London Business School, London NW1 4SA, UK, avmiguel@london.edu Xiaoling Mei
More informationMultiChannel Opportunistic Routing in MultiHop Wireless Networks
MultiChannel Oortunistic Routing in MultiHo Wireless Networks ANATOLIJ ZUBOW, MATHIAS KURTH and JENSPETER REDLICH Humboldt University Berlin Unter den Linden 6, D99 Berlin, Germany (zubow kurth jr)@informatik.huberlin.de
More informationOptimizing Dynamic Memory Management!
Otimizing Dynamic Memory Management! 1 Goals of this Lecture! Hel you learn about:" Details of K&R hea manager" Hea manager otimizations related to Assignment #6" Other hea manager otimizations" 2 1 Part
More informationThe risk of using the Q heterogeneity estimator for software engineering experiments
Dieste, O., Fernández, E., GarcíaMartínez, R., Juristo, N. 11. The risk of using the Q heterogeneity estimator for software engineering exeriments. The risk of using the Q heterogeneity estimator for
More informationMaximizing the Area under the ROC Curve using Incremental Reduced Error Pruning
Maximizing the Area under the ROC Curve using Incremental Reduced Error Pruning Henrik Boström Det. of Comuter and Systems Sciences Stockholm University and Royal Institute of Technology Forum 100, 164
More informationThe fast Fourier transform method for the valuation of European style options inthemoney (ITM), atthemoney (ATM) and outofthemoney (OTM)
Comutational and Alied Mathematics Journal 15; 1(1: 16 Published online January, 15 (htt://www.aascit.org/ournal/cam he fast Fourier transform method for the valuation of Euroean style otions inthemoney
More informationLoglikelihood and Confidence Intervals
Stat 504, Lecture 3 Stat 504, Lecture 3 2 Review (contd.): Loglikelihood and Confidence Intervals The likelihood of the samle is the joint PDF (or PMF) L(θ) = f(x,.., x n; θ) = ny f(x i; θ) i= Review:
More informationMeasuring relative phase between two waveforms using an oscilloscope
Measuring relative hase between two waveforms using an oscilloscoe Overview There are a number of ways to measure the hase difference between two voltage waveforms using an oscilloscoe. This document covers
More informationBranchandPrice for Service Network Design with Asset Management Constraints
BranchandPrice for Servicee Network Design with Asset Management Constraints Jardar Andersen Roar Grønhaug Mariellee Christiansen Teodor Gabriel Crainic December 2007 CIRRELT200755 BranchandPrice
More informationGreeting Card Boxes. Preparation and Materials. Planning chart
Leader Overview ACTIVITY 13 Greeting Card Boxes In this activity, members of your grou make cool boxes out of old (or new) greeting cards or ostcards. But first, they ractice making boxes from grid aer,
More informationAn Associative Memory Readout in ESN for Neural Action Potential Detection
g An Associative Memory Readout in ESN for Neural Action Potential Detection Nicolas J. Dedual, Mustafa C. Ozturk, Justin C. Sanchez and José C. Princie Abstract This aer describes how Echo State Networks
More informationRisk and Return. Sample chapter. e r t u i o p a s d f CHAPTER CONTENTS LEARNING OBJECTIVES. Chapter 7
Chater 7 Risk and Return LEARNING OBJECTIVES After studying this chater you should be able to: e r t u i o a s d f understand how return and risk are defined and measured understand the concet of risk
More informationFrom Simulation to Experiment: A Case Study on Multiprocessor Task Scheduling
From to Exeriment: A Case Study on Multirocessor Task Scheduling Sascha Hunold CNRS / LIG Laboratory Grenoble, France sascha.hunold@imag.fr Henri Casanova Det. of Information and Comuter Sciences University
More informationDAYAHEAD ELECTRICITY PRICE FORECASTING BASED ON TIME SERIES MODELS: A COMPARISON
DAYAHEAD ELECTRICITY PRICE FORECASTING BASED ON TIME SERIES MODELS: A COMPARISON Rosario Esínola, Javier Contreras, Francisco J. Nogales and Antonio J. Conejo E.T.S. de Ingenieros Industriales, Universidad
More informationFinding a Needle in a Haystack: Pinpointing Significant BGP Routing Changes in an IP Network
Finding a Needle in a Haystack: Pinointing Significant BGP Routing Changes in an IP Network Jian Wu, Zhuoqing Morley Mao University of Michigan Jennifer Rexford Princeton University Jia Wang AT&T Labs
More informationAnalysis of Algorithms I: Binary Search Trees
Analysis of Algorithms I: Binary Search Trees Xi Chen Columbia University Hash table: A data structure that maintains a subset of keys from a universe set U = {0, 1,..., p 1} and supports all three dictionary
More informationUnited Arab Emirates University College of Sciences Department of Mathematical Sciences HOMEWORK 1 SOLUTION. Section 10.1 Vectors in the Plane
United Arab Emirates University College of Sciences Deartment of Mathematical Sciences HOMEWORK 1 SOLUTION Section 10.1 Vectors in the Plane Calculus II for Engineering MATH 110 SECTION 0 CRN 510 :00 :00
More informationTimeCost TradeOffs in ResourceConstraint Project Scheduling Problems with Overlapping Modes
TimeCost TradeOffs in ResourceConstraint Proect Scheduling Problems with Overlaing Modes François Berthaut Robert Pellerin Nathalie Perrier Adnène Hai February 2011 CIRRELT201110 Bureaux de Montréal
More informationConcurrent Program Synthesis Based on Supervisory Control
010 American Control Conference Marriott Waterfront, Baltimore, MD, USA June 30July 0, 010 ThB07.5 Concurrent Program Synthesis Based on Suervisory Control Marian V. Iordache and Panos J. Antsaklis Abstract
More informationProject Management and. Scheduling CHAPTER CONTENTS
6 Proect Management and Scheduling HAPTER ONTENTS 6.1 Introduction 6.2 Planning the Proect 6.3 Executing the Proect 6.7.1 Monitor 6.7.2 ontrol 6.7.3 losing 6.4 Proect Scheduling 6.5 ritical Path Method
More informationHigh Quality Offset Printing An Evolutionary Approach
High Quality Offset Printing An Evolutionary Aroach Ralf Joost Institute of Alied Microelectronics and omuter Engineering University of Rostock Rostock, 18051, Germany +49 381 498 7272 ralf.joost@unirostock.de
More informationA Note on Maximum Independent Sets in Rectangle Intersection Graphs
A Note on Maximum Independent Sets in Rectangle Intersection Graphs Timothy M. Chan School of Computer Science University of Waterloo Waterloo, Ontario N2L 3G1, Canada tmchan@uwaterloo.ca September 12,
More informationTopology of the Prism Model for 3D Indoor Spatial Objects
Toology of the Prism Model for 3D Indoor Satial Objects JoonSeok Kim, HyeYoung Kang, TaeHoon Lee, KiJoune Li Deartment of Comuter Engineering Pusan National University joonseok@nu.edu, hyezero@nu.edu,
More informationBeyond the F Test: Effect Size Confidence Intervals and Tests of Close Fit in the Analysis of Variance and Contrast Analysis
Psychological Methods 004, Vol. 9, No., 164 18 Coyright 004 by the American Psychological Association 108989X/04/$1.00 DOI: 10.1037/108989X.9..164 Beyond the F Test: Effect Size Confidence Intervals
More informationMoving Objects Tracking in Video by Graph Cuts and Parameter Motion Model
International Journal of Comuter Alications (0975 8887) Moving Objects Tracking in Video by Grah Cuts and Parameter Motion Model Khalid Housni, Driss Mammass IRF SIC laboratory, Faculty of sciences Agadir
More informationFourier Analysis of Stochastic Processes
Fourier Analysis of Stochastic Processes. Time series Given a discrete time rocess ( n ) nz, with n :! R or n :! C 8n Z, we de ne time series a realization of the rocess, that is to say a series (x n )
More informationAn Introduction to Risk Parity Hossein Kazemi
An Introduction to Risk Parity Hossein Kazemi In the aftermath of the financial crisis, investors and asset allocators have started the usual ritual of rethinking the way they aroached asset allocation
More informationA 60,000 DIGIT PRIME NUMBER OF THE FORM x 2 + x Introduction StarkHeegner Theorem. Let d > 0 be a squarefree integer then Q( d) has
A 60,000 DIGIT PRIME NUMBER OF THE FORM x + x + 4. Introduction.. Euler s olynomial. Euler observed that f(x) = x + x + 4 takes on rime values for 0 x 39. Even after this oint f(x) takes on a high frequency
More informationPiracy and Network Externality An Analysis for the Monopolized Software Industry
Piracy and Network Externality An Analysis for the Monoolized Software Industry Ming Chung Chang Deartment of Economics and Graduate Institute of Industrial Economics mcchang@mgt.ncu.edu.tw Chiu Fen Lin
More informationComplex Conjugation and Polynomial Factorization
Comlex Conjugation and Polynomial Factorization Dave L. Renfro Summer 2004 Central Michigan University I. The Remainder Theorem Let P (x) be a olynomial with comlex coe cients 1 and r be a comlex number.
More informationAn Investigation on LTE Mobility Management
An Investigation on LTE Mobility Management RenHuang Liou, YiBing Lin, Fellow, IEEE, and ShangChih Tsai Deartment of Comuter Science National Chiao Tung University {rhliou, liny, tsaisc}@csnctuedutw
More information1 Gambler s Ruin Problem
Coyright c 2009 by Karl Sigman 1 Gambler s Ruin Problem Let N 2 be an integer and let 1 i N 1. Consider a gambler who starts with an initial fortune of $i and then on each successive gamble either wins
More information