Querying about the Past, the Present, and the Future in Spatio-Temporal Databases


Jimeng Sun, Dimitris Papadias, Yufei Tao, Bin Liu

Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong {dimitris,
Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Hong Kong

Abstract

Moving objects (e.g., vehicles in road networks) continuously generate large amounts of spatio-temporal information in the form of data streams. Efficient management of such streams is a challenging goal due to the highly dynamic nature of the data and the need for fast, on-line computations. In this paper we present a novel approach for approximate query processing about the present, the past, or the future in spatio-temporal databases. In particular, we first propose an incrementally updateable, multi-dimensional histogram for present-time queries. Second, we develop a general architecture for maintaining and querying historical data. Third, we implement a stochastic approach for predicting the results of queries that refer to the future. Finally, we experimentally prove the effectiveness and efficiency of our techniques using a realistic simulation.

1. Introduction

Research on spatio-temporal access methods has mainly focused on two aspects: (i) storage and retrieval of historical information, and (ii) future prediction. Several indexes [PJT, KGT+, TP], usually based on multi-version or 3-dimensional variations of R-trees, have been proposed towards the first goal, aiming at minimization of the storage requirements and query cost. Methods for future prediction assume that, in addition to the current positions, the velocities of moving objects are known.
The goal is to retrieve the objects that satisfy a spatial condition at a future timestamp (or interval) given their present motion vectors (e.g., "based on the current information, find the cars that will be in the city center minutes from now"). The only practical index in this category is the TPR-tree [SJLL] and its variations [SJ2, TPS3] (also based on R-trees). Other, mostly theoretical, predictive indexes have been proposed in [KGT99] (applicable only to 1D space) and [AAE] (with good asymptotic performance, but inapplicable in practice due to the large hidden constants).

1.1 Motivation

Despite the large number of methods that focus explicitly on historical information retrieval or future prediction, currently there does not exist a single index that can achieve both goals. Even if such a "universal" structure existed (e.g., a multi-version TPR-tree keeping all previous history of each object), it would be inapplicable for several update-intensive applications, where it is simply infeasible to continuously update the index and at the same time process queries. For instance, an update (i.e., deletion and re-insertion) in a TPR-tree may need to access more than nodes, which means that by the time it terminates its result may already be outdated (due to another update of the same object). Even for a small number of moving objects and a low update rate, the TPR-tree (or any other index) cannot "follow" the fast changes of the underlying data.

Another problem concerns the space requirements. The incoming data are usually in the form of data streams (e.g., through sensors embedded on the road network), which are potentially unbounded in size. Therefore, materializing all data is unrealistic. Furthermore, even if all the data were stored, the size of the tree would render query processing very expensive, since any algorithm would have to access at least a complete path from the root to the leaf level.
Finally, in several applications, such as traffic control systems, the main focus of query processing is retrieval of approximate summarized information about objects that satisfy some spatio-temporal predicate (e.g., "the number of cars in the city center minutes from now"), as opposed to exact information about the qualifying objects (i.e., the car ids), which may be unavailable or irrelevant.

1.2 Contributions

Motivated by the above observations, in this paper we propose a comprehensive method that (i) can approximately answer aggregate spatio-temporal queries about the past, the present and the future, (ii) can continuously follow the (typically, very frequent) changes in data distribution, (iii) is efficient and adjustable to the
burden of the system, i.e., it does not strain the system, especially during periods of intensive workloads, and (iv) reduces the space consumption by summarizing the data.

Our first contribution is an adaptive multi-dimensional histogram (AMH), which is used for approximate processing of present-time queries. AMH utilizes idle CPU cycles to perform incremental maintenance. Its precision depends only on the available system resources, and does not deteriorate with time (meaning that AMH does not need to be periodically rebuilt).

The second contribution is a general architecture, called the historical synopsis, for maintaining and querying historical data. In this architecture, when a histogram bucket is updated, the previous version is kept in a main-memory index. If the available memory is exceeded, the part of the index containing the least recent information migrates to the disk. As a result, queries referring to the present and the recent past (which are more frequent) can be answered without any I/O.

Third, we present a stochastic approach, based on exponential smoothing, that answers queries about the future using observations from the present and the recent past. Unlike previous approaches for spatio-temporal prediction, this technique does not assume knowledge of velocity vectors, which in most cases are unavailable.

Finally, we experimentally evaluate the proposed techniques under a real-life simulation, where updates and queries are interleaved with histogram maintenance operations. We study the effect of the system workload on the estimation accuracy, and verify the need for approximation methods that adapt to the available resources.

The rest of the paper is organized as follows. Section 2 reviews previous work related to ours. Section 3 provides the problem definition and an overview of the proposed methods. Section 4 presents the concrete algorithms of AMH, while Section 5 describes the historical synopsis.
Section 6 presents the stochastic approach for prediction, and Section 7 contains the experimental evaluation. Finally, Section 8 concludes the paper with directions for future work.

2. Related Work

Because the proposed techniques combine multiple goals, they relate to a number of different research topics. In particular, since the core of our method is AMH, in Section 2.1 we survey related work on multi-dimensional histograms. Section 2.2 presents methods for updating such histograms using query feedback. Section 2.3 discusses other multi-dimensional approximation techniques. Section 2.4 overviews spatio-temporal prediction techniques and their limitations in several applications. Finally, Section 2.5 presents methods for (exact) retrieval of spatio-temporal aggregation information.

2.1 Static multi-dimensional histograms

Histogram construction techniques split the data space into buckets, usually based on the assumption that data within a small region are (almost) uniform. The various methods differ in their partition policy. The equi-depth histogram of [MD88] first partitions along one dimension so that each bucket has the same number (i.e., frequency) of objects. Then, these buckets are partitioned again along another dimension. Mhist [PI97] splits on the most critical dimension according to a partitioning metric (e.g., variance). Minskew [APR99] partitions the space into a regular grid. Neighboring cells are then grouped into rectangular, non-overlapping buckets by a greedy algorithm that tries to minimize the spatial skew (i.e., the variance of the cell density in each bucket). Genhist [GKTD] allows overlapping buckets. As with Minskew, it starts with a regular grid and creates buckets for the cells with high density. Then it removes a percentage of data from these cells to make the area around them smoother. Finally, it decreases the resolution and repeats the process recursively.
The SQ [AN] and Euler histograms [SAA2], in addition to locations, take into account objects' extents, and are more accurate for non-point data. All the above histograms apply to spatial or multi-dimensional databases, where the data are (almost) static. When the data distribution changes due to updates, the bucket structure may become outdated, and the histogram must be rebuilt. In several spatio-temporal applications, however, there is simply not enough time to scan the entire data (possibly several times) in order to build the histogram from scratch. Furthermore, the new histogram may already be outdated, due to the changes that occurred during its rebuilding.

2.2 Query-adaptive multi-dimensional histograms

Query-adaptive histograms avoid rebuilding by using query feedback to refine the buckets. STGrid [AC99], based on the heuristic that buckets with higher frequencies contribute more to the estimation error, splits buckets with high frequencies and merges buckets with low frequencies. Restructuring is performed at certain intervals (a parameter of the histogram) for entire rows or columns, selected by a split or a merge threshold (also a parameter). Its structure, however, cannot accurately capture arbitrary distributions. STHoles [BGC] alleviates this problem by allowing nesting of buckets. If a query identifies large frequency variance within a bucket, a "hole", i.e., a new bucket, is created inside the original one. This "drilling" operation replaces the splitting mechanism of STGrid. SASH [LWV3] decomposes the multi-dimensional space into subspaces of lower dimensionality. For each subspace a separate histogram is built and maintained using query feedback mechanisms.
Query-adaptive histograms are based on the assumption that the frequency of queries is much higher than the frequency of updates. Thus, after the data distribution changes, there is a sufficient number of queries to "train" (i.e., refine) the histogram. Although the accuracy for these queries will be low, subsequent queries will be estimated using the refined histogram, which gradually becomes very accurate. Clearly, this assumption does not hold for update-intensive applications, where the number of queries for each instance of the data is, typically, very small.

2.3 Other multi-dimensional approximation methods

Several multi-dimensional methods embed the data into another domain and summarize the embedded data in order to provide a compact data representation. The DCT-based histogram [LKC99] maintains a large number of small buckets using the discrete cosine transform, and cannot be incrementally maintained. The histogram of [MVW98] extracts the most descriptive wavelet coefficients. Although it can be incrementally maintained [MVW], its performance degrades with the number of updates, so that eventually it has to be rebuilt. Another technique [TGIK2] compresses the data distribution into a multi-dimensional sketch; when a query arrives, the histogram is extracted from the sketch. The main drawback of this approach is that the extraction process is expensive and, therefore, not suitable for on-line queries. [GMP2] incrementally maintains a small sample set (the backing sample) of the underlying data, and histograms are then constructed and maintained from the backing sample. However, it requires scanning the entire data to regenerate the backing sample when the distribution changes, which is not possible in our case. Furthermore, some other applications of sampling-based techniques [GM98, WAA] are also questionable, since it is not clear how to select and effectively maintain a representative multi-dimensional sample in the presence of very intensive updates.
Finally, all previous methods capture a snapshot of the data and cannot be used for historical information retrieval.

2.4 Spatio-temporal prediction methods

Methods for spatio-temporal prediction estimate the number of objects that will satisfy some spatial condition during a future interval, based on the current location and velocity information. Choi and Chung [CC2] and Tao et al. [TSP3] present probabilistic models for uniform data, which are then applied to non-uniform distributions using some conventional multi-dimensional histogram. An experimental comparison of several techniques can be found in [HKT3]. All existing methods assume linear movement and that the velocities of all objects are known (the same assumptions that hold for the TPR-tree-based indexes [SJLL, TPS3]). This restricts their applicability in several applications because: (i) movement is not always linear, (ii) even if it is linear, it is not usually known (e.g., mobile devices transmit only their location to the base station), and (iii) even if it is linear and known, it changes so fast that prediction using velocity information is meaningless (e.g., cars moving on road networks).

2.5 Spatio-temporal aggregation methods

Several methods (e.g., [PTKZ2, ZTG2]) focus on spatio-temporal aggregation. Although the target queries are similar to our historical queries, the context is different. In particular, they assume a "conventional" processing framework where every query invokes disk I/Os and returns an exact answer. Here we sacrifice accuracy for efficiency and attempt to answer the most frequent queries using only the main memory.

3. Problem Definition and System Overview

We assume a set of objects moving in the two-dimensional space and a central server collecting information about their locations over time. Update data received by the server contain the previous and the new location of individual objects, but not their velocity or id.
Moving objects may issue updates at periodic intervals or based on the deviation from the previously transmitted position. Static objects do not send information to the server; until an object transmits a new location, it is assumed to be in the last recorded position.

We adopt a 2D grid that partitions the data space into $w \times w$ regular cells (where $w$ is a constant called the resolution) with width $1/w$ on each axis. Each cell $c$ ($1 \le c \le w^2$) is associated with a frequency $F_c$, which is the number of objects (at the present time) in its extent. The frequencies of all the cells are maintained up-to-date in memory, and serve as the data of the highest granularity. The resolution is determined by the number of moving objects in the system, as well as the desirable tradeoff between accuracy and efficiency.

A spatio-temporal aggregation query $q(q_R, q_T)$ retrieves the number of objects that fall in a rectangular region $q_R$ at a timestamp $q_T$. If (i) $q_T = 0$ (i.e., the present time), $q$ constitutes a present timestamp (PT) query; (ii) if $q_T < 0$, $q$ is a historical timestamp (HT) query; and (iii) if $q_T > 0$, $q$ is a future timestamp (FT) query.

When the resolution of the grid is high, exhaustive examination of all the cells that are relevant to a query is expensive. AMH (adaptive multi-dimensional histogram) remedies this problem by grouping adjacent cells with similar frequencies into a small number of (rectangular) buckets (similar to other histograms such as Minskew [APR99] and Genhist [GKTD]). Bucket information is updated as new tuples arrive, and bucket extents (i.e., groupings of cells) continuously adapt to the data distribution changes through incremental maintenance performed during the
idle CPU time (i.e., when there are no incoming data or queries). A binary partition tree accelerates bucket maintenance and processing of PT queries.

To support historical retrieval, buckets that are outdated (due to data or extent changes) are inserted in a main-memory index (T) optimized for HT queries. Each bucket is associated with a lifespan denoting the temporal interval during which the bucket information is valid. When the size of the index exceeds the available memory, the part of the tree containing the least recent buckets is migrated to the disk, so that queries about the recent past can be answered using the memory-resident portion of the index. This migration occurs in blocks (rather than individual buckets) at a time, and incurs minimal I/O overhead. AMH together with index T constitute the historical synopsis, whose general structure is shown in Figure 3.1.

[Figure 3.1: Historical synopsis]

The historical synopsis is directly used for answering PT and HT queries. On the other hand, FT queries are processed by an exponential smoothing technique, which retrieves information about the present and the recent past in order to predict the future. Figure 3.2 shows the abstract query processing mechanism and Table 3.1 lists the frequently used symbols. In the rest of the paper we describe the various components of the architecture, starting with AMH in Section 4.

[Figure 3.2: Query processing]

Table 3.1: Frequent symbols
Symbol    Description
$w$       resolution
$F_c$     frequency of cell $c$
$n$ ($B$) (maximum) number of buckets
$R_k$     bucket extent or area
$n_k$     number of cells in bucket $k$
$f_k$     average cell frequency (i.e., number of objects) in bucket $k$
$g_k$     average squared frequency of bucket $k$
$v_k$     frequency variance of bucket $k$

4. Adaptive Multi-Dimensional Histogram

Given a regular $w \times w$ cell partition, AMH generates $n$ rectangular buckets whose edges are aligned with the cell boundaries. For each bucket $b_k$ ($1 \le k \le n$), we denote (i) as $n_k$ the number of cells it covers, (ii) as $f_k$ the average frequency of these cells, i.e., $f_k = (1/n_k) \sum_{c \in b_k} F_c$, and (iii) as $v_k$ their variance, i.e., $v_k = (1/n_k) \sum_{c \in b_k} (F_c - f_k)^2$. AMH aims at minimizing the weighted variance sum (WVS) of all the buckets, or formally:

$WVS = \sum_{i=1}^{n} n_i \cdot v_i$    (4)

4.1 AMH structure

Each bucket stores: (i) its rectangular extent $R$ (abusing the terminology slightly, we also use $R$ to denote its area), (ii) the average frequency $f$ of all the cells in $R$, and (iii) the average squared frequency $g$ of these cells: $g = (1/n_k) \sum_{c \in R} F_c^2$. The number $n$ of cells covered by a bucket can be represented as $R \cdot w^2$, where $w$ is the cell resolution. A bucket's variance equals $v = g - f^2$ and, as a result, WVS can be computed using $R$, $f$, $g$ of all buckets.

AMH maintains a binary partition tree (BPT), where each leaf node corresponds to a bucket. An intermediate node is associated with a rectangular extent $R$ that encloses the extents of its (two) children. Initially, BPT contains a single leaf node, and the histogram has one bucket covering the entire data space. New buckets (leaf nodes) are created through bucket splits (elaborated shortly), but the total number $n$ of buckets never exceeds a system parameter $B$. As an example, Figure 4.1a shows the frequencies of 25 cells (i.e., $w=5$) at timestamp 1, Figure 4.1b demonstrates the extents of 6 buckets, and Figure 4.1c illustrates the corresponding BPT.

[Figure 4.1: AMH at time 1. Panels: (a) Cell frequency, (b) Bucket extents, (c) The BPT, (d) Bucket information (columns: node, R, f, g, v; WVS=7.8)]
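The bucket statistics defined above can be illustrated with a minimal sketch. It assumes a bucket is summarized only by its cell count n and the averages f and g; the names `Bucket`, `bucket_from_cells` and `wvs` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    n: int      # number of cells covered by the bucket (R * w^2 in the text)
    f: float    # average cell frequency
    g: float    # average squared cell frequency

    @property
    def variance(self) -> float:
        # v = g - f^2; v itself is never stored
        return self.g - self.f ** 2


def bucket_from_cells(freqs):
    """Summarize a list of cell frequencies F_c into (n, f, g)."""
    n = len(freqs)
    return Bucket(n, sum(freqs) / n, sum(x * x for x in freqs) / n)


def wvs(buckets):
    """Weighted variance sum: sum over all buckets of n_i * v_i."""
    return sum(b.n * b.variance for b in buckets)


uniform = bucket_from_cells([3, 3])        # identical cells, variance 0
skewed = bucket_from_cells([1, 2, 3, 4])   # f = 2.5, g = 7.5, v = 1.25
total = wvs([uniform, skewed])             # 2*0 + 4*1.25
```

Storing g rather than v is what lets a bucket be updated from a single cell change without revisiting its other cells.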
Intermediate node $b_7$, for instance, covers buckets $b_1$, $b_2$, implying that they were created by splitting $b_7$. Figure 4.1d lists the detailed information of each bucket ($v$ is not explicitly stored, but computed from $f$ and $g$).

4.2 AMH information update

A location update contains the old and the new position of a moving object. When such an update is received, AMH executes an info-update process to modify the frequencies of the cells and buckets that cover the corresponding locations. In particular, a cell can be identified using a hash function, while a bucket is located by following a single path of BPT, guided by the extents of the intermediate nodes. Assume, for example, that the frequency of cell $(x_5, y_2)$ in Figure 4.1a changes from 6 to 10 at time 2, due to concurrent movement of several objects. After modifying the cell frequency, info-update obtains the frequency (squared frequency) difference 4 (64) between the new and old values. To complete the update, it identifies bucket $b_6$ covering the cell (following the path $b_{11}$, $b_{10}$, $b_9$), and updates its $f$ and $g$ as $f_6' = (f_6 \cdot n_6 + 4)/n_6$ and $g_6' = (g_6 \cdot n_6 + 64)/n_6$, where $n_6$ is the number of cells covered by $b_6$, and $f_6$ ($g_6$) is their average (squared) frequency. Assuming that BPT is fairly balanced, the algorithm (shown in Figure 4.2) has average cost $O(\log B)$, where $B$ is the maximum number of buckets.

  Algorithm Info-update (x, y, d)
  /* (x,y) are the cell coordinates, d is the frequency difference */
  1. find the cell c that contains (x, y) using a hash function
  2. Δf = d; Δg = (F_c + d)^2 - F_c^2; F_c = F_c + d
  3. descend BPT to find the bucket b that contains (x, y)
  4. b.f = (R·w^2·b.f + Δf)/(R·w^2); b.g = (R·w^2·b.g + Δg)/(R·w^2)
     // R is the extent area of b, w is the cell resolution
  End Info-update

Figure 4.2: The info-update algorithm

Updates may alter the frequency distribution, increasing the frequency variances and the approximation error. Figure 4.3a shows the updated cells at time 2, and Figure 4.3b illustrates the information of the corresponding buckets.
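The info-update steps of Figure 4.2 can be sketched as follows. This is only an illustration under stated assumptions: extents are given in cell units with half-open ranges, a Python dict stands in for the hash function, and the names `BPTNode` and `info_update` are hypothetical.

```python
class BPTNode:
    """Node of a binary partition tree; the leaves are the buckets."""

    def __init__(self, x0, y0, x1, y1, left=None, right=None, f=0.0, g=0.0):
        # Extent in cell units, half-open: columns x0..x1-1, rows y0..y1-1.
        self.x0, self.y0, self.x1, self.y1 = x0, y0, x1, y1
        self.left, self.right = left, right
        self.f = f   # average cell frequency (maintained for leaves)
        self.g = g   # average squared cell frequency (maintained for leaves)

    @property
    def n(self):
        # Number of cells covered (R * w^2 in the paper's notation).
        return (self.x1 - self.x0) * (self.y1 - self.y0)

    def is_leaf(self):
        return self.left is None

    def contains(self, x, y):
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def info_update(root, cells, x, y, d):
    """Apply a frequency change d to cell (x, y) and to its bucket."""
    old = cells.get((x, y), 0)             # step 1: hash lookup of the cell
    delta_f = d                            # step 2: differences
    delta_g = (old + d) ** 2 - old ** 2
    cells[(x, y)] = old + d
    node = root                            # step 3: follow a single path
    while not node.is_leaf():
        node = node.left if node.left.contains(x, y) else node.right
    node.f = (node.n * node.f + delta_f) / node.n   # step 4: new averages
    node.g = (node.n * node.g + delta_g) / node.n
    return node


# A 2x2 grid split into two column buckets; one cell goes from 6 to 10.
left = BPTNode(0, 0, 1, 2, f=4.0, g=20.0)
right = BPTNode(1, 0, 2, 2)
root = BPTNode(0, 0, 2, 2, left, right)
cells = {(0, 0): 6, (0, 1): 2}
updated = info_update(root, cells, 0, 0, 4)
```

As in the running example, a change from 6 to 10 gives a frequency difference of 4 and a squared-frequency difference of 64.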
Notice that $v_1$, $v_5$ are significantly larger than those in Figure 4.1d, causing a considerable increase of WVS (from 7.8 in Figure 4.1d to 73.8). To remedy this problem, AMH performs bucket re-organization when the CPU is idle (i.e., no data/query is pending), which involves bucket merging and splitting.

4.3 AMH bucket merge

A merge is performed when the number of buckets has reached the maximum value $B$ (assumed to be 6 in these examples), and aims at grouping cells of multiple buckets, whose frequencies have converged since the last re-organization, into a single bucket. Towards this, AMH selects the leaf node of BPT with the lowest merge penalty (MPen); MPen is defined as the increase of WVS (see equation 4) if the node is merged with its sibling. Assume, for instance, the cell frequencies in Figure 4.3a, the bucket information in Figure 4.3b, and the tree in Figure 4.1c. Merging bucket $b_3$ with its sibling $b_4$ removes both nodes from BPT and makes $b_8$ a leaf (i.e., a new bucket in the histogram). The change incurs an increase of $n_8 v_8 - (n_3 v_3 + n_4 v_4)$ in WVS (i.e., the merge penalty for $b_3$, $b_4$), where $n_i$ is the number of cells covered by node $b_i$, and $v_i$ is its frequency variance. A bucket's sibling can sometimes be an intermediate node, e.g., the sibling of bucket $b_6$ is $b_8$, in which case MPen should be computed as $n_9 v_9 - (n_3 v_3 + n_4 v_4 + n_6 v_6)$. Notice that merging $b_6$ with $b_8$ would reduce the bucket number by 2 (i.e., removing 3 leaf nodes $b_3$, $b_4$, $b_6$, and spawning $b_9$).

The leaf node with the lowest merge penalty can be identified using a single post-order traversal of BPT's intermediate nodes (i.e., each intermediate node is processed after its children). Consider, for example, Figure 4.1c, where node $b_7$ is the first node processed.
We compute the following information (from its children $b_1$, $b_2$): (i) the average frequency of all cells in its extent, $f_7 = (f_1 R_1 + f_2 R_2)/R_7$, (ii) the average squared frequency, $g_7 = (g_1 R_1 + g_2 R_2)/R_7$, and (iii) the merge penalty of its children, $MPen_7 = n_7 v_7 - (n_1 v_1 + n_2 v_2)$, where $n_i = R_i \cdot w^2$ and $v_i = g_i - f_i^2$. Next, similar computations are performed for $b_8$, $b_9$, $b_{10}$, $b_{11}$ (in this order). Note that (i) the leaf node $b_3$ ($b_4$, $b_6$) is updated from the corresponding cells; (ii) the computation for $b_9$ ($b_{10}$) is based on the already-computed information of intermediate node $b_8$ ($b_9$). Similarly, the information of $b_{11}$ is computed from $b_7$ and $b_{10}$. Finally, the algorithm merges the children of the node that incurs the smallest penalty (in this case $b_9$). Figures 4.4a and 4.4b illustrate the resulting bucket extents and BPT, respectively. Figure 4.5 shows the pseudo-code of the bucket-merge algorithm. Its running time is $O(B)$, since it visits all nodes of BPT.

[Figure 4.3: AMH at time 2. Panels: (a) Cell frequency, (b) Bucket information (columns: node, R, f, g, v; WVS=73.8)]

[Figure 4.4: Example of bucket merge (cont. Figure 4.1). Panels: (a) Buckets after merging, (b) BPT after merging]
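The post-order computation of merge penalties described above can be sketched as follows. The sketch assumes each node carries only (n, f, g), considers the root as a merge candidate (the text and the pseudo-code leave this open), and uses hypothetical names `Node`, `best_merge` and `merge`.

```python
class Node:
    def __init__(self, n=0, f=0.0, g=0.0, left=None, right=None):
        self.n, self.f, self.g = n, f, g
        self.left, self.right = left, right

    def is_leaf(self):
        return self.left is None


def nv(node):
    """n * v for a node, with v = g - f^2."""
    return node.n * (node.g - node.f ** 2)


def best_merge(root):
    """Return (penalty, node) for the cheapest subtree to collapse."""
    best = [float("inf"), None]

    def visit(node):
        # Returns the sum of n_i * v_i over the leaves below this node.
        if node.is_leaf():
            return nv(node)
        leaf_sum = visit(node.left) + visit(node.right)
        l, r = node.left, node.right
        node.n = l.n + r.n
        node.f = (l.f * l.n + r.f * r.n) / node.n   # aggregated from children
        node.g = (l.g * l.n + r.g * r.n) / node.n
        # Only nodes with at least one leaf child are merge candidates.
        if l.is_leaf() or r.is_leaf():
            penalty = nv(node) - leaf_sum
            if penalty < best[0]:
                best[0], best[1] = penalty, node
        return leaf_sum

    visit(root)
    return best[0], best[1]


def merge(node):
    """Collapse a subtree so that the node becomes a single bucket (leaf)."""
    node.left = node.right = None


a = Node(2, 1.0, 1.0)                 # two cells of frequency 1
b = Node(2, 1.0, 1.0)                 # two more cells of frequency 1
c = Node(4, 10.0, 100.0)              # four cells of frequency 10
p = Node(left=a, right=b)
root = Node(left=p, right=c)
penalty, target = best_merge(root)    # a and b have converged: penalty 0
merge(target)
```

Collapsing p costs nothing here because its two children hold identical frequencies, whereas collapsing the root would mix cells of frequency 1 and 10.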
  Algorithm Bucket-Merge
  1. minMPen = ∞; newLeaf = null
  2. let b = the first intermediate node in the post-order traversal
  3. do
  4.   if (b has at least one leaf node)
  5.     compute b.f, b.g, MPen_b from its children
  6.     if (MPen_b < minMPen)
  7.       minMPen = MPen_b; newLeaf = b
  8.   b = next intermediate node in the post-order traversal
  9. until (b = root)
  10. merge the children of newLeaf
  End Bucket-Merge

Figure 4.5: The bucket-merge algorithm

4.4 AMH bucket split

A bucket split creates refined partitions in regions where buckets have large variance. It improves AMH's approximation accuracy by (i) reducing the bucket frequency variance, and (ii) decreasing the bucket extent. Towards this, the algorithm splits the bucket with the highest split benefit (SBen), i.e., the maximum decrease of WVS achieved by splitting this bucket at the best position. Consider, for example, bucket $b_1$ in Figure 4.4a. Since a split guarantees that the resulting buckets cover an integer number of cells, there are three possible ways to split $b_1$ (one on the x-axis and two on the y-axis). The largest reduction of WVS is achieved by splitting on the x-dimension, resulting in buckets $b_{12}$, $b_{13}$ (Figure 4.6a) and split benefit $n_1 v_1 - (n_{12} v_{12} + n_{13} v_{13})$. The split benefits of the other buckets are also computed, and the one with the largest SBen (in this case $b_1$) is split (at its best split position).

The bucket-split algorithm repeats the split process until (i) new data/queries arrive, (ii) the number of buckets reaches the maximum threshold $B$, or (iii) there does not exist any bucket with positive split benefit, implying that the current bucket number is sufficient to model the cell frequency distribution. Continuing the example, bucket-split computes SBen for the new buckets ($b_{12}$, $b_{13}$) and splits $b_5$ (whose benefit is currently the largest) into $b_{14}$, $b_{15}$. Since the bucket number has reached $B$ (=6), the algorithm terminates. Figures 4.6a and 4.6b illustrate the final bucket extents and BPT, respectively.
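The split benefit of a single bucket can be evaluated by trying every cell-aligned split position, as in the following sketch. It assumes direct access to the cell frequencies of the bucket, given as a list of rows; `nv` and `best_split` are hypothetical names.

```python
def nv(cells):
    """n * v of a block of cells: sum(F^2) - (sum F)^2 / n."""
    flat = [c for row in cells for c in row]
    n = len(flat)
    s = sum(flat)
    s2 = sum(c * c for c in flat)
    return s2 - s * s / n


def best_split(cells):
    """Return (benefit, axis, position) of the best cell-aligned split.

    cells is a list of rows (y-axis), each a list of frequencies (x-axis).
    The result is (0.0, None, None) when no split has positive benefit.
    """
    total = nv(cells)
    best = (0.0, None, None)
    rows, cols = len(cells), len(cells[0])
    for pos in range(1, cols):                      # vertical cuts (x-axis)
        left = [row[:pos] for row in cells]
        right = [row[pos:] for row in cells]
        benefit = total - (nv(left) + nv(right))
        if benefit > best[0]:
            best = (benefit, "x", pos)
    for pos in range(1, rows):                      # horizontal cuts (y-axis)
        benefit = total - (nv(cells[:pos]) + nv(cells[pos:]))
        if benefit > best[0]:
            best = (benefit, "y", pos)
    return best


# A 3x2 bucket whose rightmost column is much denser than the rest.
benefit, axis, pos = best_split([[1, 1, 9], [1, 1, 9]])
flat_result = best_split([[2, 2], [2, 2]])
```

In this example the cut between the second and third column isolates the dense cells, so both resulting buckets have zero variance and the benefit equals the whole n·v of the original bucket.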
[Figure 4.6: After splitting (cont. Figure 4.4). Panels: (a) Final buckets, with PT query q, (b) BPT after splitting]

Figure 4.7 summarizes the bucket-split algorithm. Each bucket re-organization involves at most one bucket merge (i.e., if the number of buckets equals $B$), and a small number of bucket splits. After its termination, the process starts again if the CPU is still idle. Compared to traditional histograms requiring expensive total rebuilding, AMH deploys repetitive, simple, interruptible, partial re-organizations and is, therefore, well suited to dynamic spatio-temporal environments, where updates/queries may occur in (unpredictable) bursts.

  Algorithm Bucket-Split
  1. for every bucket b
  2.   compute its split benefit and best split position
  3. sort all split benefits in descending order into a list L_s
  4. while (bucket number n < B and no new data/query and the first bucket in L_s has positive split benefit)
  5.   remove the first bucket b from L_s
  6.   split it into b1, b2
  7.   obtain split benefits and best split positions for b1, b2
  8.   insert b1, b2 into the appropriate positions in L_s
  9. return
  End Bucket-Split

Figure 4.7: The bucket-split algorithm

4.5 AMH PT query processing

A present-time query is answered by examining all buckets whose extents intersect the query region $q_R$. Such buckets are located by traversing BPT and following the nodes that overlap $q_R$. For example, the query $q$ in Figure 4.6a intersects only bucket $b_9$, which can be reached by following the intermediate nodes $b_{11}$, $b_{10}$. For each intersecting bucket $b_k$, we compute the overlap area $R_{intr}$ between its extent and $q_R$, and obtain a partial result $f_k \cdot R_{intr} \cdot w^2$ (note that $R_{intr} \cdot w^2$ equals the number of cells in $b_k$ that are covered by $q_R$). Then, the final result is computed as the sum of the partial results of all intersecting buckets, without accessing the underlying cells.
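The PT estimation described above can be sketched as a recursive traversal. The sketch assumes extents and queries are rectangles (x0, y0, x1, y1) in cell units with half-open ranges, so the overlap is already a cell count; `Node`, `overlap_cells` and `pt_query` are hypothetical names.

```python
class Node:
    def __init__(self, extent, f=0.0, left=None, right=None):
        self.extent = extent        # (x0, y0, x1, y1), half-open, in cells
        self.f = f                  # average cell frequency (leaves only)
        self.left, self.right = left, right


def overlap_cells(a, b):
    """Number of cells shared by two rectangles (R_intr * w^2 in the text)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def pt_query(node, q):
    """Estimate the number of objects in query rectangle q."""
    shared = overlap_cells(node.extent, q)
    if shared == 0:
        return 0.0                  # prune subtrees that miss the query
    if node.left is None:
        return node.f * shared      # partial result of an intersecting bucket
    return pt_query(node.left, q) + pt_query(node.right, q)


# A 4x4 grid split into a left bucket (f = 3) and a right bucket (f = 1).
left = Node((0, 0, 2, 4), f=3.0)
right = Node((2, 0, 4, 4), f=1.0)
root = Node((0, 0, 4, 4), left=left, right=right)
estimate = pt_query(root, (1, 1, 3, 3))   # 2 cells * 3 + 2 cells * 1
outside = pt_query(root, (4, 4, 5, 5))
```

The estimate relies on the uniformity assumption within each bucket, which is why low bucket variance (small WVS) translates into accuracy.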
5. Historical Synopsis

In this section, we discuss HT query processing using the historical synopsis, which consists of (i) a dynamic histogram maintaining the currently valid buckets (e.g., AMH), and (ii) a past index T storing the obsolete buckets.

5.1 General architecture

A bucket in AMH dies if (i) the frequency in any of its cells is updated, or (ii) its extent changes due to a split or merge. In case (i), a new bucket with the same extent, but different frequency statistics (i.e., $f$ and $g$), is created. In case (ii), new bucket(s) are created due to the merge or split. Each bucket contains a lifespan $[l_s, l_e)$, where $l_s$ ($l_e$) is the time of creation (death). All the buckets in AMH are alive, and their $l_e$ equals now. Dead buckets are inserted into index T and their cell content is discarded.

In Figure 4.1, for example, all the buckets $b_1, \ldots, b_6$ are alive and have lifespans [1, now). Then, in Figure 4.3, buckets $b_1$, $b_3$, $b_5$, $b_6$ die at time 2 because some cells in their extents change frequencies at this timestamp. This
produces four dead buckets with lifespans [1,2) (which are inserted into T), and creates four live ones with lifespans [2, now). At the next timestamp 3, buckets $b_1$, $b_3$, $b_4$, $b_5$, $b_6$ die (yielding four buckets with lifespans [2,3), and one, $b_4$, with lifespan [1,3)), since $b_3$, $b_4$, $b_6$ are merged (Figure 4.4) and $b_1$, $b_5$ are split (Figure 4.6). Buckets $b_9$, $b_{12}$, $b_{13}$, $b_{14}$, $b_{15}$ that result from these operations are alive with lifespans [3, now). As shown in Figure 5.1, after these operations, T contains 9 dead buckets and AMH 6 live ones.

[Figure 5.1: Buckets at time 3 (cf. Figures 4.1, 4.3, 4.6). Table of the dead buckets (stored in T) and the alive buckets (stored in AMH); columns: node, R, f, g, v, lifespan; buckets relevant to the example HT query are marked with *]

The size of T grows continuously with time and eventually exceeds the available memory, in which case part of it migrates to the disk. The migration is performed in blocks, meaning that we simply move pages of equal sizes from the memory to the disk. Further, the disk accesses are sequential except for, possibly, the first page transferred. Accordingly, pointers to these pages are modified to the corresponding disk addresses (from their original main-memory addresses). Each page of T that is currently in memory is assigned a migration rank, such that pages are migrated in ascending order of their ranks. A separate main-memory structure (e.g., a priority queue) can be maintained to select the page with the lowest rank efficiently.
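Rank-ordered migration with a priority queue might look like the following sketch. It takes the rank of a page to be the latest ending timestamp among its entries' lifespans, models the disk as a plain dict, and measures memory in pages; the class `PageStore` and its fields are hypothetical.

```python
import heapq


class PageStore:
    """Keeps at most `capacity` pages in memory; the rest migrate to disk."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.heap = []      # (rank, page_id), lowest rank on top
        self.memory = {}    # page_id -> list of (l_s, l_e) lifespans
        self.disk = {}      # stand-in for pages written to disk

    def add_page(self, page_id, lifespans):
        # Rank of a page: latest ending timestamp of its entries (assumed).
        rank = max(l_e for _, l_e in lifespans)
        self.memory[page_id] = lifespans
        heapq.heappush(self.heap, (rank, page_id))
        # Migrate pages in ascending rank order until memory fits again.
        while len(self.memory) > self.capacity:
            _, victim = heapq.heappop(self.heap)
            self.disk[victim] = self.memory.pop(victim)


store = PageStore(capacity=2)
store.add_page("A", [(1, 2), (1, 3)])   # rank 3
store.add_page("B", [(2, 3), (2, 4)])   # rank 4
store.add_page("C", [(1, 2)])           # rank 2, migrated immediately
store.add_page("D", [(4, 5)])           # rank 5, pushes out page A
```

A binary heap keeps both the insertion of a page and the selection of the lowest-ranked page logarithmic in the number of memory-resident pages.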
In our implementation, the migration rank of a page is set to the latest ending timestamp $l_e$ of all its entries' lifespans. In general, pages with low ranks should contain old buckets, and be less likely to be accessed by queries. Finally, pages that are already on the disk may eventually be (physically) erased or moved to tertiary storage, when they become completely obsolete.

5.2 Implementation using a packed B-tree

The past index T should support the following operations efficiently: (i) insertion of a dead bucket, and (ii) selection of buckets intersecting an HT query. Our first implementation adopts a packed B-tree, which indexes the ending time $l_e$ of the buckets' lifespans. Specifically, each leaf entry stores all information of a bucket (i.e., $R$, $f$, $g$, and its lifespan), while an intermediate entry stores a lifespan $[l_s, l_e)$, which encloses those of the leaf entries in its child node. All the nodes in the B-tree are full, except for the rightmost one at each level, which does not have a minimum utilization requirement. In particular, we refer to the rightmost leaf node as the active leaf, for which a special pointer is maintained. Since buckets are inserted into T in ascending order of their $l_e$, each insertion simply appends the new bucket to the active leaf, and terminates if the leaf incurs no overflow. Otherwise, a new active leaf is created with the most recently inserted bucket, and this change propagates to the parent levels. The amortized insertion cost is constant. Figure 5.2 illustrates the corresponding packed B-tree for the dead buckets in Figure 5.1, assuming a capacity of 3 entries per node.

[Figure 5.2: The packed B-tree. Root D holds the lifespans [1,2), [1,3), [2,3) of leaves A, B, C. Leaf A: b1 [1,2), b3 [1,2), b5 [1,2); leaf B: b6 [1,2), b3 [2,3), b4 [1,3); leaf C (the active leaf): b6 [2,3), b1 [2,3), b5 [2,3)]

In some cases, an HT query may have to access AMH, if there are some buckets that were created at the query timestamp, but still remain alive. Consider, for example, the query with region $q_R = [x_2, x_3][y_2, y_3]$ and timestamp $q_T = 1$ on the buckets of Figure 5.1
(the corresponding AMH is shown in Figure 4.6). The query must visit all the buckets whose extents intersect q_R and whose lifespans contain q_T, namely, those marked with * in Figure 5.1. The final result is the sum of the partial results of all such buckets. The partial result from bucket 2 (which is still alive in AMH) is obtained by a PT query as discussed in Section 4. To locate the intersecting buckets (i.e., and 4) in the B-tree, the algorithm accesses the nodes whose lifespans contain q_T (=), i.e., nodes D, A, B in Figure 5.2. At the leaf level the partial results of qualifying buckets are obtained in the same way as for PT queries (i.e., by computing the size of the intersection area with q_R). The packed B-tree implementation offers a simple and very efficient insertion algorithm, but processes HT queries using only temporal pruning. In the next section, we describe an alternative structure that takes advantage of spatial conditions to accelerate query processing.

5.3 Implementation using a 3D R-tree

Our second implementation is based on a main-memory adaptation of the 3D R*-tree [BKSS9]. In this structure, each intermediate entry contains a 3D box which encloses the extents and lifespans of all the buckets in its subtree. Given a HT query q (which can also be regarded as a 2D rectangle defined by q_R at time q_T), we only need to visit those nodes whose 3D boxes intersect that of q. Compared to the packed B-tree, the 3D R-tree implementation achieves lower query cost since it utilizes both spatial and temporal conditions to prune the search space. On the other hand, it incurs higher update overhead
due to the complex insertion operations of the R-tree (which are required to achieve spatiotemporal locality). Both indexes use the same migration mechanism discussed in Section 5.1, i.e., when the main memory capacity is reached, the least recent blocks of the index are transferred to the disk. Thus, their difference in update cost is a matter of CPU overhead. In summary, the B-tree method is preferable for very high streaming rates, while the R-tree is preferable for applications with a heavy query workload. The tradeoff between update and query efficiency is explored in the experimental evaluation.

6. Prediction Model

As discussed in Section 2.4, spatiotemporal prediction based on current locations and velocities has limited applicability in the majority of real applications. Consider, for instance, a traffic supervision scenario, where cars moving on a road network periodically transmit their information. The assumption that the velocity will remain constant between the current and the query time q_T is not realistic, since the cars that reach the end of the road segment on which they travel must update their velocity (i.e., they must turn or stop). As shown in the experimental evaluation, for typical traffic simulation settings, up to 9% of the cars may update their velocities per timestamp, implying that velocity-based prediction is meaningless even for the next timestamp.

The motivation of our model is that, although the individual object velocities may change abruptly, the overall data distribution varies gradually (and slowly) with time, due to the continuity of movement. Furthermore, since we keep past and present data, we can use this information to identify the trend of movement and estimate the result of a future query. Consider, for instance, that the current time is 0, and we want to process the query q(q_R,1), which predicts the number of objects in q_R at the next timestamp.
The output of q can be represented as a function of the result of a PT query and a set of recent HT queries with the same region q_R. Exponential smoothing, a well-known forecasting method in time series analysis [G8], is based on the same reasoning. According to this method, the estimated result S'_1 of the query (q_R,1) can be modeled as a time series such that:

$$S'_1 = \alpha S_0 + \alpha(1-\alpha) S_{-1} + \alpha(1-\alpha)^2 S_{-2} + \dots + \alpha(1-\alpha)^n S_{-n}$$

where (i) S_0 is the actual result of the PT query (q_R,0) (at the current time), (ii) S_{-i} is the actual result of the HT query (q_R,-i) at the i-th previous timestamp, (iii) n determines the length of previous history that will be used for prediction, and (iv) α is the smoothing parameter in the range (0,1). The formula is based on the idea that recent timestamps are more important for prediction than older ones. The relative weight is adjusted by the value of α: as it approaches 1, the significance of the most recent timestamps increases. The model is applicable in our case because the changes of the query result are smooth, or more specifically, the series has a locally constant mean which shows some drift over time.

If we want to predict the result at a future timestamp i>1 (i.e., no actual values for S_1 to S_{i-1} are available), we have to use a step-by-step technique, i.e., first compute S'_1, then use this value to estimate S'_2 and so on. In general, the value of S'_i can be represented by the following recurrence:

$$S'_i = \alpha S_0 + (1-\alpha) S'_{i-1}$$

where S'_i is the predicted value for the query at future timestamp i. Figure 6.1 illustrates the pseudocode for the prediction algorithm based on the above discussion. In our implementation we fix α to 0.2 and n to 6. Although these parameters can be set on-the-fly for each individual query (i.e., to the values that give the best estimation for the current and recent timestamps, since the actual results are known), we chose to fix global values for all queries in order to avoid the overhead of tuning.
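The weighted sum and the step-by-step recurrence can both be evaluated with a single running estimate. The sketch below is illustrative rather than the authors' code: the function name `predict`, its argument layout, and the choice to seed the running estimate with the oldest observation (as the loop of Figure 6.1 does) are assumptions made here, and the PT and HT results are assumed to be already available as plain numbers.

```python
from typing import Sequence


def predict(current: float, history: Sequence[float],
            steps_ahead: int = 1, alpha: float = 0.2) -> float:
    """Exponential-smoothing estimate for `steps_ahead` timestamps from now.

    current -- S_0, the present-time (PT) result
    history -- [S_-1, S_-2, ..., S_-n], most recent first (HT results)
    """
    if steps_ahead < 1:
        raise ValueError("steps_ahead must be at least 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")

    # Oldest-to-newest series ending with the present value S_0.
    series = list(reversed(history)) + [current]

    # Seed with the oldest observation, then fold in newer ones.
    estimate = series[0]
    for value in series[1:]:
        estimate = alpha * value + (1.0 - alpha) * estimate
    # `estimate` now plays the role of S'_1.

    # Future steps: no new observations exist, so S_0 is reused each time.
    for _ in range(steps_ahead - 1):
        estimate = alpha * current + (1.0 - alpha) * estimate
    return estimate


# Usage with made-up counts: six past results and the present one.
past = [95.0, 92.0, 90.0, 88.0, 85.0, 80.0]  # S_-1 ... S_-6
print(predict(current=100.0, history=past, steps_ahead=1))
print(predict(current=100.0, history=past, steps_ahead=3))
```

Seeding with S_{-n} gives the oldest value a weight of (1-α)^n instead of α(1-α)^n, so the weights sum to one and a constant series is predicted exactly.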
It has been suggested [H86] that α = 0.2~0.3 gives accurate results for a variety of problems, while n = 6 represents a good tradeoff between accuracy and query cost (the longer we go into the past, the higher the cost).

Algorithm Prediction (qr, qt)
/* qr: query region; qt: prediction timestamp; n: the least recent timestamp taken into account for prediction; α: smoothing parameter */
    S_0 = PT(qr, 0)                          // compute result of corresponding PT query
    for i = 1 to n
        S_{-i} = HT(qr, -i)                  // result of HT at i-th previous timestamp
    t = -n; S_current = S_{-n}
    while (t < qt)
        if (t ≤ 0)                           // present or past
            S_next = α S_t + (1-α) S_current
        else                                 // future
            S_next = α S_0 + (1-α) S_current
        S_current = S_next
        t = t + 1
    return S_current                         // predicted value at qt
End Prediction

Figure 6.1: Algorithm for prediction

7. Experimental Evaluation

This section evaluates the proposed methods, using a Pentium IV 1.8GHz CPU and 256 MBytes of main memory. Section 7.1 describes the experimental settings, Section 7.2 tunes the size of AMH and Section 7.3 evaluates its ability to follow the data distribution effectively. Section 7.4 compares AMH with the conventional, total rebuilding approach. Section 7.5 evaluates the approximation error and compares exponential smoothing with velocity-based prediction. Finally, Section 7.6 studies the effect of update intensity on the performance of the historical synopsis and the approximation error.
7.1 Settings

The first dataset, denoted as spatial, simulates a scenario involving k mobile objects. The initial positions of the objects are sampled from the real dataset MG, and their destinations are randomly selected from another real dataset LB (both available from www/tiger/). Each object moves from the initial to the destination point following a straight line and reports a fixed number of location changes to the server. After it reaches its destination, it selects a new random destination from the initial dataset and this process is repeated for a total of "round trips". Although the endpoints of each trip are different, the overall distribution after the completion of a trip is either MG or LB.

The second dataset, denoted as road, is created by the spatiotemporal generator of [B2]. The input of the generator is the road map of Oldenburg (a city in Germany), containing 6105 nodes and 7035 edges, and the output is a set of point objects moving on this network. Each point is represented by its locations at successive timestamps. The datasets are created by setting the parameters of the generator (see [B2] for their detailed semantics) as follows. There are 6 classes of objects (e.g., cars, pedestrians), so that each class moves with a particular speed range. The total number of objects (in all 6 classes) at any timestamp is , (the generator decides the number of objects per class according to some function beyond the user's control). At each timestamp 2k objects disappear (i.e., they reach their destinations), while 2k new ones appear at random nodes. Given these settings, on average, 9% of the objects issue updates per timestamp (i.e., they reach the end of the road segment that they are on and they turn or stop). The total number of updates in both the road and spatial datasets is 2.5M.

The server maintains a regular partition of the space in cells (as discussed in Sections 3 and 4).
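A regular grid of per-cell frequencies can be kept with two counter updates per location report: one decrement for the cell the object leaves and one increment for the cell it enters. The sketch below is only an illustration: the `GridCounts` class, the unit-square universe and the `resolution` parameter are assumptions, not values taken from the experiments.

```python
class GridCounts:
    """Hypothetical regular partition of the unit square into cells."""

    def __init__(self, resolution):
        self.resolution = resolution
        self.freq = [[0] * resolution for _ in range(resolution)]

    def _cell(self, x, y):
        # Clamp so that a coordinate of exactly 1.0 falls in the last cell.
        last = self.resolution - 1
        col = min(int(x * self.resolution), last)
        row = min(int(y * self.resolution), last)
        return row, col

    def appear(self, x, y):
        row, col = self._cell(x, y)
        self.freq[row][col] += 1

    def disappear(self, x, y):
        row, col = self._cell(x, y)
        self.freq[row][col] -= 1

    def move(self, old, new):
        """Location update: decrement the old cell, increment the new one."""
        self.disappear(*old)
        self.appear(*new)

    def total(self):
        return sum(sum(row) for row in self.freq)


grid = GridCounts(resolution=4)
grid.appear(0.1, 0.1)
grid.move(old=(0.1, 0.1), new=(0.9, 0.6))
print(grid.freq[0][0], grid.freq[2][3], grid.total())
```

Because a move touches exactly two counters, the total number of objects is preserved and the per-update cost is independent of the grid size.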
When a location update is received from an object, the server (i) decreases (by 1) the frequency of the cell corresponding to the old position, and (ii) increases the frequency of the cell at the new location. The first column of Figure 7.1 shows the density of objects per cell at the initial, median and ending timestamp in a single trip, for the spatial dataset. The second column illustrates the corresponding bucket structure, which continuously follows the data distribution. Figure 7.2a shows the road network of Oldenburg and Figure 7.2b the density of the cells at the initial timestamp. Although the velocity updates for road are much more frequent than for the spatial dataset, the overall distribution does not change significantly over time because movement is constrained to the network.

Each query has two parameters: the side length q_L and the query timestamp q_T. In all cases, the query extent is a square with side length q_L, uniformly distributed in space. For HT (FT) queries, q_T is uniformly distributed in the past (future) timestamps. An FT query is processed by issuing 6 HT queries and 1 PT query with the same extent. The error rate is measured as |act - app| / act, where act (app) equals the actual (approximate) answer.

[Figure 7.1: Spatial dataset. (a) Initial data (MG); (b) Initial histogram; (c) Median data; (d) Median histogram; (e) Final data (LB); (f) Final histogram]

[Figure 7.2: Road dataset. (a) Road network; (b) Data on road]

7.2 Setting the number of buckets

Unlike other multidimensional histograms (e.g., genhist) that require tuning of several parameters, AMH involves a single parameter B (the maximum number of buckets). In order to tune B, we issue 2k PT queries, distributed in the whole history, with length q_L = 6% of the spatial universe. We allow AMH to reorganize every incoming tuples. This reorganization is limited to five
merge operations, each followed by one or more splits (depending on whether intermediate BPT nodes are merged). As expected (and shown in Figure 7.3a), the average error rate decreases with B. The error is higher for road because the local uniformity assumption (within each bucket) does not accurately capture the real object distribution. The same problem exists for all multidimensional histograms (see Section 2) that decompose the space based on local uniformity. To the best of our knowledge, there does not exist any histogram specifically targeting network data.

[Figure 7.3: Tuning B (q_L = 6%, q_T = ). (a) Error rate vs. B, for spatial and road; (b) Reorganization cost (CPU time, µsec) vs. B]

In order to evaluate the overhead, we use the same settings and measure the total CPU time spent on reorganization operations. Figure 7.3b shows the average CPU cost per location update as a function of B for the spatial dataset (the CPU cost for road is about the same and omitted). The overhead increases linearly with the maximum number of buckets, since both merge and split operations examine all buckets. Based on these experiments, we set B=, since it provides good accuracy (average error less than % and , for spatial and road, respectively) with reasonable overhead.

7.3 Robustness with time

The goal of this experiment is to evaluate the robustness of AMH with time. We fix B=, apply the same settings as in Figure 7.3, and issue 2k PT queries, uniformly distributed in history (q_L = 6%). Figure 7.4 shows the error rate as a function of the total number of updates for the spatial dataset. The error behaves periodically because the initial and final distributions are more skewed than the intermediate ones (see Figure 7.1).

[Figure 7.4: Error vs. number of updates (spatial)]

Figure 7.5
repeats the experiment for the road dataset, which does not show the periodic effect of spatial, because the data distribution is constrained by the underlying road network. In both cases the overall error rate remains stable as time evolves, indicating that AMH captures the dynamic data distribution very well through successive reorganizations.

[Figure 7.5: Error vs. number of updates (road)]

7.4 Comparison with conventional histograms

Now, we compare AMH with minskew [APR99] (a common benchmark in the histogram literature [GKTD, WAA]), constructed using the same regular partition as AMH. Since minskew is a static histogram, we rebuild it every k location updates, measure the cost (about . seconds) for each rebuilding and assign a percentage tp of this cost to reorganization operations in AMH. For instance, if tp=100%, we use the same amount of time for both AMH reorganizations and minskew rebuilding. The difference is that the reorganization operations of AMH are uniformly distributed among the k location updates. Figure 7.6a (7.6b) illustrates the average error rate for the spatial (road) dataset over 2k PT queries, uniformly distributed in history, as a function of tp. AMH is much more accurate than minskew, even when consuming / of the CPU resources, because it continuously adapts to the data distribution using cheap operations. On the other hand, minskew is accurate only during the short period after the rebuilding and its bucket structure soon becomes outdated.

[Figure 7.6: Error rate vs. tp (q_L = 6%, q_T = , B = ), for AMH and minskew. (a) spatial; (b) road]

7.5 Evaluation of query and prediction error

In order to evaluate the error of different query types, we issue 3 query sets, each containing 2k queries of the same type (HT, PT, or FT). Figure 7.7 shows the error rate as a function of the query length q_L (from 2% to of the spatial universe).
The error of FT queries (average of predictions for the next timestamps) is higher because it also includes the inaccuracy of prediction. The results for the road dataset are less accurate, because (as discussed for Figure 7.3) the local uniformity assumption within each bucket does not capture the actual data distribution well.

[Figure 7.7: Query error comparison (B = ). Error rate vs. q_L for HT, PT and FT queries. (a) spatial; (b) road]

Next, we compare our prediction method, ES for exponential smoothing, with the state-of-the-art model [TSP3] that uses only information about current location and velocity. In this model, non-uniform data are handled by a 4D minskew histogram (2D for spatial, 2D for velocity). The velocity of an object at timestamp i is determined by its location at timestamps i and i+1. The histogram is constructed using and 6 6 regular partitions on the spatial and velocity dimensions, respectively. The number of buckets is set to 2 (more buckets are needed due to the 4D space) and rebuilding occurs every timestamp (so that the error is due to the model and not the histogram deterioration). In other words, both the space consumption and the CPU overhead are much higher than for AMH.

Figure 7.8 shows the error rate for 2k uniformly distributed FT queries as a function of q_T (i.e., future prediction time), fixing q_L to 6% of the spatial universe. Although in spatial the movement is linear, velocity-based prediction is less accurate than ES (Figure 7.8a). Compared to the results reported in [TSP3], the error of the method is higher, because it now includes the updates that occur between the current and the query time (in [TSP3] it is assumed that there are no such updates). For the road dataset (Figure 7.8b), where updates are much more intense, the quality of prediction deteriorates fast with q_T, confirming our observations (in Section 6) that velocity-based prediction is meaningless in such cases.
[Figure 7.8: Error rate vs. q_T (q_L = 6%), for ES and velocity-based prediction. (a) spatial; (b) road]

7.6 The effect of update intensity

Having demonstrated the accuracy of the proposed methods, we now test the historical synopsis architecture in real-time environments involving intensive updates. In this scenario we assume that updates have higher priority and, therefore, reorganization is delayed during periods of high workload. In order to evaluate the behaviour of the two implementations (packed B-tree and 3D R-tree), we issue 2k HT, PT and FT queries uniformly distributed in history. Figure 7.9 shows the error as a function of the update rate, which varies from k to k updates per second. For k updates the accuracy of both implementations is the same, implying that they can both follow the data distribution effectively. In the other cases, however, the 3D R-tree incurs higher error, because it consumes a significant amount of time for insertion operations (of the dead buckets in the tree). On the other hand, the efficient insertion algorithm of the packed B-tree allows the system to devote more CPU time to histogram reorganization, leading to lower error. Note that the error rate of spatial increases faster than that of the road dataset, because the data distribution varies significantly with time, implying that if the buckets do not get reorganized, they will soon become outdated.

However, if a large percentage of the workload consists of queries, then the 3D R-tree synopsis is preferable, since it utilizes the spatial condition to effectively prune the search space. Figure 7.10 shows the average cost of all query types for the two implementations. In both cases, PT queries are processed by AMH and have the same performance. HT (and, consequently, FT) queries incur almost half the cost in R-trees.
For typical settings, we found that if the query/update ratio in the workload exceeds 3/7, the 3D R-tree is better (i.e., it allows for more reorganization operations).

[Figure 7.9: Error rate vs. update rate (q_L = 6%, B = ), for the packed B-tree and the 3D R-tree. (a) spatial; (b) road]

[Figure 7.10: Query cost comparison (q_L = 6%, B = ). CPU time (msec) for PT, HT and FT queries]
8. Conclusions

Existing spatiotemporal access methods have several disadvantages that severely limit their applicability: (i) they focus explicitly on either historical information retrieval, or future prediction; (ii) they are very expensive to update and query online, since both types of operations incur a large number of disk accesses; (iii) their prediction assumptions are unrealistic for most practical scenarios. Motivated by these problems, we present a comprehensive approach for processing queries that refer to any time in history. Instead of keeping detailed information about individual objects, the proposed architecture maintains an incremental multidimensional histogram, which answers present-time queries. Outdated buckets are stored in a main-memory index, in order to answer queries about the recent past without any I/O operations. The "oldest" parts of the index migrate to the disk in blocks, so that the total I/O cost of updates is minimized. Finally, future queries are answered by a stochastic method that uses the recent history to predict the future, without any assumptions about velocity. Extensive experiments confirm the effectiveness of our techniques under realistic settings.

Acknowledgements

This work was supported by grants HKUST 68/E and HKUST 697/2E from Hong Kong RGC.

References

[AAE] Agarwal, P., Arge, L., Erickson, J. Indexing Moving Points. PODS, 2000.
[AC99] Aboulnaga, A., Chaudhuri, S. Self-tuning Histograms: Building Histograms Without Looking at Data. SIGMOD, 1999.
[AN] Aboulnaga, A., Naughton, J. Accurate Estimation of the Cost of Spatial Selections. ICDE, 2000.
[APR99] Acharya, S., Poosala, V., Ramaswamy, S. Selectivity Estimation in Spatial Databases. SIGMOD, 1999.
[B2] Brinkhoff, T. A Framework for Generating Network-Based Moving Objects. GeoInformatica 6(2), 153-180, 2002.
[BKSS9] Beckmann, N., Kriegel, H., Schneider, R., Seeger, B. The R*-tree: An Efficient and Robust Access Method for Points and Rectangles. SIGMOD, 1990.
[BGC] Bruno, N., Chaudhuri, S., Gravano, L. STHoles: A Multidimensional Workload-Aware Histogram. SIGMOD, 2001.
[CC2] Choi, Y., Chung, C. Selectivity Estimation for Spatio-Temporal Queries to Moving Objects. SIGMOD, 2002.
[G8] Gardner, E. Exponential Smoothing: The State of the Art. Journal of Forecasting (4): 1-28, 1985.
[GKTD] Gunopulos, D., Kollios, G., Tsotras, V., Domeniconi, C. Approximating Multi-Dimensional Aggregate Range Queries over Real Attributes. SIGMOD, 2000.
[GM98] Gibbons, P., Matias, Y. New Sampling-Based Summary Statistics for Improving Approximate Query Answers. SIGMOD, 1998.
[GMP2] Gibbons, P., Matias, Y., Poosala, V. Fast Incremental Maintenance of Approximate Histograms. TODS 27(3): 261-298, 2002.
[H86] Hunter, J. The Exponentially Weighted Moving Average. Journal of Quality Technology, 18(4), 2327, 1986.
[HKT3] Hadjieleftheriou, M., Kollios, G., Tsotras, V. Performance Evaluation of Spatio-temporal Selectivity Estimation Techniques. SSDBM, 2003.
[HKTG2] Hadjieleftheriou, M., Kollios, G., Tsotras, V., Gunopulos, D. Efficient Indexing of Spatiotemporal Objects. EDBT, 2002.
[KGT99] Kollios, G., Gunopulos, D., Tsotras, V. On Indexing Mobile Objects. PODS, 1999.
[LKC99] Lee, J., Kim, D., Chung, C. Multi-dimensional Selectivity Estimation Using Compressed Histogram Information. SIGMOD, 1999.
[LWV3] Lim, L., Wang, M., Vitter, J. SASH: A Self-Adaptive Histogram Set for Dynamically Changing Workloads. VLDB, 2003.
[MD88] Muralikrishna, M., DeWitt, D. Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries. SIGMOD, 1988.
[MVW] Matias, Y., Vitter, J., Wang, M. Dynamic Maintenance of Wavelet-Based Histograms. VLDB, 2000.
[MVW98] Matias, Y., Vitter, J., Wang, M. Wavelet-Based Histograms for Selectivity Estimation. SIGMOD, 1998.
[PI97] Poosala, V., Ioannidis, Y. Selectivity Estimation Without the Attribute Value Independence Assumption. VLDB, 1997.
[PJT] Pfoser, D., Jensen, C., Theodoridis, Y. Novel Approaches in Query Processing for Moving Object Trajectories. VLDB, 2000.
[PTKZ2] Papadias, D., Tao, Y., Kalnis, P., Zhang, J. Indexing Spatio-Temporal Data Warehouses. ICDE, 2002.
[SAA2] Sun, C., Agrawal, D., Abbadi, A. Exploring Spatial Datasets with Histograms. ICDE, 2002.
[SJ2] Saltenis, S., Jensen, C. Indexing of Moving Objects for Location-Based Services. ICDE, 2002.
[SJLL] Saltenis, S., Jensen, C., Leutenegger, S., Lopez, M. Indexing the Positions of Continuously Moving Objects. SIGMOD, 2000.
[TGIK2] Thaper, N., Guha, S., Indyk, P., Koudas, N. Dynamic Multidimensional Histograms. SIGMOD, 2002.
[TP] Tao, Y., Papadias, D. The MV3R-Tree: A Spatio-Temporal Access Method for Timestamp and Interval Queries. VLDB, 2001.
[TPS3] Tao, Y., Papadias, D., Sun, J. The TPR*-Tree: An Optimized Spatio-Temporal Access Method. VLDB, 2003.
[TSP3] Tao, Y., Sun, J., Papadias, D. Selectivity Estimation for Predictive Spatio-Temporal Queries. ICDE, 2003.
[WAA] Wu, Y., Agrawal, D., Abbadi, A. Using the Golden Rule of Sampling for Query Estimation. SIGMOD, 2001.
[ZTG2] Zhang, D., Tsotras, V., Gunopulos, D. Efficient Aggregation over Objects with Extents. PODS, 2002.
vldb manuscript No. (will be inserted by the editor) Mohamed F. Mokbel Walid G. Aref SOLE: Scalable OnLine Execution of Continuous Queries on Spatiotemporal Data Streams the date of receipt and acceptance
More informationENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #2610771
ENHANCEMENTS TO SQL SERVER COLUMN STORES Anuhya Mallempati #2610771 CONTENTS Abstract Introduction Column store indexes Batch mode processing Other Enhancements Conclusion ABSTRACT SQL server introduced
More informationStatistics on Query Expressions in Relational Database Management Systems
Statistics on Query Expressions in Relational Database Management Systems Nicolas Bruno Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School
More informationUnit 4.3  Storage Structures 1. Storage Structures. Unit 4.3
Storage Structures Unit 4.3 Unit 4.3  Storage Structures 1 The Physical Store Storage Capacity Medium Transfer Rate Seek Time Main Memory 800 MB/s 500 MB Instant Hard Drive 10 MB/s 120 GB 10 ms CDROM
More informationJunghyun Ahn Changho Sung Tag Gon Kim. Korea Advanced Institute of Science and Technology (KAIST) 3731 Kuseongdong, Yuseonggu Daejoen, Korea
Proceedings of the 211 Winter Simulation Conference S. Jain, R. R. Creasey, J. Himmelspach, K. P. White, and M. Fu, eds. A BINARY PARTITIONBASED MATCHING ALGORITHM FOR DATA DISTRIBUTION MANAGEMENT Junghyun
More informationData Preprocessing. Week 2
Data Preprocessing Week 2 Topics Data Types Data Repositories Data Preprocessing Present homework assignment #1 Team Homework Assignment #2 Read pp. 227 240, pp. 250 250, and pp. 259 263 the text book.
More informationChapter 12 File Management. Roadmap
Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 12 File Management Dave Bremer Otago Polytechnic, N.Z. 2008, Prentice Hall Overview Roadmap File organisation and Access
More informationChapter 12 File Management
Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 12 File Management Dave Bremer Otago Polytechnic, N.Z. 2008, Prentice Hall Roadmap Overview File organisation and Access
More informationCHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression
Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the
More informationOracle8i Spatial: Experiences with Extensible Databases
Oracle8i Spatial: Experiences with Extensible Databases Siva Ravada and Jayant Sharma Spatial Products Division Oracle Corporation One Oracle Drive Nashua NH03062 {sravada,jsharma}@us.oracle.com 1 Introduction
More informationAbstract. Key words: spacetime cube; GPS; spatialtemporal data model; spatialtemporal query; trajectory; cube cell. 1.
Design and Implementation of Spatialtemporal Data Model in Vehicle Monitor System Jingnong Weng, Weijing Wang, ke Fan, Jian Huang College of Software, Beihang University No. 37, Xueyuan Road, Haidian
More informationAPPENDIX 1 USER LEVEL IMPLEMENTATION OF PPATPAN IN LINUX SYSTEM
152 APPENDIX 1 USER LEVEL IMPLEMENTATION OF PPATPAN IN LINUX SYSTEM A1.1 INTRODUCTION PPATPAN is implemented in a test bed with five Linux system arranged in a multihop topology. The system is implemented
More informationFast Sequential Summation Algorithms Using Augmented Data Structures
Fast Sequential Summation Algorithms Using Augmented Data Structures Vadim Stadnik vadim.stadnik@gmail.com Abstract This paper provides an introduction to the design of augmented data structures that offer
More informationLoad Distribution in Large Scale Network Monitoring Infrastructures
Load Distribution in Large Scale Network Monitoring Infrastructures Josep SanjuàsCuxart, Pere BarletRos, Gianluca Iannaccone, and Josep SoléPareta Universitat Politècnica de Catalunya (UPC) {jsanjuas,pbarlet,pareta}@ac.upc.edu
More informationINMEMORY DATABASE SYSTEMS. Prof. Dr. Uta Störl Big Data Technologies: InMemory DBMS  SoSe 2015 1
INMEMORY DATABASE SYSTEMS Prof. Dr. Uta Störl Big Data Technologies: InMemory DBMS  SoSe 2015 1 Analytical Processing Today Separation of OLTP and OLAP Motivation Online Transaction Processing (OLTP)
More informationEfficient Data Streams Processing in the Real Time Data Warehouse
Efficient Data Streams Processing in the Real Time Data Warehouse Fiaz Majeed fiazcomsats@gmail.com Muhammad Sohaib Mahmood m_sohaib@yahoo.com Mujahid Iqbal mujahid_rana@yahoo.com Abstract Today many business
More informationAg + tree: an Index Structure for Rangeaggregation Queries in Data Warehouse Environments
Ag + tree: an Index Structure for Rangeaggregation Queries in Data Warehouse Environments Yaokai Feng a, Akifumi Makinouchi b a Faculty of Information Science and Electrical Engineering, Kyushu University,
More informationProbability, Mean and Median
Proaility, Mean and Median In the last section, we considered (proaility) density functions. We went on to discuss their relationship with cumulative distriution functions. The goal of this section is
More informationIndexing Techniques in Data Warehousing Environment The UBTree Algorithm
Indexing Techniques in Data Warehousing Environment The UBTree Algorithm Prepared by: Yacine ghanjaoui Supervised by: Dr. Hachim Haddouti March 24, 2003 Abstract The indexing techniques in multidimensional
More informationImage Compression through DCT and Huffman Coding Technique
International Journal of Current Engineering and Technology EISSN 2277 4106, PISSN 2347 5161 2015 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Rahul
More informationOperating Systems: Internals and Design Principles. Chapter 12 File Management Seventh Edition By William Stallings
Operating Systems: Internals and Design Principles Chapter 12 File Management Seventh Edition By William Stallings Operating Systems: Internals and Design Principles If there is one singular characteristic
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More informationLocality Based Protocol for MultiWriter Replication systems
Locality Based Protocol for MultiWriter Replication systems Lei Gao Department of Computer Science The University of Texas at Austin lgao@cs.utexas.edu One of the challenging problems in building replication
More informationOn demand synchronization and load distribution for database gridbased Web applications
Data & Knowledge Engineering 51 (24) 295 323 www.elsevier.com/locate/datak On demand synchronization and load distribution for database gridbased Web applications WenSyan Li *,1, Kemal Altintas, Murat
More informationBig Data and Scripting. Part 4: Memory Hierarchies
1, Big Data and Scripting Part 4: Memory Hierarchies 2, Model and Definitions memory size: M machine words total storage (on disk) of N elements (N is very large) disk size unlimited (for our considerations)
More informationChapter 13. Disk Storage, Basic File Structures, and Hashing
Chapter 13 Disk Storage, Basic File Structures, and Hashing Chapter Outline Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files Hashed Files Dynamic and Extendible Hashing
More informationPerformance rule violations usually result in increased CPU or I/O, time to fix the mistake, and ultimately, a cost to the business unit.
Is your database application experiencing poor response time, scalability problems, and too many deadlocks or poor application performance? One or a combination of zparms, database design and application
More informationComp 5311 Database Management Systems. 16. Review 2 (Physical Level)
Comp 5311 Database Management Systems 16. Review 2 (Physical Level) 1 Main Topics Indexing Join Algorithms Query Processing and Optimization Transactions and Concurrency Control 2 Indexing Used for faster
More informationSQL Query Evaluation. Winter 20062007 Lecture 23
SQL Query Evaluation Winter 20062007 Lecture 23 SQL Query Processing Databases go through three steps: Parse SQL into an execution plan Optimize the execution plan Evaluate the optimized plan Execution
More informationStorage Management for High Energy Physics Applications
Abstract Storage Management for High Energy Physics Applications A. Shoshani, L. M. Bernardo, H. Nordberg, D. Rotem, and A. Sim National Energy Research Scientific Computing Division Lawrence Berkeley
More informationDATA STRUCTURES USING C
DATA STRUCTURES USING C QUESTION BANK UNIT I 1. Define data. 2. Define Entity. 3. Define information. 4. Define Array. 5. Define data structure. 6. Give any two applications of data structures. 7. Give
More informationCaching XML Data on Mobile Web Clients
Caching XML Data on Mobile Web Clients Stefan Böttcher, Adelhard Türling University of Paderborn, Faculty 5 (Computer Science, Electrical Engineering & Mathematics) Fürstenallee 11, D33102 Paderborn,
More informationAn error recovery transmission mechanism for mobile multimedia
An error recovery transmission mechanism for moile multimedia Min Chen *, Ang i School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, China Astract This paper presents a channel
More informationPrinciples of Scatter Search
Principles of Scatter Search RAFAEL MARTÍ Dpto. de Estadística e Investigación Operativa, Facultad de Matemáticas, Universidad de Valencia, Dr. Moliner 50, 4600 Burjassot (Valencia) Spain. Rafael.Marti@uv.es
More informationBinary search tree with SIMD bandwidth optimization using SSE
Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT Inmemory tree structured index search is a fundamental database operation. Modern processors provide tremendous
More informationQoS Provisioning in Mobile Internet Environment
QoS Provisioning in Moile Internet Environment Salem Lepaja (salem.lepaja@tuwien.ac.at), Reinhard Fleck, Nguyen Nam Hoang Vienna University of Technology, Institute of Communication Networks, Favoritenstrasse
More informationPartJoin: An Efficient Storage and Query Execution for Data Warehouses
PartJoin: An Efficient Storage and Query Execution for Data Warehouses Ladjel Bellatreche 1, Michel Schneider 2, Mukesh Mohania 3, and Bharat Bhargava 4 1 IMERIR, Perpignan, FRANCE ladjel@imerir.com 2
More informationA Comparison of General Approaches to Multiprocessor Scheduling
A Comparison of General Approaches to Multiprocessor Scheduling JingChiou Liou AT&T Laboratories Middletown, NJ 0778, USA jing@jolt.mt.att.com Michael A. Palis Department of Computer Science Rutgers University
More informationEfficient Data Structures for Decision Diagrams
Artificial Intelligence Laboratory Efficient Data Structures for Decision Diagrams Master Thesis Nacereddine Ouaret Professor: Supervisors: Boi Faltings Thomas Léauté Radoslaw Szymanek Contents Introduction...
More informationManaging Multiple Tasks in Complex, Dynamic Environments
In Proceedings of the 1998 National Conference on Artificial Intelligence. Madison, Wisconsin, 1998. Managing Multiple Tasks in Complex, Dynamic Environments Michael Freed NASA Ames Research Center mfreed@mail.arc.nasa.gov
More informationClustering & Visualization
Chapter 5 Clustering & Visualization Clustering in highdimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to highdimensional data.
More information.:!II PACKARD. Performance Evaluation ofa Distributed Application Performance Monitor
r~3 HEWLETT.:!II PACKARD Performance Evaluation ofa Distributed Application Performance Monitor Richard J. Friedrich, Jerome A. Rolia* Broadband Information Systems Laboratory HPL95137 December, 1995
More informationMining Stream, TimeSeries, and Sequence Data
8 Mining Stream, TimeSeries, and Sequence Data Our previous chapters introduced the basic concepts and techniques of data mining. The techniques studied, however, were for simple and structured data sets,
More informationAnalysis of Dynamic Load Balancing Strategies for Parallel Shared Nothing Database Systems
in: Proc. 19th Int. Conf. on Very Large Database Systems, Dublin, August 1993, pp. 182193 Analysis of Dynamic Load Balancing Strategies for Parallel Shared Nothing Database Systems Erhard Rahm Robert
More informationOverview of Storage and Indexing
Overview of Storage and Indexing Chapter 8 How indexlearning turns no student pale Yet holds the eel of science by the tail.  Alexander Pope (16881744) Database Management Systems 3ed, R. Ramakrishnan
More informationPART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design
PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions Slide 1 Outline Principles for performance oriented design Performance testing Performance tuning General
More informationBig Data & Scripting storage networks and distributed file systems
Big Data & Scripting storage networks and distributed file systems 1, 2, in the remainder we use networks of computing nodes to enable computations on even larger datasets for a computation, each node
More informationDPTree: A Balanced Tree Based Indexing Framework for PeertoPeer Systems
DPTree: A Balanced Tree Based Indexing Framework for PeertoPeer Systems Mei Li WangChien Lee Anand Sivasubramaniam Department of Computer Science and Engineering Pennsylvania State University University
More informationEfficient OLAP Operations in Spatial Data Warehouses
Efficient OLAP Operations in Spatial Data Warehouses Dimitris Papadias, Panos Kalnis, Jun Zhang and Yufei Tao Department of Computer Science Hong Kong University of Science and Technology Clear Water Bay,
More informationCHAPTER  5 CONCLUSIONS / IMP. FINDINGS
CHAPTER  5 CONCLUSIONS / IMP. FINDINGS In today's scenario data warehouse plays a crucial role in order to perform important operations. Different indexing techniques has been used and analyzed using
More informationIBM SPSS Direct Marketing 23
IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release
More informationlowlevel storage structures e.g. partitions underpinning the warehouse logical table structures
DATA WAREHOUSE PHYSICAL DESIGN The physical design of a data warehouse specifies the: lowlevel storage structures e.g. partitions underpinning the warehouse logical table structures lowlevel structures
More informationSpatial Data Management over Flash Memory
Spatial Data Management over Flash Memory Ioannis Koltsidas 1 and Stratis D. Viglas 2 1 IBM Research, Zurich, Switzerland iko@zurich.ibm.com 2 School of Informatics, University of Edinburgh, UK sviglas@inf.ed.ac.uk
More informationOnline Amnesic Summarization of Streaming Locations
Online Amnesic Summarization of Streaming Locations Michalis Potamias 1, Kostas Patroumpas 2, and Timos Sellis 2 1 Computer Science Department, Boston University, MA, USA 2 School of Electrical and Computer
More informationTechnology StepbyStep Using StatCrunch
Technology StepbyStep Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate
More informationPhysical Data Organization
Physical Data Organization Database design using logical model of the database  appropriate level for users to focus on  user independence from implementation details Performance  other major factor
More informationEffective Complex Data Retrieval Mechanism for Mobile Applications
, 2325 October, 2013, San Francisco, USA Effective Complex Data Retrieval Mechanism for Mobile Applications Haeng Kon Kim Abstract While mobile devices own limited storages and low computational resources,
More informationIBM SPSS Direct Marketing 22
IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release
More informationSeaCloudDM: Massive Heterogeneous Sensor Data Management in the Internet of Things
SeaCloudDM: Massive Heterogeneous Sensor Data Management in the Internet of Things Jiajie Xu Institute of Software, Chinese Academy of Sciences (ISCAS) 20120515 Outline 1. Challenges in IoT Data Management
More informationbusiness statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar
business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel
More informationMapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research
MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With
More informationKeywords: Dynamic Load Balancing, Process Migration, Load Indices, Threshold Level, Response Time, Process Age.
Volume 3, Issue 10, October 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Load Measurement
More information