Efficient Use of Signatures in Object-Oriented Database Systems

Kjetil Nørvåg
Department of Computer and Information Science
Norwegian University of Science and Technology
7491 Trondheim, Norway

Abstract. Signatures are bit strings, generated by applying a hash function to some or all of the attributes of an object. The signatures of the objects can be stored separately from the objects themselves, and can later be used to filter out candidate objects during a perfect match query. In an object-oriented database system (OODB) using logical object identifiers (OIDs), an object identifier index (OIDX) is needed to map from logical OID to the physical location of the object. In this paper we show how signatures can be stored in the OIDX and used to reduce the average object access cost in a system. We also extend this approach to transaction time temporal OODBs (TOODBs), where it is even more beneficial, because maintaining signatures comes virtually for free. We develop a cost model that we use to analyze the performance of the proposed approaches, and this analysis shows that substantial gain can be achieved.

1 Introduction

A signature is a bit string, generated by applying a hash function to some or all of the attributes of an object. (Note that the term signature is also often used in other contexts, e.g., function signatures and implementation signatures.) When searching for objects that match a particular value, it is possible to decide from the signature of an object whether the object is a possible match. The size of a signature is generally much smaller than the size of the object itself, and signatures are normally stored separately from the objects, in signature files. By first checking the signatures when doing a perfect match query, the number of objects that actually have to be retrieved can be reduced. Signature files have previously been shown to be an alternative to indexing, especially in the context of text retrieval [1, 6]. Signature files can also be used in general query processing, although this is still an immature research area.

The main drawback of signature files is that signature file maintenance can be relatively costly. If one of the attributes contributing to the signature of an object is modified, the signature file has to be updated as well. To be beneficial, a high read to write ratio is necessary. In addition, high selectivity is needed at query time to make it worthwhile to read the signature file in addition to the objects themselves.

In this paper, we first show how signatures can be incorporated into traditional object-oriented databases (OODBs). Second, we show how they can be used in transaction time temporal OODBs, with only marginal maintenance cost.

Every object in an OODB is uniquely identified by an object identifier (OID). To do the mapping from logical OID to physical location, an OID index (OIDX), often a B-tree variant, is used. (Some OODBs avoid the OIDX by using physical OIDs. In that case, the OID gives the physical disk page directly. While this potentially gives higher performance, it is very inflexible, and makes tasks such as reclustering and schema management more difficult.) The entries in the OIDX, which we call object descriptors (ODs), contain the physical address of the object. Because of the way OIDs are generated, OIDX accesses often have low locality, i.e., often only one OD in a particular OIDX leaf node is accessed at a time. This means that OIDX lookups can be costly, but they have to be done every time an object is accessed (as will be explained later, the lookup cost can be reduced by employing OD caching). OIDX updates are only needed when objects are created, moved, or deleted. They are not necessary when objects are updated, because updates to objects are done in-place, so that the mapping information in the OIDX is still valid after an object has been updated.

Our approach to reducing the average access cost in the system is to include the signature of an object in the OIDX itself. This means that the OD now also includes the signature, in addition to the mapping information. When we later do a value based perfect match search on a set, we can in many cases avoid retrieving the objects themselves; checking the signature in the OD is enough to exclude an object during the search. The OD has to be retrieved anyway, because it is needed to find the physical location of the object, so there is no additional cost to retrieve the OD, compared to not using signatures. Storing the signature in the OIDX increases the size of the OD, and hence the size of the OIDX, and makes an OIDX update necessary every time an object is updated, but as we will show later in this paper, in spite of this extra cost, it will in most cases be beneficial.

A context where storing signatures in the OIDX is even more interesting is transaction time temporal OODBs (TOODBs). In a TOODB, object updates do not make previous versions inaccessible. On the contrary, previous versions of objects can still be accessed and queried. A system maintained timestamp is associated with every object version. This timestamp is the commit time of the transaction that created this version of the object. In a non-temporal OODB, the OIDX update would not be necessary if we did not want to maintain signatures. In a TOODB, on the other hand, the OIDX must be updated every time an object is updated, because we add a new version, and the timestamp and the physical address of the new version need to be inserted into the index. As a result, introducing signatures only marginally increases the OIDX update costs. Because of the low locality on updates, disk seek time dominates, and the increased size of the ODs is of less importance. With this approach, we can maintain signatures at a very low cost, and by using signatures, one of the potential bottlenecks in a TOODB, the frequent and costly OIDX updates, can be turned into an advantage.

The organization of the rest of the paper is as follows. In Sect. 2 we give an overview of related work. In Sect. 3 we give a brief introduction to signatures. In Sect. 4 we describe indexing and object management in a TOODB. In Sect. 5 we describe how signatures can be integrated into OID indexing. In Sect. 6 we develop a cost model, which we use in Sect. 7 to study the performance when using signatures stored in the OIDX, with different parameters and access patterns. Finally, in Sect. 8, we conclude the paper and outline issues for further research.

2 Related Work

Several studies have been done on using signatures as a text access method, e.g. [1, 6]. Less has been done on using signatures in ordinary query processing, but studies have been done on using signatures in queries on set-valued objects [7]. We do not know of any OODB where signatures have been integrated, but we plan to integrate the approaches described in this paper in the Vagabond parallel TOODB [9].

3 Signatures

In this section we describe signatures, how they can be used to improve query performance, how they are generated, and signature storage alternatives.

A signature is generated by applying a hash function to the object, or to some of the attributes of the object. By applying this hash function, we get a signature of F bits, with m bits set to 1. If we denote the attributes of an object as A_1, ..., A_n, the signature of the object is S = h(A_i, ..., A_j), where h is a hash value generating function, and A_i, ..., A_j are some or all of the attributes of the object (not necessarily all of A_1, ..., A_n). The size of the signature is usually much smaller than the object itself.

3.1 Using Signatures

A typical example of the use of signatures is a query to find all objects in a set where the attributes match a certain number of values, A_i = v_i, ..., A_j = v_j. This can be done by calculating the query signature S_q of the query: S_q = h(v_i, ..., v_j). The query signature S_q is then compared to all the signatures in the signature file to find possible matching objects. A possible matching object, a drop, is an object that satisfies the condition that all bit positions set to 1 in the query signature are also set to 1 in the object's signature. The drops form a set of candidate objects. An object can have a matching signature even if it does not match the values searched for, so all candidate objects have to be retrieved and matched against the value set that is searched for. The candidate objects that do not match are called false drops.

3.2 Signature Generation

The methods used for generating the signature depend on the intended use of the signature. We will now discuss some relevant methods.

Whole Object Signature. In this case, we generate a hash value from the whole object. This value can later be used in a perfect match search that includes all attributes of the object.
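To make signature generation and the drop test concrete, the following minimal Python sketch builds object and query signatures and filters candidates. The hash function, the signature size F = 64, and the choice of m = 3 bits per attribute value are illustrative assumptions only, not choices prescribed by this paper.

```python
import hashlib

F = 64  # signature size in bits (illustrative)
M = 3   # bits set to 1 per attribute value (illustrative)

def value_signature(value, F=F, m=M):
    """Hash one attribute value to an F-bit signature with m bits set."""
    sig = 0
    data = repr(value).encode()
    for i in range(m):
        digest = hashlib.sha256(data + bytes([i])).digest()
        bit = int.from_bytes(digest[:4], "big") % F
        sig |= 1 << bit
    return sig

def object_signature(attr_values):
    """Object signature: bitwise OR of the signatures of its attribute values."""
    sig = 0
    for v in attr_values:
        sig |= value_signature(v)
    return sig

def is_drop(query_sig, object_sig):
    """Drop test: every 1-bit of the query signature is also set in the object signature."""
    return query_sig & object_sig == query_sig

# Perfect match query for the values ("Smith", 1971) against a small set of objects:
objects = [("Smith", 1971), ("Jones", 1984), ("Smith", 1984)]
signatures = [object_signature(o) for o in objects]
q = object_signature(["Smith", 1971])
candidates = [o for o, s in zip(objects, signatures) if is_drop(q, s)]
# candidates contains ("Smith", 1971) plus possibly false drops, which must be
# verified against the actual attribute values.
```

Any candidate returned by the drop test may still be a false drop, so it must be fetched and compared against the actual query values before being reported as a match.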

One/Multi Attribute Signatures. The first method, whole object signature, is only useful for a limited set of queries. A more useful method is to create the hash value from only one attribute. This can be used for perfect match search on that specific attribute. Often, a search is a perfect match on a subset of the attributes. If such searches are expected to be frequent, it is possible to generate the signature from these attributes, again just treating the subset of attributes as a sequence of bits. This method can be used as a filtering technique in more complex queries, where the results from this filtering can be applied to the rest of the query predicate.

Superimposed Coding Methods. The previous methods are not very flexible; they can only be used for queries on the set of attributes used to generate the signature. To be able to support several query types that do perfect match on different sets of attributes, a technique called superimposed coding can be used. In this case, a separate attribute signature is created for each attribute. The object signature is created by performing a bitwise OR on the attribute signatures; for an object with 3 attributes the signature is S = s(A_1) OR s(A_2) OR s(A_3). This results in a signature that is very flexible in its use, and supports several types of queries, with different attributes. Superimposed coding is also used for fast text access, one of the most common applications of signatures. In this case, the signature is used to avoid full text scanning of each document, for example in a search for certain words occurring in a particular document. There can be more than one signature for each document. The document is first divided into logical blocks, which are pieces of text that contain a constant number of distinct words. A word signature is created for each word in the block, and the block signature is created by OR-ing the word signatures.

3.3 Signature Storage

Traditionally, the signatures have been stored in one or more separate files, outside the indexes and the objects themselves. The files contain the signatures of all objects in the relevant set. The sizes of these files are in general much smaller than the size of the relation/set of objects that the signatures are generated from, and a scan of the signature files is much less costly than a scan of the whole relation/set of objects.

Two well-known storage structures for signatures are Sequential Signature Files (SSF) and Bit-Sliced Signature Files (BSSF). In the simplest signature file organization, SSF, the signatures are stored sequentially in a file. A separate pointer file is used to provide the mapping between signatures and objects. In an OODB, this will typically be a file with OIDs, one for each signature. During each search for perfect match, the whole signature file has to be read. Updates can be done by updating only one entry in the file. With BSSF, each bit of the signature is stored in a separate file. With a signature size of F bits, the signatures are distributed over F files, instead of one file as in the SSF approach. This is especially useful if we have large signatures. In this case, we only have to search the files corresponding to the bit positions where the query signature has a 1. This can reduce the search time considerably. However, each update implies updating up to F files, which is very expensive. So, even if the retrieval cost has been shown to be much smaller for BSSF, the update cost is much higher; 10-100 times higher is not uncommon [7]. Thus, BSSF based approaches are most appropriate for relatively static data.
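The retrieval/update trade-off of BSSF can be sketched as follows. This toy version keeps each bit slice as a Python integer instead of a separate file, which is an illustrative simplification.

```python
class BitSlicedSignatureFile:
    """Minimal in-memory sketch of the BSSF idea: one 'slice' per signature
    bit position. A real BSSF stores each slice in a separate file on disk."""

    def __init__(self, F):
        self.F = F             # signature size in bits
        self.slices = [0] * F  # slice i holds bit i of every stored signature
        self.count = 0         # number of signatures stored so far

    def insert(self, signature):
        """Append a signature: touches up to F slices (the expensive part)."""
        pos = self.count
        for i in range(self.F):
            if signature >> i & 1:
                self.slices[i] |= 1 << pos
        self.count += 1
        return pos

    def query(self, query_sig):
        """Return positions of candidate signatures (drops). Only the slices
        where the query signature has a 1 are read and AND-ed together."""
        result = (1 << self.count) - 1  # start with all stored positions
        for i in range(self.F):
            if query_sig >> i & 1:
                result &= self.slices[i]
        return [p for p in range(self.count) if result >> p & 1]
```

A query touches only the slices corresponding to the 1-bits of the query signature, while an insert or update may touch up to F slices, which is why BSSF favours relatively static, read-mostly data.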

4 Object and Index Management in a TOODB

We start with a description of how OID indexing and version management can be done in a TOODB. This brief outline is not based on any existing system, but the design is close enough to make it possible to integrate into current OODBs if desired.

4.1 Temporal OID Indexing

In a traditional OODB, the OIDX is usually realized as a hash file or a B+-tree, with ODs as entries, and with the OID as the key. In a TOODB, we have more than one version of some of the objects, and we need to be able to access current as well as old versions efficiently. Our approach to indexing is to have one index structure, containing all ODs, for current as well as previous versions. In this paper, we assume one OD for each object version, stored in a B+-tree. We include the commit time TIME in the OD, and use the concatenation of OID and time, OID||TIME, as the index key. By doing this, ODs for a particular OID will be clustered together in the leaf nodes, sorted on commit time. As a result, search for the current version of a particular OID, as well as retrieval of a particular time interval for an OID, can be done efficiently. When a new object is created, i.e., a new OID is allocated, its OD is appended to the index tree, as is done in the Monotonic B+-tree [5]. This operation is very efficient. However, when an object is updated, the OD for the new version has to be inserted into the tree.

While several efficient multiversion access methods exist, e.g., the TSB-tree [8], they are not suitable for our purpose, because they provide more flexibility than needed, e.g., combined key range and time range search, at an increased cost. We will never search for a (consecutive) range of OIDs; OID search will always be for perfect match, and most searches are assumed to be for the current version. It should be noted that our OIDX is inefficient for many typical temporal queries. As a result, additional secondary indexes can be needed, of which the TSB-tree is a good candidate. However, the OIDX is still needed, to support navigational queries, one of the main features of OODBs compared to relational database systems.
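A minimal sketch of this keying scheme, with an in-memory sorted list standing in for the B+-tree and the OD treated as an opaque value (an assumption made purely for illustration):

```python
import bisect

class TemporalOIDX:
    """Minimal in-memory sketch of the OID||TIME keying scheme described above.
    A sorted list stands in for the B+-tree; an OD is just an opaque value here."""

    def __init__(self):
        self.keys = []  # sorted list of (oid, commit_time) keys
        self.ods = []   # ods[i] is the OD stored under keys[i]

    def insert(self, oid, commit_time, od):
        """Insert the OD of a new object version under the key (oid, commit_time)."""
        pos = bisect.bisect_right(self.keys, (oid, commit_time))
        self.keys.insert(pos, (oid, commit_time))
        self.ods.insert(pos, od)

    def lookup_current(self, oid):
        """Return the OD of the newest version of oid, or None if oid is unknown."""
        pos = bisect.bisect_right(self.keys, (oid, float("inf")))
        if pos and self.keys[pos - 1][0] == oid:
            return self.ods[pos - 1]
        return None

    def lookup_range(self, oid, t_start, t_end):
        """Return the ODs of all versions of oid committed in [t_start, t_end]."""
        lo = bisect.bisect_left(self.keys, (oid, t_start))
        hi = bisect.bisect_right(self.keys, (oid, t_end))
        return self.ods[lo:hi]
```

Because all keys for one OID are adjacent in the sort order, both the current-version lookup and the time-range retrieval touch a contiguous run of entries, which is exactly the clustering property argued for above.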

4.2 Temporal Object Management

In a TOODB, it is usually assumed that most accesses will be to the current versions of the objects in the database. To keep these accesses as efficient as possible, and to benefit from object clustering, the database is partitioned. Current objects reside in one partition, and the previous versions reside in the other partition, the historical database. When an object is updated in a TOODB, the previous version is first moved to the historical database, before the new version is stored in-place in the current database. We assume that clustering is not maintained for historical data, so that all objects going historical, i.e., being moved because they are replaced by a new version, can be written sequentially, something which reduces update costs considerably. The OIDX is updated every time an object is updated.

Not all the data in a TOODB is temporal; for some of the objects, we are only interested in the current version. To improve efficiency, the system can be made aware of this. In this way, some of the data can be defined as non-temporal. Old versions of these objects are not kept, objects can be updated in-place as in a traditional OODB, and the costly OIDX update is not needed when an object is modified. This is an important point: using an OODB which efficiently supports temporal data management should not reduce the performance of applications that do not utilize these features.

5 Storing Signatures in the OIDX

The signature can be stored together with the mapping information (and the timestamp in the case of TOODBs) in the OD in the OIDX. Perfect match queries can use the signatures to reduce the number of objects that have to be retrieved; only the candidate objects, with matching signatures, need to be retrieved.

The optimal signature size is very dependent on data and query types. In some cases, we can manage with a very small signature; in other cases, for example in the case of text documents, we want a much larger signature size. It is therefore desirable to be able to use different signature sizes for different kinds of data, and as a result, we should provide different signature sizes.

The maintenance of object signatures implies computational overhead, and is not always required or desired. Whether to maintain signatures or not can for example be decided on a per class basis.
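As a sketch of the approach (the OD layout and helper names below are illustrative assumptions, not a prescribed format), a perfect match scan over a set only needs to fetch the objects whose OD signature survives the drop test; fetch_object and matches_query stand in for the page read and the final attribute comparison.

```python
from dataclasses import dataclass

@dataclass
class ObjectDescriptor:
    """Illustrative OD layout: mapping information plus commit time and signature."""
    oid: int
    commit_time: int
    page_no: int    # physical location of the object version
    signature: int  # F-bit object signature stored directly in the OD

def perfect_match_scan(ods, query_sig, fetch_object, matches_query):
    """Scan a set of ODs: fetch from disk only the objects whose OD signature
    makes them drops; verify each fetched candidate to weed out false drops."""
    results = []
    for od in ods:
        if query_sig & od.signature != query_sig:
            continue                    # excluded without touching the object
        obj = fetch_object(od.page_no)  # candidate: must be read and verified
        if matches_query(obj):
            results.append(obj)
    return results
```

Since the OD has to be fetched anyway to find the physical address, the signature check adds no extra I/O beyond what the address lookup already requires.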

6 Analytical Model

Due to space constraints, we can only present a brief overview of the cost model used. For a more detailed description, we refer to [10]. The purpose of this paper is to show the benefits of using signatures in the OIDX, so we restrict the analysis to attribute matches, using the superimposed coding technique.

Our cost model focuses on disk access costs, as this is the most significant cost factor. In our disk model, we distinguish between random and sequential accesses. With random access, the time to read or write a page is denoted T_R; with sequential access, the time to read or write a page is T_S. All our calculations are based on a page size of 8 KB. The system modeled in this paper is a page server OODB, with temporal extensions as described in the previous sections. To reduce disk I/O, the most recently used index and object pages are kept in a page buffer of size M_pbuf. OIDX pages will in general have low locality, and to increase the probability of finding a certain OD needed for a mapping from OID to physical address, the most recently used ODs are kept in a separate OD cache of size M_ocache, containing N_ocache ODs. The OODB has a total of M bytes available for buffering. Thus, when we talk about the memory size M, we only consider the part of main memory used for buffering, not main memory used for the objects, the program itself, other programs, the operating system, etc. The main memory size M is the sum of the size of the page buffer and the OD cache, M = M_pbuf + M_ocache.

With increasing amounts of main memory available, buffer characteristics are very important. To reflect this, our cost model includes buffer performance as well, in order to calculate the hit rates of the OD cache and the page buffer. Our buffer model is an extension of the Bhide, Dan and Dias (BDD) LRU buffer model [2]. An important feature of the BDD model, which makes it more powerful than some other models, is that it can be used with non-uniform access distributions. The derivation of the BDD model in [2] also includes an equation to calculate the number N_dist of distinct objects out of a total of N_access accessed objects, given a particular access distribution. We denote this equation N_dist = Distinct(N_access, N). The buffer hit probability of an object page is denoted P_buf_opage. Note that even if index and object pages share the same page buffer, the buffer hit probability is different for index and object pages. We do not consider the cost of log operations, because the logging is done to separate disks, and the cost is independent of the other costs.

To analyze the use of signatures in the OIDX, we need a cost model that includes: 1) OIDX update and lookup costs, and 2) object storage and retrieval costs. The OIDX lookup and update costs can be calculated with our previously published cost model [10]. The only modification done to this cost model is that signatures are stored in the object descriptors (ODs). As a consequence, the OD size varies with different signature sizes. In practice, a signature in an OD will be stored as a number of bytes, and for this reason, we only consider signature sizes that are multiples of 8 bits. The average OIDX lookup cost, i.e., the average time to retrieve the OD of an object, is denoted T_oidxlookup, and the average time to do an update is T_oidxupdate. Not all objects are temporal, and in our model, we denote the fraction of the operations done on temporal objects as P_temporal. For the non-temporal objects, if signatures are not maintained, the OIDX is only updated when the objects are created.
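The Distinct() equation itself is not reproduced here; as a stand-in, the sketch below uses the standard expected-distinct-items estimate under the independent reference model for a partitioned access pattern. The partition sizes and probabilities in the example are purely illustrative.

```python
def expected_distinct(n_access, partitions):
    """Expected number of distinct objects touched by n_access independent
    accesses, for a partitioned access pattern. Each partition is a pair
    (n_objects, access_probability); accesses are uniform within a partition.
    This is a standard independent-reference-model estimate, used here as a
    stand-in for the Distinct() equation of the BDD model [2]."""
    total = 0.0
    for n_objects, p_partition in partitions:
        if n_objects == 0:
            continue
        # Probability that a single access hits one specific object of this partition.
        p_obj = p_partition / n_objects
        # Expected number of distinct objects of this partition hit after n_access accesses.
        total += n_objects * (1.0 - (1.0 - p_obj) ** n_access)
    return total

# Example: a 3-partition skewed pattern over 100 000 objects (illustrative probabilities).
parts = [(1_000, 0.60), (19_000, 0.30), (80_000, 0.10)]
print(expected_distinct(10_000, parts))
```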

6.1 Object Storage and Retrieval Cost Model

One or more objects are stored on each disk page. To reduce the object retrieval cost, objects are often placed on disk pages in a way that makes it likely that more than one of the objects on a page that is read will be needed in the near future. This is called clustering. In our model, we define the clustering factor C as the fraction of an object page that is relevant, i.e., if there are N_opage objects on each page, and R of them will be used, C = R/N_opage. If N_opage < 1, i.e., the average object size is larger than one disk page, we define C = 1.

Read Objects. We model the database read accesses as 1) ordinary object accesses, assumed to benefit from the database clustering, and 2) perfect match queries, which can benefit from signatures. We assume the perfect match queries to be a fraction P_qm of the read accesses, and that P_d is the fraction of queried objects that are actual drops. Assuming a clustering factor of C, the average object retrieval cost for an ordinary access, excluding the OIDX lookup, is T_obj = T_opage/(C N_opage), where the average cost of reading one page from the database is T_opage = (1 - P_buf_opage) T_R. When reading object pages during signature based queries, we must assume we cannot benefit from clustering, because we retrieve only a very small fraction of the total number of objects. In that case, one page must be read for every object that is retrieved, T_objq = T_opage. The average object retrieval cost, employing signatures, is:

  T_readobj = T_oidxlookup + (1 - P_qm) T_obj + P_qm (P_d + P_fd) T_objq

which means that for the fraction P_qm that are queries for perfect match, we only need to read the object page in the case of actual or false drops. The false drop probability when a signature of F bits is generated from D attributes is P_fd = 2^(-F ln2 / D), assuming the number of bits set for each attribute is chosen optimally.

Update Objects. Updating can be done in-place, with write-ahead logging (but note that in the case of temporal objects, the previous version is moved to the historical partition before the new current version is written). In that case, a transaction can commit after its log records have been written to disk. Modified pages are not written back immediately; this is done lazily in the background as a part of buffer replacement and checkpointing. Thus, a page may be modified several times before it is written back, and update costs depend on the checkpoint interval. The checkpoint interval is defined by the number of objects that can be written between two checkpoints. The number of written objects, N_w, includes created as well as updated objects. N_create = P_create N_w of the written objects are creations of new objects, where P_create is the fraction of written objects that are newly created, and (N_w - N_create) of the written objects are updates of existing objects. The number of distinct updated objects during one checkpoint period is N_dist = Distinct(N_w - N_create, N_obj), where N_obj is the total number of objects. The average number of times each of these objects is updated is N_upd = (N_w - N_create)/N_dist. During one checkpoint interval, the number of pages in the current partition of the database that are affected is N_page = N_dist/(C N_opage). This means that during one checkpoint interval, new versions must be installed into N_page pages. On each of these pages, C N_opage objects have been updated, and each of them has been updated an average of N_upd times. For each of these pages, we need to write P_temporal N_upd C N_opage objects to the historical partition (this includes objects from the page and objects that were not installed into the page before they went historical), install the new current versions into the page, and write it back. This will be done in batch, to reduce disk arm movement, and benefits from sequential writing of the historical objects. For each of the object updates, the OIDX must be updated as well. In the case of a non-temporal OODB, we do not need to write previous versions to the historical partition, and the OIDX only needs to be updated if signatures are to be maintained. When new objects are created, an index update is needed. When creating new objects, a new page will, on average, be allocated for every N_opage object creations. When a new page is allocated, an installation read is not needed. The average object update cost, with the costs of one checkpoint interval amortized over the N_w written objects, is:

  T_updateobj = T_oidxupdate + (1/N_w) [ N_page T_S (P_temporal N_upd C N_opage) + N_page T_R + (N_create/N_opage) T_R ]
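The read model above can be sketched as follows; the false drop estimate is the P_fd formula given earlier, and all numeric values in the example are illustrative, not the parameters used in this paper.

```python
import math

def false_drop_probability(F, D):
    """Optimal-weight superimposed coding estimate: P_fd = 2^(-F*ln2/D)."""
    return 2.0 ** (-(F * math.log(2.0)) / D)

def avg_read_cost(T_oidxlookup, T_opage, C, N_opage, P_qm, P_d, F=None, D=1):
    """Average object retrieval cost per the read model above.
    With F=None, perfect match queries read every queried object (no signatures);
    otherwise only actual drops and false drops are read."""
    T_obj = T_opage / (C * N_opage)  # ordinary, clustered access
    T_objq = T_opage                 # unclustered access during a query
    if F is None:
        query_fraction = 1.0         # every queried object must be fetched
    else:
        query_fraction = P_d + false_drop_probability(F, D)
    return T_oidxlookup + (1 - P_qm) * T_obj + P_qm * query_fraction * T_objq

# Illustrative numbers only (not the paper's parameters): a 10 ms page read,
# 64 objects per page, clustering factor 0.2.
base = dict(T_oidxlookup=0.004, T_opage=0.010, C=0.2, N_opage=64, P_qm=0.4, P_d=0.1)
print(avg_read_cost(**base))            # without signatures
print(avg_read_cost(**base, F=8, D=1))  # with an 8-bit signature in the OD
```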

Note that objects larger than one disk page will usually be partitioned, and each such object is indexed by a separate large object index tree. This has the advantage that when a new version is created, only the modified parts need to be written back. An example of how this can be done is the EXODUS large object structure [3].

7 Performance

We have now derived the cost functions necessary to calculate the average object storage and retrieval costs, with different system parameters and access patterns, and with and without the use of signatures. We will in this section study how different values of these parameters affect the access costs. Optimal parameter values depend on the mix of updates and lookups, and they should be studied together. If we denote the probability that an operation is a write as P_write, the average access cost is the weighted cost of the average object read and write operations:

  T_access = (1 - P_write) T_readobj + P_write T_updateobj

Our goal in this study is to minimize T_access. We measure the gain from the optimization as:

  Gain = 1 - T_access^sig / T_access^nosig

where T_access^nosig is the cost when not using signatures, and T_access^sig is the cost when using signatures.

Access Model. The access pattern affects storage and retrieval costs directly and indirectly, through the buffer hit probabilities. The access pattern is one of the parameters in our model, and is modeled through the independent reference model. Accesses to objects in the database system are assumed to be random, but skewed (some objects are more often accessed than others), and the objects in the database can be logically partitioned into a number of partitions, where the size and access probability of each partition can be different. With each partition characterized by its relative size and the percentage of accesses going to it, we use four access patterns (partitioning sets) in this paper: 2P82, 2P955, 3P1, and 3P2. In the first three-partition set, 3P1, we have three partitions. It is an extension of the 80/20 model, but with the 20% hot spot partition further divided into a 1% hot spot area and a 19% less hot area. The second three-partition set, 3P2, resembles the access pattern we expect in future TOODBs. The two other sets in this analysis each have two partitions, with hot spot areas of 5% (2P955) and 20% (2P82).
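A small sketch of the two metrics, again with illustrative component costs rather than the paper's numbers:

```python
def access_cost(P_write, T_readobj, T_updateobj):
    """Weighted average access cost: T_access = (1 - P_write)*T_read + P_write*T_update."""
    return (1 - P_write) * T_readobj + P_write * T_updateobj

def gain(T_access_nosig, T_access_sig):
    """Gain = 1 - T_access^sig / T_access^nosig; positive when signatures help."""
    return 1 - T_access_sig / T_access_nosig

# Illustrative component costs in seconds:
no_sig = access_cost(P_write=0.2, T_readobj=0.012, T_updateobj=0.020)
with_sig = access_cost(P_write=0.2, T_readobj=0.008, T_updateobj=0.021)
print(gain(no_sig, with_sig))  # ~0.22, i.e. about a 22% reduction in average access cost
```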

Table 1. Default parameters (among them: N_objver = 100 mill. object versions, P_write = 0.2, P_qm = 0.4, P_d = 0.1, P_temporal = 0.8, and an OD cache fraction of 0.2).

Fig. 1. Cost versus signature size for different access patterns. Non-temporal OODB to the left, temporal OODB to the right.

Parameters. We consider a database in a stable condition, with a total of N_objver object versions (and hence N_objver ODs in the OIDX). Note that with the OIDX described in Sect. 4.1, OIDX performance does not depend on the number of existing versions of a particular object, only on the total number of versions in the database. Unless otherwise noted, results and numbers in the next sections are based on calculations using the default parameters summarized in Table 1, and an access pattern according to partitioning set 3P1. With the default parameters, the studied database has a size of 13 GB. The OIDX has a size of 3 GB in the case of a non-temporal OODB, and 5 GB in the case of a temporal OODB (not counting the extra storage needed to store the signatures). Note that the OIDX is smaller in a non-temporal OODB, because in that case we do not have to store timestamps, and we have no inserts into the index tree, only appends. In that case, we can get a very good space utilization [4]. When we have inserts into the OIDX, as in the case of a temporal OODB, we get a lower space utilization in the OIDX.

7.1 Optimal Signature Size

Choosing the signature size is a tradeoff. A larger signature can reduce the read costs, but will also increase the OIDX size and the OIDX access costs. Figure 1 illustrates this for different access patterns. In this case, a signature size of F = 8 seems to be optimal.

Fig. 2. Cost versus signature size for different values of D. Non-temporal OODB to the left, temporal OODB to the right.

Fig. 3. Gain from using signatures versus memory size, for different access patterns. Non-temporal OODB to the left, temporal OODB to the right.

This is quite small, and gives a higher false drop probability than is accepted in text retrieval applications. The reason why such a small signature is optimal in our context is that the objects are small enough to make object retrieval and matching less costly than a document (large object) retrieval and subsequent search for the matching word(s), as is the case in text retrieval applications.

The optimal signature size depends on D, the number of attributes contributing to the signature. This is illustrated in Fig. 2. With an increasing value of D, the optimal signature size increases as well. In our context, a value of D larger than one means that more than one attribute contributes to the signature, so that queries on more than one attribute can be performed later. In the rest of this study, we will use F = 8 with the default parameters, and F = 16, 32 and 48 for D = 2, 4, and 8, respectively.
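The trade-off can be illustrated by sweeping F over a simple cost function. The sketch below assumes, purely for illustration, that the OIDX lookup and update costs grow linearly with the OD size; all constants are made up and do not reproduce the paper's numbers.

```python
import math

def false_drop_probability(F, D):
    """Optimal-weight superimposed coding estimate: P_fd = 2^(-F*ln2/D)."""
    return 2.0 ** (-(F * math.log(2.0)) / D)

def access_cost(F, D=1, P_write=0.2, P_qm=0.4, P_d=0.1,
                T_opage=0.010, C=0.2, N_opage=64,
                T_oidx_base=0.004, od_base_bytes=32, T_update_base=0.020):
    """Illustrative trade-off: a larger F lowers the false drop rate but inflates
    the ODs, which is assumed (as a simplification) to scale the OIDX-related
    costs linearly with OD size."""
    oidx_scale = (od_base_bytes + F / 8) / od_base_bytes
    T_lookup = T_oidx_base * oidx_scale
    T_obj = T_opage / (C * N_opage)
    T_read = (T_lookup + (1 - P_qm) * T_obj
              + P_qm * (P_d + false_drop_probability(F, D)) * T_opage)
    T_update = T_update_base * oidx_scale
    return (1 - P_write) * T_read + P_write * T_update

# Sweep signature sizes (multiples of 8 bits, as in the model) and pick the best.
candidates = range(8, 129, 8)
best_F = min(candidates, key=lambda F: access_cost(F, D=1))
print(best_F, access_cost(best_F, D=1))
```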

Fig. 4. Gain from using signatures, versus memory size, for different values of D. Non-temporal OODB to the left, temporal OODB to the right.

Fig. 5. Gain from using signatures, versus memory size, for different average object sizes. Non-temporal OODB to the left, temporal OODB to the right.

7.2 Gain From Using Signatures

Figure 3 shows the gain from using signatures, with different access patterns. Using signatures is beneficial for all access patterns, except when only a very limited amount of memory is available. Figure 4 shows the gain from using signatures, for different values of D. The gain decreases with increasing values of D.

7.3 The Effect of the Average Object Size

We have chosen 128 bytes as the default average object size. It might be objected that this value is too large, but Fig. 5 shows that even with smaller object sizes, using signatures will be beneficial. The figure also shows how the gain increases with increasing object size.

7.4 The Effect of P_qm and P_d

The value of P_qm is the fraction of the read accesses that are perfect match queries and can benefit from signatures. Figure 6 illustrates the gain for different values of P_qm.

Fig. 6. Gain from using signatures, versus memory size, for different values of P_qm. Non-temporal OODB to the left, temporal OODB to the right.

Fig. 7. Gain from using signatures, versus memory size, for different values of P_d. Non-temporal OODB to the left, temporal OODB to the right.

As can be expected, a small value of P_qm results in negative gain in the case of non-temporal OODBs, i.e., in this case, storing and maintaining signatures in the OIDX reduces the average performance.

The value of P_d is the fraction of queried objects that are actual drops, i.e., the selectivity of the query. Only if the value of P_d is sufficiently low will we be able to benefit from using signatures. Figure 7 shows that signatures will be beneficial even with a relatively large value of P_d.

8 Conclusions

We have in this paper described how signatures can be stored in the OIDX. As the OD is accessed on every object access in any case, there is no extra signature retrieval cost. In a traditional, non-versioned OODB, maintaining signatures means that the OIDX needs to be updated every time an object is updated, but as the analysis shows, this will in most cases pay off, as fewer objects need to be retrieved. Storing signatures in the OIDX is even more attractive for TOODBs. In TOODBs, the OIDX has to be updated on every object update anyway, so in that case, the extra cost associated with signature maintenance is very low.

As shown in the analysis, substantial gain can be achieved by storing the signature in the OIDX. We have done the analysis with different system parameters, access patterns, and query patterns, and in most cases, storing the object signatures in the OIDX is beneficial. The typical gain is from 20 to 40%. It is interesting to note that the optimal signature size can be quite small.

The example analyzed in this paper is quite simple. A query for perfect match has a low complexity, and there is only limited room for improvement. The real benefit is available in queries where the signatures can be used to reduce the amount of data to be processed at subsequent stages of the query, resulting in larger amounts of data that can be processed in main memory. This can speed up query processing by several orders of magnitude. Another interesting topic for further research is using signatures in the context of bitemporal TOODBs.

Acknowledgments

The author would like to thank Olav Sandstå for proofreading, and Kjell Bratbergsengen, who first introduced him to signatures.

References

1. E. Bertino and F. Marinaro. An evaluation of text access methods. In Proceedings of the Twenty-Second Annual Hawaii International Conference on System Sciences, Vol. II: Software Track, 1989.
2. A. K. Bhide, A. Dan, and D. M. Dias. A simple analysis of the LRU buffer policy and its relationship to buffer warm-up transient. In Proceedings of the Ninth International Conference on Data Engineering, 1993.
3. M. Carey, D. DeWitt, J. Richardson, and E. Shekita. Object and file management in the EXODUS extensible database system. In Proceedings of the 12th VLDB Conference, Kyoto, Japan, August 1986.
4. A. Eickler, C. A. Gerlhof, and D. Kossmann. Performance evaluation of OID mapping techniques. In Proceedings of the 21st VLDB Conference, 1995.
5. R. Elmasri, G. T. J. Wuu, and V. Kouramajian. The time index and the monotonic B+-tree. In A. U. Tansel, J. Clifford, S. K. Gadia, S. Jajodia, A. Segev, and R. Snodgrass, editors, Temporal Databases: Theory, Design and Implementation. The Benjamin/Cummings Publishing Company, Inc., 1993.
6. C. Faloutsos and R. Chan. Fast text access methods for optical and large magnetic disks: Designs and performance comparison. In Proceedings of the 14th VLDB Conference, 1988.
7. Y. Ishikawa, H. Kitagawa, and N. Ohbo. Evaluation of signature files as set access facilities in OODBs. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, 1993.
8. D. Lomet and B. Salzberg. Access methods for multiversion data. In Proceedings of the 1989 ACM SIGMOD International Conference on Management of Data, 1989.
9. K. Nørvåg and K. Bratbergsengen. Log-only temporal object storage. In Proceedings of the 8th International Workshop on Database and Expert Systems Applications, DEXA'97, 1997.
10. K. Nørvåg and K. Bratbergsengen. Optimizing OID indexing cost in temporal object-oriented database systems. In Proceedings of the 5th International Conference on Foundations of Data Organization, FODO'98, 1998.


More information

The Bw-Tree Key-Value Store and Its Applications to Server/Cloud Data Management in Production

The Bw-Tree Key-Value Store and Its Applications to Server/Cloud Data Management in Production The Bw-Tree Key-Value Store and Its Applications to Server/Cloud Data Management in Production Sudipta Sengupta Joint work with Justin Levandoski and David Lomet (Microsoft Research) And Microsoft Product

More information

Benchmarking Cassandra on Violin

Benchmarking Cassandra on Violin Technical White Paper Report Technical Report Benchmarking Cassandra on Violin Accelerating Cassandra Performance and Reducing Read Latency With Violin Memory Flash-based Storage Arrays Version 1.0 Abstract

More information

File System Management

File System Management Lecture 7: Storage Management File System Management Contents Non volatile memory Tape, HDD, SSD Files & File System Interface Directories & their Organization File System Implementation Disk Space Allocation

More information

Physical DB design and tuning: outline

Physical DB design and tuning: outline Physical DB design and tuning: outline Designing the Physical Database Schema Tables, indexes, logical schema Database Tuning Index Tuning Query Tuning Transaction Tuning Logical Schema Tuning DBMS Tuning

More information

Chapter 11: File System Implementation. Operating System Concepts with Java 8 th Edition

Chapter 11: File System Implementation. Operating System Concepts with Java 8 th Edition Chapter 11: File System Implementation 11.1 Silberschatz, Galvin and Gagne 2009 Chapter 11: File System Implementation File-System Structure File-System Implementation Directory Implementation Allocation

More information

Index Selection Techniques in Data Warehouse Systems

Index Selection Techniques in Data Warehouse Systems Index Selection Techniques in Data Warehouse Systems Aliaksei Holubeu as a part of a Seminar Databases and Data Warehouses. Implementation and usage. Konstanz, June 3, 2005 2 Contents 1 DATA WAREHOUSES

More information

Data Storage - II: Efficient Usage & Errors

Data Storage - II: Efficient Usage & Errors Data Storage - II: Efficient Usage & Errors Week 10, Spring 2005 Updated by M. Naci Akkøk, 27.02.2004, 03.03.2005 based upon slides by Pål Halvorsen, 12.3.2002. Contains slides from: Hector Garcia-Molina

More information

Windows NT File System. Outline. Hardware Basics. Ausgewählte Betriebssysteme Institut Betriebssysteme Fakultät Informatik

Windows NT File System. Outline. Hardware Basics. Ausgewählte Betriebssysteme Institut Betriebssysteme Fakultät Informatik Windows Ausgewählte Betriebssysteme Institut Betriebssysteme Fakultät Informatik Outline NTFS File System Formats File System Driver Architecture Advanced Features NTFS Driver On-Disk Structure (MFT,...)

More information

Chapter 18: Database System Architectures. Centralized Systems

Chapter 18: Database System Architectures. Centralized Systems Chapter 18: Database System Architectures! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types 18.1 Centralized Systems! Run on a single computer system and

More information

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen LECTURE 14: DATA STORAGE AND REPRESENTATION Data Storage Memory Hierarchy Disks Fields, Records, Blocks Variable-length

More information

CHAPTER 17: File Management

CHAPTER 17: File Management CHAPTER 17: File Management The Architecture of Computer Hardware, Systems Software & Networking: An Information Technology Approach 4th Edition, Irv Englander John Wiley and Sons 2010 PowerPoint slides

More information

File-System Implementation

File-System Implementation File-System Implementation 11 CHAPTER In this chapter we discuss various methods for storing information on secondary storage. The basic issues are device directory, free space management, and space allocation

More information

DATABASE DESIGN - 1DL400

DATABASE DESIGN - 1DL400 DATABASE DESIGN - 1DL400 Spring 2015 A course on modern database systems!! http://www.it.uu.se/research/group/udbl/kurser/dbii_vt15/ Kjell Orsborn! Uppsala Database Laboratory! Department of Information

More information

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next

More information

Multimedia Caching Strategies for Heterogeneous Application and Server Environments

Multimedia Caching Strategies for Heterogeneous Application and Server Environments Multimedia Tools and Applications 4, 279 312 (1997) c 1997 Kluwer Academic Publishers. Manufactured in The Netherlands. Multimedia Caching Strategies for Heterogeneous Application and Server Environments

More information

Chapter 13 File and Database Systems

Chapter 13 File and Database Systems Chapter 13 File and Database Systems Outline 13.1 Introduction 13.2 Data Hierarchy 13.3 Files 13.4 File Systems 13.4.1 Directories 13.4. Metadata 13.4. Mounting 13.5 File Organization 13.6 File Allocation

More information

Chapter 13 File and Database Systems

Chapter 13 File and Database Systems Chapter 13 File and Database Systems Outline 13.1 Introduction 13.2 Data Hierarchy 13.3 Files 13.4 File Systems 13.4.1 Directories 13.4. Metadata 13.4. Mounting 13.5 File Organization 13.6 File Allocation

More information

Using Synology SSD Technology to Enhance System Performance Synology Inc.

Using Synology SSD Technology to Enhance System Performance Synology Inc. Using Synology SSD Technology to Enhance System Performance Synology Inc. Synology_WP_ 20121112 Table of Contents Chapter 1: Enterprise Challenges and SSD Cache as Solution Enterprise Challenges... 3 SSD

More information

Configuring Apache Derby for Performance and Durability Olav Sandstå

Configuring Apache Derby for Performance and Durability Olav Sandstå Configuring Apache Derby for Performance and Durability Olav Sandstå Database Technology Group Sun Microsystems Trondheim, Norway Overview Background > Transactions, Failure Classes, Derby Architecture

More information

MAD2: A Scalable High-Throughput Exact Deduplication Approach for Network Backup Services

MAD2: A Scalable High-Throughput Exact Deduplication Approach for Network Backup Services MAD2: A Scalable High-Throughput Exact Deduplication Approach for Network Backup Services Jiansheng Wei, Hong Jiang, Ke Zhou, Dan Feng School of Computer, Huazhong University of Science and Technology,

More information

Mass Storage Structure

Mass Storage Structure Mass Storage Structure 12 CHAPTER Practice Exercises 12.1 The accelerating seek described in Exercise 12.3 is typical of hard-disk drives. By contrast, floppy disks (and many hard disks manufactured before

More information

The World According to the OS. Operating System Support for Database Management. Today s talk. What we see. Banking DB Application

The World According to the OS. Operating System Support for Database Management. Today s talk. What we see. Banking DB Application The World According to the OS Operating System Support for Database Management App1 App2 App3 notes from Stonebraker s paper that appeared in Computing Practices, 1981 Operating System Anastassia Ailamaki

More information

My SQL in a Main Memory Database Context

My SQL in a Main Memory Database Context My SQL in a Main Memory Database Context Michael Lodberg Samuel and Anders Uhl Pedersen Technical Report no. 2004/02 ISSN: 0107-8283 DIKU University of Copenhagen Universitetsparken 1 DK-2100 Copenhagen

More information

- Easy to insert & delete in O(1) time - Don t need to estimate total memory needed. - Hard to search in less than O(n) time

- Easy to insert & delete in O(1) time - Don t need to estimate total memory needed. - Hard to search in less than O(n) time Skip Lists CMSC 420 Linked Lists Benefits & Drawbacks Benefits: - Easy to insert & delete in O(1) time - Don t need to estimate total memory needed Drawbacks: - Hard to search in less than O(n) time (binary

More information

Practical Database Design and Tuning

Practical Database Design and Tuning Practical Database Design and Tuning 1. Physical Database Design in Relational Databases Factors that Influence Physical Database Design: A. Analyzing the database queries and transactions For each query,

More information

Query Performance Tuning: Start to Finish. Grant Fritchey

Query Performance Tuning: Start to Finish. Grant Fritchey Query Performance Tuning: Start to Finish Grant Fritchey Who? Product Evangelist for Red Gate Software Microsoft SQL Server MVP PASS Chapter President Author: SQL Server Execution Plans SQL Server 2008

More information

OLTP Meets Bigdata, Challenges, Options, and Future Saibabu Devabhaktuni

OLTP Meets Bigdata, Challenges, Options, and Future Saibabu Devabhaktuni OLTP Meets Bigdata, Challenges, Options, and Future Saibabu Devabhaktuni Agenda Database trends for the past 10 years Era of Big Data and Cloud Challenges and Options Upcoming database trends Q&A Scope

More information

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level)

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level) Comp 5311 Database Management Systems 16. Review 2 (Physical Level) 1 Main Topics Indexing Join Algorithms Query Processing and Optimization Transactions and Concurrency Control 2 Indexing Used for faster

More information

Oracle Database In-Memory The Next Big Thing

Oracle Database In-Memory The Next Big Thing Oracle Database In-Memory The Next Big Thing Maria Colgan Master Product Manager #DBIM12c Why is Oracle do this Oracle Database In-Memory Goals Real Time Analytics Accelerate Mixed Workload OLTP No Changes

More information

Outline. Windows NT File System. Hardware Basics. Win2K File System Formats. NTFS Cluster Sizes NTFS

Outline. Windows NT File System. Hardware Basics. Win2K File System Formats. NTFS Cluster Sizes NTFS Windows Ausgewählte Betriebssysteme Institut Betriebssysteme Fakultät Informatik 2 Hardware Basics Win2K File System Formats Sector: addressable block on storage medium usually 512 bytes (x86 disks) Cluster:

More information

High-performance XML Storage/Retrieval System

High-performance XML Storage/Retrieval System UDC 00.5:68.3 High-performance XML Storage/Retrieval System VYasuo Yamane VNobuyuki Igata VIsao Namba (Manuscript received August 8, 000) This paper describes a system that integrates full-text searching

More information