Physical Database Design for an Object-Oriented Database System

Marc H. Scholl
University of Ulm, Department of Computer Science
Oberer Eselsberg, D-7900 Ulm, Germany
e-mail: scholl@informatik.uni-ulm.de

In: J.-C. Freytag, G. Vossen, D.E. Maier (eds.), Query Processing for Advanced Database Applications, Morgan Kaufmann, 1992, to appear

Abstract

Object-oriented database systems typically offer a variety of structuring capabilities to model complex objects. This flexibility, together with type (or class) hierarchies and computed "attributes" (methods), places high demands on the physical design of object-oriented databases. As in traditional databases, it is hardly ever true that the conceptual structure of the database is also a good, that is, efficient, internal one. Rather, data representing the conceptual objects may be structured completely differently for performance reasons. Database systems providing a reasonable amount of data independence allow a physical design that differs significantly from the logical structure. Hence, the performance of the system can be tailored to the overall transaction load faced. The paper presents choices for physical designs that make use of a complex storage model, an extended nested relational model. A first prototype of a physical design optimizer is also presented.

1.1 Introduction

Object-oriented database management systems (OODBMSs) typically offer a variety of structuring capabilities to model complex objects: objects may be hierarchically composed of subobjects, several objects may share common subobjects, objects may appear as (attribute) values of other objects, and different objects can be related to each other by functions, methods, or relationships. Type (or class) hierarchies introduce another dimension of object interrelation: an object of one class also "appears" in all its superclasses; with multiple inheritance, this need not be a strict hierarchical inclusion. Computed values (attributes, methods) may be used to derive, rather than store, data that are associated with objects. Obviously, it is not at all trivial to find good, that is, efficient, storage structures that support the variety of operations on objects reasonably well.
Two of the standard approaches to implementing such object-oriented data models are to either (i) map everything to an underlying relational database system (RDBMS), or (ii) implement an advanced storage server that offers more complex structures than flat relations. The first approach offers the advantage that one can build on established, mature technology and, because of standards, it seems to be a portable solution: it is not necessarily tied to one particular RDBMS.

On the other hand, the typical disadvantage of such "front-end" solutions is that, without being able to tune the RDBMS internally to the application, good performance is unlikely, since the complex structures of the object-oriented database schema have to be broken into small pieces in order to be stored in (flat) relations. As a consequence, queries against the object-oriented schema have to be mapped into large join queries against the relational database. Of course, one might improve on the state of the art in commercial RDBMSs by including advanced access support, such as join indices, link fields, or materialized functions, in order to make the relational implementation more feasible. We discuss such extensions under the second approach in a more general context.

The second approach has the obvious disadvantage that one has to implement a new storage manager with more powerful capabilities for complex structured data, which requires a major effort in terms of design and implementation. On the other hand, the potential benefit of such an endeavor is superior performance due to a more flexible physical database organization that allows for more efficient query processing algorithms. Over the last decade, there have been numerous attempts to come up with new DBMS architectures based on advanced storage managers (see [Carey et al., 1986; Batory, 1987; Haas et al., 1990; Haerder et al., 1987] for some examples). The DASDBS project^1 is one of these attempts [Paul et al., 1987; Schek et al., 1990], where the storage manager implements nested relations, that is, a hierarchical data structure where attribute values can either be atomic or embedded (sub-) relations. The idea is that nested relations serve as a high-level, abstract description of internal storage structures. It was shown in [Scholl et al., 1987] that nested relations can in fact be used to model all schema-driven clustering strategies, that is, all storage schemes that are described by static information.
For example, a physical design that stores all employee records adjacent to "their" department record is a schema-driven strategy (which is naturally represented by a nested department relation with an employee subrelation). In contrast, a physical design that locates employee records either with "their" department record or with "their" manager record, depending on which of these related records was "current" at the time the employee record was created, represents a dynamic clustering strategy, which is partially, but not completely, schema-driven. An example of a fully dynamic, schema-independent clustering strategy is "always append at the end of the database". Schema-driven clustering techniques are both practically important (they resemble but largely extend the current state of the art) and theoretically interesting (they give rise to powerful algebraic query optimizations). The latter was shown in [Scholl et al., 1987; Scholl, 1986] in a context where (flat) relational schemas were internally represented as nested relations. Both the transformations of the structures and those of the operations can be expressed in a nested relational algebra, so query optimization can mostly operate on an algebraic level.

^1 Darmstadt Database Kernel System

In this paper, we discuss the problem of physical database design; that is, given a logical DB schema and a transaction load, we want to determine which internal DB layout has the least overall cost of transaction execution, in the context of DASDBS as the storage manager. Therefore, all relational database designs are subsumed by this approach, while more flexibility is introduced by taking hierarchical clustering strategies into account. For example, tuples of two "related" tables might as well be stored together in one nested relational tuple, containing for each tuple of the first table all the "matching" tuples from the second as a "subrelation". The intuition behind such an organization is that the larger nested tuple is stored consecutively (that is, together with its subtuples) in one or as few as possible page(s) on disk (see [Paul et al., 1987; Schek et al., 1990; Deppisch et al., 1987] for details). Other options to accelerate the execution of "implicit" or "functional joins" include "link fields", that is, references (in the form of object identifiers (OIDs) or addresses) to objects, which may be stored together with the referencing object tuple or separately from it. We will show that, with the extended nested relational interface of the DASDBS storage manager (the extension consists in the availability of physical tuple addresses), we can express a wide variety of physical design alternatives.
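To make the idea of hierarchical clustering concrete, the following Python sketch nests two flat tables into one nested relation, so that each tuple of the first table carries its matching tuples from the second as a subrelation. It is an illustration only, not DASDBS code: the function name nest, the sample tables, and all attribute names are assumptions introduced here.

```python
from collections import defaultdict


def nest(parents, children, parent_key, child_fk, sub_name):
    """Group each parent tuple with its matching child tuples (hypothetical helper)."""
    by_fk = defaultdict(list)
    for child in children:
        # The foreign key is dropped: it is implied by the enclosing parent tuple.
        sub = {k: v for k, v in child.items() if k != child_fk}
        by_fk[child[child_fk]].append(sub)
    # Parents without matching children get an empty subrelation.
    return [{**p, sub_name: by_fk.get(p[parent_key], [])} for p in parents]


# Two flat "related" tables (sample data, assumed for illustration).
dept = [{"dno": 1, "dname": "R&D"}, {"dno": 2, "dname": "Sales"}]
empl = [
    {"eno": 10, "ename": "Ann", "dno": 1},
    {"eno": 11, "ename": "Bob", "dno": 1},
    {"eno": 12, "ename": "Cy", "dno": 2},
]

nested_dept = nest(dept, empl, "dno", "dno", "Empl")
```

The sketch shows only the logical grouping; in a storage manager, each resulting nested tuple would additionally be placed on as few pages as possible.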
Furthermore, the high-level (i.e., relational) description of these physical design choices allows the query optimizer to apply algebraic transformations in order to exploit these storage structures when mapping logical-level query expressions to the physical level.

The placement trees of O2 [Benzaken, 1990] pursue a similar purpose: objects that are related via super- and subobject relationships or via methods can be clustered hierarchically by defining appropriate placement trees. A placement tree is a hierarchical structure whose nodes are O2-classes. Upon generation of a new object of some class, the system searches for placement trees that contain the object's class and places the new object on the same page as its parent object, if such a placement tree is found and if the new object is related to an instance of the parent class in that tree. Multiple placement trees may contain the same class, in which case the algorithm for determining the storage location of a newly generated object becomes more complex. In contrast to our approach, placement trees are more of a dynamic clustering strategy, because the clustering expressed there is not guaranteed; rather, it is used as a "guideline". In our approach, the definition of a certain clustering strategy (in terms of a nested relational storage relation) precisely determines where object tuples will be stored. While less flexible on the one hand, our approach has the advantage that the query processor can rely on the information represented in the storage hierarchies, whereas the O2 optimizer makes no use of this information (yet?). Rather, O2 expects performance gains to result from increased hit ratios in the buffer pool.

The paper is structured as follows: Section 1.2 describes our notation for the object model as well as the nested relational description of storage clusters. Section 1.3 presents the alternatives that are offered when mapping object schemas to DASDBS, and discusses their pros and cons in terms of which operations benefit and which incur extra costs. In Section 1.4 we describe a first physical database design tool that we have implemented to select a good (ideally the best) database layout for a given object schema and load description. Some remarks about query optimization are given in Section 1.6 together with a summary and outline of future work.

1.2 Notation and Terminology

Before entering the technical exposition, we introduce the notation and terminology used throughout this paper.
We first describe the COCOON object model used at the logical level for the schema description; then we set up the framework for the physical level, the extended nested relations available at the interface of the DASDBS storage manager.

1.2.1 The COCOON Object Model

There is no universal consensus on a specific object-oriented data model (OODM); however, many of the features found in the proposals seem to be approaching a mature state, for example, support for complex structures, including shared subobjects, and some form of inheritance hierarchies. Like many others, we have contributed to the field by proposing one such model, called COCOON [Scholl and Schek, 1990b; Scholl and Schek, 1990a]. In this paper, we do not depend heavily on the particular flavor of the OODM used for the logical database schema, so we use our notation and terminology mostly because we are most familiar with this model. For the reader, however, it should be straightforward to translate into his or her model of choice.

COCOON is a so-called "object-function" model, as is IRIS [Wilkinson et al., 1990], for example. This means that objects are pure abstractions, in the sense of the well-known abstract data type (ADT) approach. In particular, none of the descriptive information "associated with" an object is considered to be "part of" the object in any sense. Rather, functions (or methods) are used as the uniform abstraction of stored fields, computed attributes, and relationships. In an even more general interpretation, "functions" can also be taken as an abstraction of retrieval and update methods, that is, the ADT-specific operators. Throughout the rest of the paper, we do not consider functions with side-effects, that is, update methods. Therefore, the term "function" refers to type-specific operators without side-effects. Intentionally, we do not distinguish between stored and computed or derived functions here, considering it a higher level of data independence to hide this distinction from the logical database schema. In order to express general relationships, functions may be set-valued. Furthermore, two functions may be defined as being inverses of each other.

In terms of other OODMs that do distinguish between attributes (stored) or instance variables on the one hand and methods on the other, just think of database schemas where all attributes are hidden (encapsulated) behind access functions (retrieval methods). The point behind our more abstract view is that we want to leave it up to the process of physical database design to make the decisions on what to store and what to derive. Of course, there are restrictions to these decisions, so we can mainly decide to materialize derived functions, trading update effort for retrieval speed.
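The uniform treatment of stored and derived functions can be illustrated with a small Python sketch. It is not COCOON syntax or code: the class ObjectFunction, its methods, and the sample functions sal and annual_sal are hypothetical names chosen here. The point is only that a caller applies a function to an OID in the same way whether the result is stored or computed, so that materialization remains a purely physical decision.

```python
class ObjectFunction:
    """A side-effect-free function on objects, applied to OIDs (illustrative only)."""

    def __init__(self, name, stored=None, compute=None):
        if (stored is None) == (compute is None):
            raise ValueError("give exactly one of stored= or compute=")
        self.name = name
        self._stored = stored      # dict: OID -> value (stored function)
        self._compute = compute    # callable: OID -> value (derived function)

    def __call__(self, oid):
        # The caller cannot tell which of the two representations is used.
        if self._stored is not None:
            return self._stored[oid]
        return self._compute(oid)

    def materialize(self, oids):
        """Physical design decision: precompute a derived function for the given OIDs."""
        if self._compute is None:
            return self  # already stored
        return ObjectFunction(self.name, stored={o: self._compute(o) for o in oids})


# A stored function and a function derived from it (sample data, assumed).
sal = ObjectFunction("sal", stored={1: 3000, 2: 4500})
annual = ObjectFunction("annual_sal", compute=lambda oid: 12 * sal(oid))
annual_mat = annual.materialize([1, 2])
```

A materialized copy would have to be maintained whenever the underlying stored values change; this is the update effort being traded for retrieval speed, and it is deliberately left out of the sketch.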
In COCOON, as in most other OODMs, objects are instances of types, which are arranged in an inheritance hierarchy (actually, due to multiple inheritance, this is not a strict hierarchy). A type describes the set of functions that can be applied to its instances. COCOON's query language is strongly typed, that is, a type checker (statically) guarantees that only type-valid expressions are ever executed. The subtype hierarchy essentially represents the superset relationship between the sets of functions defined on the subtype as compared to its supertype(s). A subtype inherits all the functions from all its supertypes and adds new functions. Also, all instances of a subtype are also instances of the supertype(s).

A less common characteristic of the COCOON model is its separation between types and classes. A type describes the common interface of all instances, whereas a class represents a collection of objects of a given type (or subtypes thereof). Therefore, we can have more than one collection of a given type, for example distinguished by different membership predicates. Each class, C, is characterized by two properties: the type of its member objects, mtype(C), and its current set of member objects, extent(C). Classes can also be arranged in a (non-strict) hierarchy, representing the subset relationships between their extents, that is, the extent of a subclass is necessarily a subset of the extents of all its superclasses.^2

Other OODMs that do not distinguish between types and classes would map into COCOON by defining exactly one class per type. Notice that O2, for example, uses both terms, however, with different semantics: O2-types are data types, O2-classes are object types. The O2-clause "with extent" indicates that an explicit extent should be kept for that particular O2-class. (In our terminology: O2-classes are types, and "with extent" defines a class with the same name as the type.)

^2 An analysis of object algebra operators, such as selection and projection, shows that the separation of types and classes is necessary in order to define "object-preserving" queries, because the subtype relationship between types and the subset relationship between classes need not always correspond to each other [Scholl and Schek, 1990a; Heuer and Scholl, 1991; Beeri, 1990]. Details of this aspect, however, are not relevant for the purpose of this paper. Just keep in mind that we want to allow the maintenance of more than one "type extent".

Example 1 [Logical DB Schema] In the following discussions we will refer back to this example database as the logical-level DB schema. The database contains information about companies, employees and cities [Scholl and Schek, 1990a].

    define database SampleDB
      define type city =
        name : string, zip : string, pop : integer,
        has_comp : set of company inverse location
      define type person =
        name : string, bdate : date, addr : city
      define type company =
        name : string, budg : integer,
        loc : set of city inverse has_comp,
        pres : chief,
        staff : set of employee inverse works_for
      define type employee isa person =
        hired : date, ssec : integer, sal : integer,
        works_for : company inverse staff
      define class City : city
      define class Pers : person
      define class Comp : company
      define class Empl : employee some Pers
    end.

The phrase "some Pers" in the definition of class "Empl" states (i) that Empl is a subclass (i.e., subset) of Pers, and (ii) that inclusion of person objects in the subclass has to be specified explicitly by the user. (In contrast, "all Pers where P" would define a class whose members are automatically determined from the superclass and the predicate P.)

1.2.2 Nested Relations as a Description of Storage Structures

In this section we introduce our notation for physical database designs. We use nested relations to describe the physical clustering strategy on disk blocks. That is, if we say that data from the conceptual database schema is stored in a particular nested relation on the physical level, we assume that the nested tuples are directly mapped to disk blocks in a depth-first fashion: for each nesting level, we will first find all the atomic attribute values, followed by the representations of all tuples of the first subrelation, then followed by the representations of all tuples of the second subrelation, and so on, recursively, until no more subrelations exist. This way, one nested tuple is implemented on as few pages as possible. Furthermore, this implementation gives efficient access to complete nested tuples as well as to parts thereof. For the latter to work, the storage manager has to keep structural information that helps in figuring out which of the pages belonging to a nested tuple actually have to be read in from disk in order to process a given request. One implementation technique for this purpose, the one used by DASDBS, is described in detail in [Paul et al., 1987; Deppisch et al., 1987].

As usual, we denote the schema of a nested relation by recursively giving the name of the (sub-) relation followed by a list of attributes enclosed in parentheses: Dept(dno dname budget Empl(eno ename salary)) is a two-level nested relation Dept with three atomic attributes and one subrelation, Empl, that itself has three atomic attributes. In the sequel, when talking about relations (on the physical level), we always mean "relations used to store some information about objects" (object-relations). In order to describe physical-level nested relations we introduce the following additional notation:

@R for a given relation R denotes the physical address of R-tuples, e.g., tuple identifiers (TIDs). We can use @R either as a (virtual) attribute of relation R, or as a (stored) attribute in any other relation, S (not necessarily distinct from R), in order to describe the fact that the stored S-tuples contain a physical reference to an R-tuple (a "link field").

#R for a given relation R, representing a conceptual object type R, denotes the unique object identifier (OID). This is a stored value used to represent the object itself, a surrogate value that is given to the object by the system upon creation.

Notice that we do not a priori assume that we use the physical address of the tuple representing an object as the object's identifier.
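The depth-first mapping of a nested tuple onto storage described in this section can be sketched as follows. This is a simplified Python illustration, assuming that a nested tuple is a dictionary whose subrelations are lists of dictionaries; the function name linearize and the sample values are hypothetical, and page boundaries as well as the structural information kept by DASDBS are ignored.

```python
def linearize(nested_tuple):
    """Return the atomic values of a nested tuple in depth-first storage order."""
    # First, all atomic attribute values of this nesting level ...
    out = [v for v in nested_tuple.values() if not isinstance(v, list)]
    # ... then all tuples of the first subrelation, then of the second, and so on.
    for v in nested_tuple.values():
        if isinstance(v, list):
            for subtuple in v:
                out.extend(linearize(subtuple))
    return out


# One tuple of Dept(dno dname budget Empl(eno ename salary)), with assumed values.
dept = {
    "dno": 7, "dname": "Research", "budget": 500,
    "Empl": [
        {"eno": 1, "ename": "Ann", "salary": 90},
        {"eno": 2, "ename": "Bob", "salary": 80},
    ],
}
layout = linearize(dept)
```

Because the whole Dept tuple becomes one consecutive sequence, a request for a department together with its employees touches as few pages as the tuple size allows.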
Given that we use tuple identifiers (TIDs) as the physical addressing scheme, we could have chosen to use them as object identifiers as well, since TIDs are guaranteed to be stable. Let us explain why we chose to separate the issues. First, even with TIDs as the physical addressing scheme, it might be necessary to have logical OIDs that are never changed or reused (not even in case of DB reorganizations), for example, if OIDs are given to users for some reason (we do not do this!). Second, and more importantly, we do not exclude redundant storage schemes, where objects may be represented in more than one tuple. This might be useful as a "decomposed" storage strategy, for example, when different object properties have very inhomogeneous access frequencies. Furthermore, we may gain overall performance from object replication. In the latter case, we certainly need a unique OID that is independent of physical tuple addresses.

The use of nested relations to describe a wide variety of physical database designs has been discussed extensively in the context of flat relations as the conceptual model in [Scholl et al., 1987]. Essentially, the choices that we have now, for an object-oriented model at the conceptual level, are largely the same. A relational schema together with the key-to-foreign-key relationships might be considered a "Complex Object" schema, without generalization. Therefore, we repeat the basic ideas here using a few examples.

Example 2 [Physical storage structures described with nested relations] A logical "relationship" can be supported in a variety of ways at the physical level, ranging from no particular support, via several kinds of indexes or link fields, to physical neighborhood of related tuples (clustering). Some of these, particularly clustering, can only be applied to 1:n-relationships without incorporating redundancy. Basically, all the options for n:m-relationships can be traced back to a specific choice for either of the two hierarchical directions embodied in the n:m-relationship. Assume two relations R and S are related through some predicate P (which might just be equality on a common attribute, some more complex condition, a function in the COCOON object model, or whatever). Physical database design may provide:

No specific support: Relations R and S are stored separately:

    R (#R ... some attributes ...)
    S (#S ... some attributes ...)

Upon retrieval, both have to be traversed (possibly in a nested-loops fashion) in order to evaluate the predicate P on every pair of R- and S-tuples. (Alternatively, an index could be used, if present. Here and in the following, we do not take indexes into account, since this is an orthogonal issue. Indexes can be useful in all the designs discussed here.)

An embedded reference: Assuming that each R-tuple is related to at most one S-tuple, we could store the address of that S-tuple with each R-tuple. That is, upon insertion of the R-tuple, we evaluate the predicate P on relation S for that R-tuple, find the matching S-tuple (if any), and store its physical address (@S), or its OID (#S), or both, in the R-tuple:

    S (#S ... some attributes ...)
    and
    R (#R #S ... some attributes ...)
    or R (#R @S ... some attributes ...)
    or R (#R #S @S ... some attributes ...)

In the case where the predicate P is actually a function relating objects on the conceptual level, we will have to store at least the OID of the referenced object.

An embedded reference set: Similarly, if an R-tuple can be related to more than one S-tuple, we can store a whole set of references (OIDs and/or TIDs) within each R-tuple, in a nested subrelation, SRef (for "references to S"):^3

    S (#S ... some attributes ...)
    and
    R (#R SRef(#S) ... some attributes ...)
    or R (#R SRef(@S) ... some attributes ...)
    or R (#R SRef(#S @S) ... some attributes ...)

This storage structure corresponds to CODASYL pointer arrays.

^3 The notation "SRef(...)" inside the schema of relation R denotes a subrelation with name "SRef" and subattributes "(...)".

A "join index": The pointers linking related objects could also be stored separately from the object-tuples, thus resembling the idea of join indices [Valduriez, 1987]:

    R (#R ... some attributes ...)
    S (#S ... some attributes ...)
    plus
    JI_1 (@R SRef(@S)) and JI_2 (@S RRef(@R))

Notice that we have grouped (nested) one set of addresses in each of the two parts of the join index. Furthermore, we can be more flexible in that physical addresses (TIDs) can be accompanied by or replaced with OIDs, independently from each other in either of the two parts of the join index.

Physical clustering: The strongest way of supporting fast access "along" the predicate P is to physically cluster related R- and S-tuples. This is only possible without replication if it is a 1:n-relationship, though:

    R (#R ... R-attributes ... S(#S ... S-attributes ...))

In this storage scheme, all S-tuples related to a given R-tuple are stored within that R-tuple, so no extra I/O is necessary once we have the R-tuple.

Given a logical database schema, it is the task of the physical database design process to select one of these choices for each "relationship" between objects in the logical schema. The choice is based on cost estimates for all types of operations on all the different storage structures. Heuristics (such as an experienced DBA or even some automated tool) can be used to find a good design for a given transaction load (see Section 1.4).

The choices indicated in the example above all refer to the implementation of "relationships" between objects, such as via functions in COCOON. Decisions have to be taken for other choices too, for example, how to implement the inheritance hierarchy, and how to deal with computed functions. We present the alternatives considered in our context next.

1.3 Alternatives for Physical DB Design

This section presents the alternatives for mapping object-oriented database schemas from the conceptual level to nested relations at the physical level. We proceed by stepping through the basic concepts of the COCOON object model and showing the implementation choices. Since the choices for each of the concepts combine orthogonally, a large decision space is spanned, which is later investigated by the physical database design tool (next section).
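Three of the alternatives from Example 2 (no specific support, an embedded reference set, and a join index) can be contrasted in a short Python sketch. It is meant as an illustration under simplifying assumptions: tuples are dictionaries, the key "tid" stands in for a physical address, and the function names and sample relations are hypothetical.

```python
def matches(r_rel, s_rel, pred):
    """No specific support: nested-loops evaluation of P on every pair of tuples."""
    return [(r["tid"], s["tid"]) for r in r_rel for s in s_rel if pred(r, s)]


def embed_reference_sets(r_rel, s_rel, pred):
    """Embedded reference set: each R-tuple carries a subrelation SRef(@S)."""
    return [{**r, "SRef": [s["tid"] for s in s_rel if pred(r, s)]} for r in r_rel]


def join_index(r_rel, s_rel, pred):
    """Join index kept apart from the object tuples, in two nested parts:
    JI_1(@R SRef(@S)) and JI_2(@S RRef(@R))."""
    ji1 = {r["tid"]: [] for r in r_rel}
    ji2 = {s["tid"]: [] for s in s_rel}
    for r_tid, s_tid in matches(r_rel, s_rel, pred):
        ji1[r_tid].append(s_tid)
        ji2[s_tid].append(r_tid)
    return ji1, ji2


# Sample relations (assumed): companies R and employees S, related by works_for.
R = [{"tid": "r1", "oid": 100}, {"tid": "r2", "oid": 101}]
S = [{"tid": "s1", "works_for": 100}, {"tid": "s2", "works_for": 100}]


def works_for(r, s):
    return s["works_for"] == r["oid"]


pairs = matches(R, S, works_for)
r_with_refs = embed_reference_sets(R, S, works_for)
ji1, ji2 = join_index(R, S, works_for)
```

All three variants encode the same pairs; they differ in where the references live and therefore in which operations must pay for maintaining them.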

1.3.1 Implementing Objects

According to the object-function paradigm of COCOON, an object itself is sufficiently implemented by a unique identifier (OID), which is generated by the system. All data related to an object in one way or another will refer to this identifier (see below). Following the conventions set up above, we denote, for each object type T, attributes of internal relations containing the OID of objects of type T by #T.

1.3.2 Implementing Functions

In COCOON, functions are the basic way of associating information (data values or other objects) with objects. In principle, we can think of each function as being implemented as a binary relation, with one attribute for the argument OID and the other for the result value (data item or OID). In the case of set-valued functions, the second attribute will actually be a subrelation of unary subtuples, containing one result (OID or data value) each. So, in principle, a single-valued function f_s : T_1 -> T_2 and a multi-valued function f_m : T_1 -> set(T_3) could be implemented by two binary relations:

    f_s (#T_1 #T_2)
    f_m (#T_1 T_3Ref(#T_3))

There are some obvious choices (such as: Do we really store each function in a separate binary relation or do we combine several of them into a "wider" relation?), and also some more subtle alternatives (such as: Shall we include physical pointers?). The decision space as far as function implementations are concerned includes the following alternatives in our current approach:

Bundled vs. Decoupled: Each function f defined on a given domain object type T might be stored in a separate (binary) relation f as shown above; we call this the decoupled mode. Alternatively, we can bundle the function f (possibly with other functions) together with the relation T implementing the type T (see below).
Notice the restriction: the set of all functions defined on the same domain type is partitioned into bundled functions (that are all stored together in one internal relation) and decoupled functions (that are all stored in separate tables, one each). More flexible function partitioning schemes are possible and certainly useful. However, we currently limit our optimization process to the restricted choice for tractability reasons.

In the example above, the bundled implementation of both f_s and f_m would yield the following type table for T_1:

    T_1 (#T_1 f_s# f_m#Set(#T_3))

Notice the naming convention: attributes are named after the function they implement, a suffix "#" indicates a (logical, OID) reference, and a suffix "Set" indicates a multi-valued function (a subrelation).

Logical vs. Physical Reference: A function returning a (set of) object(s), not (a) data value(s), can be implemented by storing just OIDs (logical reference) of result objects or by including a TID (physical reference) as well. In the latter case, relations for single-valued functions become 3-attribute relations, and those for multi-valued functions now have pairs in the subrelation. Continuing the example above (bundled), inclusion of physical references for both f_s and f_m would result in:

    T_1 (#T_1 f_s# f_s@ f_m#Set(#T_3 @T_3))

Oneway vs. Bothway References: A function f from type T to (possibly a set of) type S can be implemented by a forward reference only (oneway), or it can be implemented with backpointers (bothway). Again, backward pointers, if any, can be implemented with just logical references or with physical references. Notice that COCOON includes the specification of inverse functions. If an inverse function is defined in the conceptual schema, then the "backward" reference is present anyway. Therefore, this option is only considered for functions that have no inverse in the object schema.^4 Whenever the inverse function is not given explicitly in the schema, we have to assume that back references are multi-valued. For decoupled functions, the backward references will also be decoupled. Therefore, decoupled functions with backpointers result in the "join indices" shown in the previous section. For bundled functions, back references are also bundled with the corresponding type table.
In our (bundled) example above, assuming a (logical only) backpointer for function f_s would make the type table for type T_2 look like:

    T_2 (#T_2 f_s^{-1}# ... other attributes ...)

^4 Backpointers might be useful, because COCOON's query language allows traversing functions backwards even if the inverse is not given explicitly in the schema.

Reference vs. Materialized: Functions returning (sets of) objects, not data values, can be implemented by the various forms of references discussed up to now. Alternatively, however, we can directly materialize the object-tuple(s) representing the result object(s) within the object-tuple representing the argument object. That is, we can store the resulting object-tuple "in place". This strategy achieves physical clustering. In our example, the decision to materialize the function f_m would generate a nested type table for T_1 that contains the type table for T_3 as a subrelation:

    T_1 (#T_1 f_s# f_s@ f_mSet(#T_3 ... other T_3-attributes ...))

Obviously, we need no backward references in this case. Furthermore, this alternative is free of redundancy only if the materialized function is 1:n, that is, its inverse is single-valued. As shown in the example, materialization is considered only in conjunction with bundling in our current optimizer. More generally, it may be optimal to materialize decoupled functions as well. Then we would actually partition the objects of the result type according to this function.

Computed vs. Materialized: Finally, an additional option is to materialize derived (computed) functions. Assuming that some function f on type T can be computed, we could nonetheless decide to internally materialize it, if retrieval on f significantly dominates updates to the underlying base information. The more retrieval dominates updates, and the more costly the computation is, the more likely it is that materialization pays off. For example, with geometric object descriptions, one typically uses a "bounding box" function to filter objects coarsely in spatial queries. Obviously, the bounding box is derived from the actual geometry of objects.
But computing the bounding box incurs considerable effort, and if object shapes rarely change, materializing the bounding box function clearly is a good strategy (see also [Kemper and Moerkotte, 1990a; Kemper and Moerkotte, 1990b]).

Let us repeat that choosing how to implement functions (retrieval methods) for an object-oriented database schema is essentially the same problem as physical database design for network (CODASYL) databases, or for "Complex Object" databases (in the sense of [Abiteboul and Beeri, 1988; Abiteboul et al., 1989]), or even for relational databases (where the "structure" stems from key-foreign key relationships).
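How the bundled/decoupled and logical/physical choices combine into storage schemas can be made tangible with a small schema generator. The following Python sketch is an assumption-laden illustration, not the design tool of Section 1.4: the names Fn, result_ref, ref_attrs, bundled and decoupled are hypothetical, and the "@" suffix used for physical references is a notational guess modelled on the "#" convention above.

```python
from dataclasses import dataclass


@dataclass
class Fn:
    name: str               # function name, e.g. "fs"
    result: str             # result object type, e.g. "T2"
    multi: bool = False     # set-valued function?
    physical: bool = False  # store a TID next to the OID?


def result_ref(f):
    """Reference to a result object: OID only, or OID plus physical address."""
    return f"#{f.result}" + (f" @{f.result}" if f.physical else "")


def ref_attrs(f):
    """Attributes contributed by one object-valued function to a type table."""
    if f.multi:
        return f"{f.name}#Set({result_ref(f)})"
    return f"{f.name}#" + (f" {f.name}@" if f.physical else "")


def bundled(type_name, fns):
    """One type table holding the OID and all bundled functions."""
    attrs = [f"#{type_name}"] + [ref_attrs(f) for f in fns]
    return f"{type_name}({' '.join(attrs)})"


def decoupled(type_name, fns):
    """One separate relation per function."""
    out = []
    for f in fns:
        if f.multi:
            out.append(f"{f.name}(#{type_name} {f.result}Ref({result_ref(f)}))")
        else:
            out.append(f"{f.name}(#{type_name} {result_ref(f)})")
    return out


logical = [Fn("fs", "T2"), Fn("fm", "T3", multi=True)]
physical = [Fn("fs", "T2", physical=True), Fn("fm", "T3", multi=True, physical=True)]

bundled_logical = bundled("T1", logical)
bundled_physical = bundled("T1", physical)
decoupled_logical = decoupled("T1", logical)
```

Enumerating such schema strings for every combination of options gives a feel for how quickly the decision space grows, which is why the restricted partitioning into bundled and decoupled functions matters for tractability.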

1.3.3 Implementing Types, Classes, and Inheritance

Some new aspects of physical database design, however, originate from data modelling concepts not typically found in "pre-object-oriented" models: inheritance hierarchies. In the context of the COCOON object model, we are dealing with two such hierarchies: one between types (organizing structural, function inheritance) and one between classes (organizing set inclusion). Before going into the details of these hierarchies, there is one more basic question to be answered: how to implement types and classes.

In general, our approach to physical design is schema-driven (as opposed to fully dynamic or instance-driven). That is, we analyze and optimize the physical DB layout based on schema-level information (types, classes, functions) rather than for individual objects. In our model, this raises the question whether we do the design for types of objects or for individual classes. Since classes are always bound to a particular (member) type, physical design for types is the larger-grain approach, whereas design for individual classes would be the finer-grain approach (remember, there may be more than one class per type). Currently, we do the physical design on a type basis, that is, all objects of a given type are physically represented in the same way (even if they belong to several classes). The argument for doing so is that it is easier, because it gives fewer choices. Furthermore, if typical database schemas have roughly one class per type, the difference as compared to a class-based physical design is only marginal.

Classes are implemented as views over their underlying type table. If a class is defined by a predicate, this predicate is used as a selection condition; user-defined classes (whose members are explicitly added/removed by query language operators) require an additional boolean attribute in the type table.
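The idea of implementing classes as views over a type table can be sketched in a few lines of Python. This is an illustrative sketch only: the type table is modelled as a list of dictionaries, and the names person_table, in_Friends, predicate_class, user_defined_class, and add_member are hypothetical and do not come from COCOON or DASDBS.

```python
# Hypothetical type table: each dictionary is one object-tuple.
# "in_Friends" is the extra boolean attribute for a user-defined class.
person_table = [
    {"oid": 1, "name": "Ann", "age": 34, "in_Friends": True},
    {"oid": 2, "name": "Bob", "age": 17, "in_Friends": False},
    {"oid": 3, "name": "Eve", "age": 52, "in_Friends": True},
]


def predicate_class(type_table, predicate):
    """Class defined by a predicate: a selection over the type table."""
    return [row for row in type_table if predicate(row)]


def user_defined_class(type_table, flag):
    """Class with explicit membership: select on its boolean attribute."""
    return [row for row in type_table if row[flag]]


def add_member(type_table, flag, oid):
    """Adding an object to a user-defined class only sets the flag."""
    for row in type_table:
        if row["oid"] == oid:
            row[flag] = True


adults = predicate_class(person_table, lambda row: row["age"] >= 18)
friends = user_defined_class(person_table, "in_Friends")
print([row["oid"] for row in adults], [row["oid"] for row in friends])
```

In both cases no object data is copied; a class is evaluated against the single physical representation of its type.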
The inheritance hierarchy for types introduces two further degrees of freedom for physical design. First, if functions are bundled, shall we include inherited functions in the type tables of subtypes? Second, shall objects be represented in an object-tuple only in the most specific subtype's table, or in several object-tuples, one per supertype? These choices have sometimes been called horizontal versus vertical partitioning of objects or properties. Currently, we allow only very limited choices with respect to types, classes, and inheritance:

Types: Each object type T is mapped to a type table T with at least one attribute, #T, containing the OID. Additional attributes are present in case of any bundled functions and/or materializations of object functions. The type table T may itself be a subrelation of some other table

S, if type T was materialized w.r.t. a function returning T-objects. (In the latter case, a dummy object-tuple has to be added to S that collects T-objects not related to any S-object, that is, if the function used for materialization is not onto T.)

Classes: Each class C is implemented as a view over its underlying type table. If the class is defined by a predicate ("all"-classes and views in COCOON), this predicate is used as the selection condition. If the class is defined to include manually added member objects ("some"-classes in COCOON, see [Scholl and Schek, 1990a; Scholl et al., 1991]), the underlying type table is extended by a boolean attribute C that is set to true if and only if the object is a member of this class C.

Inheritance: Subtyping is implemented by having one type table per subtype. Two possibilities are considered:

(i) An object-tuple is included in each supertype's table. In case there are any bundled or materialized functions, these are not repeated in the subtypes' tables. In this case, object-tuples in subtype tables might optionally include physical references to supertype tuples. With this option, a physical reference to an object always points to an object-tuple in one specific type table. So, there is yet another degree of freedom: which one to point to? When choosing this option, we always point to the object-tuple in the type table that implements the range type of the function under consideration.

(ii) An object-tuple is included in only one type's table, that of the most specific subtype. In case of any bundled or materialized functions in supertypes, these are also included in the subtype's table.

Therefore, function values are never kept redundantly, while OIDs may be replicated in all supertypes. Subclassing is implemented without redundancy, because all classes are views anyway. Future plans include the consideration of classes instead of types as the basis for physical design, and potential redundancy with respect to inheritance.
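The two inheritance options can be made concrete with a small Python sketch. It is only an illustration: type tables are modelled as lists of dictionaries, the helper names are hypothetical, and the split of functions between a supertype Pers (name, bdate) and a subtype Empl (sal, hired) is assumed for the purpose of the example.

```python
PERS_FUNCS = ["name", "bdate"]  # functions assumed to be defined on the supertype
EMPL_FUNCS = ["sal", "hired"]   # functions assumed to be defined on the subtype


def store_in_all_supertypes(oid, values):
    """One object-tuple per type table; inherited functions are not repeated."""
    pers = {"oid": oid, **{f: values[f] for f in PERS_FUNCS}}
    empl = {"oid": oid, **{f: values[f] for f in EMPL_FUNCS}}
    return {"Pers": [pers], "Empl": [empl]}


def store_in_most_specific(oid, values):
    """A single object-tuple in the subtype table, including inherited functions."""
    empl = {"oid": oid, **{f: values[f] for f in PERS_FUNCS + EMPL_FUNCS}}
    return {"Pers": [], "Empl": [empl]}


def lookup(tables, oid, func):
    """Return the value of func for object oid, wherever it is stored."""
    for rows in tables.values():
        for row in rows:
            if row["oid"] == oid and func in row:
                return row[func]
    raise KeyError((oid, func))


values = {"name": "Ann", "bdate": 1960, "sal": 5000, "hired": 1988}
design_a = store_in_all_supertypes(7, values)
design_b = store_in_most_specific(7, values)
print(lookup(design_a, 7, "name"), lookup(design_b, 7, "name"))
```

In both layouts each function value is stored exactly once; only the OID is repeated when the object appears in several type tables.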

1.3.4 Indexes

Obviously, among the most important decisions that have to be made during physical database design for any database is the selection of appropriate indexes. All the classical indexing techniques, such as B+-trees, will be considered. Furthermore, several specialized index structures have been proposed that are designed to support OODB-specific kinds of operations, such as path traversals [Bertino, 1990]. In order to evaluate the advantages and costs of using a complex record (DASDBS) instead of a flat record (RDBMS) storage manager, though, indexes play only a supporting role. The main emphasis is on the effects of hierarchical clustering and embedded references. Therefore, we currently do not consider index selection. In the future, we plan to take indexes into account, particularly because DASDBS allows the implementation of very powerful index structures, such as nested or path indexes.

1.3.5 The Default Physical Design

In order to have a starting point for both the physical database design tool described in the next section and the implementation of COCOON on top of DASDBS, we have identified a default physical design that includes the following choice of implementation strategies:

Functions: All functions are bundled with their type table, so as to cluster all object properties together. Object-valued functions are implemented as references (potentially shared subobjects), with physical references and backpointers (efficient access, also in the inverse direction). Multi-valued functions become subrelations.

Inheritance: Objects are present in all supertype tables (efficient access to all instances at all levels); inherited functions are not repeated in subtype tables (non-redundant storage scheme). No backpointers to supertype tuples are included (fast access via index is assumed).
For the conceptual schema given in Example 1, the default physical design would be as follows (a backpointer for a function f is called f^-1...; names of set-valued functions are suffixed by "...Set"):

    City (#City, pop, name, zip, has_compset(comp#, Comp@), addr^-1 Set(Pers#, Pers@))
    Comp (#Comp, name, budg, pres#, pres@, locset(city#, ...), staffset(emp#, ...))
    Pers (#Pers, name, bdate, addr#, ...)
    Empl (#Empl, sal, hired, ssno, works_for#, works..., pres^-1 Set(Comp#, ...))

Notice that an employee object's OID (Empl#) is actually the same as the corresponding person object's OID (Pers#).

1.4 A Physical Design Tool

In this section, we present a preliminary physical database design tool that considers some of the alternatives above and produces an internal nested relational schema for use with DASDBS, derived from a conceptual COCOON schema, a load description, and a cost model.

1.4.1 General Approach

One of the main objectives of the COCOON project is to investigate the architecture of OODBMSs. Therefore, three implementation platforms are currently being used: a commercial relational DBMS (Oracle), a commercial OODBMS (Ontos), and the DASDBS prototype. The relational and the nested relational "storage managers" will be used for extensive performance experiments to evaluate the pros and cons of the different storage alternatives. While the physical design alternatives presented above were developed mainly for implementation on top of DASDBS, some of them can be mapped to Oracle, too. For example, nested relations with only two levels of nesting (i.e., all subrelations are flat) can be simulated quite exactly by means of Oracle's "Clusters" [ORACLE, 1990].

In order to assist the DBA in selecting a good physical design for a given conceptual database schema and anticipated transaction load (which may be estimated, observed, or "guesstimated"), we have implemented a first prototype of a physical database design tool (DBDesigner). The system was developed as a master's thesis [Gross, 1991] and is implemented in PROLOG. It uses a simplified version of a cost model for DASDBS operations developed earlier [Brauburger and Deuer, 1987; Paul, 1988].

1.4.2 Load Description

The transaction load is given to DBDesigner as a collection of abstractions of COCOON operations together with their frequencies. That is, a load description is a collection of entries of the form

    /f/operation-specification

The operation specifications consist of the following:

For selections, we record which attributes occur in the predicate and the general form of the predicate. The specific predicate used is not included. Furthermore, the estimated selectivity of the predicate is also given in the load description (as an absolute cardinality or as a relative fraction). As an example, consider the following two entries:

    /100/select/0.3/[(name(manufacturer)) mul (name(owner))] (Vehicle)
    /30/select/150/[address rel location(works_for)](employee)

The first entry states that a selection of Vehicles returning 30% of all member objects of that class is issued 100 times. The predicate involves names of manufacturers and names of owners; these two parts are combined conjunctively ("mul"). The second entry indicates that 30 selections on the Employee class are issued that return about 150 objects each. The predicate compares the address of the Employee with the location of his or her employer (using a set comparator, as indicated by "rel").

For projections, COCOON's query language actually has two forms: an object-preserving operator "project" and a tuple-generating operator "extract". We record only extract, since project is a typecast operation that is "executed" completely at compile time (used for type checking purposes). Since extracts can be nested to produce nested sets of tuples, the load description for extracts records the "path traversals" that are performed by these operations. For example, if 50 extracts of Company data together with (nested) Employee data are contained in the mix, the load description will have the following line:

    /50/extract[cname,budget,extract[name,salary](staff)] (Company)

This basically conveys which parts of the accessed objects are read for output.

For extend operations, which define new derived functions and can also be used to simulate joins (see [Scholl and Schek, 1990a]), the load description currently contains no entries. The query compiler will substitute the defining expressions for the function names, so in the first DBDesigner prototype we expect this substitution to be done before the load description is produced.

Set operations, such as union, difference, and intersection, are currently excluded from the load description. The reason is mainly to reduce the problem space (and also that we expect them to be less frequent and less crucial).

For update operations, the main information is the frequencies and which functions get updated. The following three lines describe two update operations that occur 30 and 4 times, respectively:

    /30/update[produces := select/430/[id](vehicle)](company)
    /4/newemployer := select/1/[name](company)
    /4/update[works_for := newemployer](employee)

The first update sets the "produces" function to a new value for a Company. The new value for "produces" is obtained by a query (selection) against the Vehicle class (which returns 430 objects on average). The second update proceeds in two steps: first, a variable is assigned the result of a selection on Companies (returning only 1 object); then this Company is made the new value of the "works_for" function of an Employee object. Notice that frequency information for update statements can be interpreted in two ways: either single-object updates are performed that often, or that many objects are updated at once. For the cost calculation there is no difference.

1.4.3 Statistical Information

In order to compute the cost of operations, the design optimizer should actually cooperate with the query optimizer of the execution engine. In the

current development phase of the COCOON-DASDBS mapping, however, this part is not yet completed (see Section 1.5). Therefore, DBDesigner uses its own set of statistics and cost formulae. The statistics used by the physical designer are:

    cardinalities of types (how many objects of that type are in the DB?) (see footnote 5)
    (average) cardinalities of set-valued functions
    (average) sizes of all atomic values

No information about value distributions is needed, since selectivities are included in the load description.

1.4.4 The Optimization Process

The optimization algorithm of DBDesigner uses a branch-and-bound method. From the given load description we first generate a "transaction graph" (TG) that will be used for enumerating the design alternatives. The TG consists of vertices representing the types in the conceptual database schema; directed edges connect types if there is a "traversal" in the corresponding direction in the load. Traversals can, for example, occur in selection conditions: whenever a selection on a class over type T_1 uses a function that returns objects of type T_2, there will be an edge from node T_1 to node T_2 in the TG. Other sources of such traversals are projections (extract) and "information flow" in update statements (where do the assigned values come from?). Finally, the use of inherited functions also leads to an edge from the subtype node to the supertype node.

The next step is to add a weight to the edges in the TG. This weight represents the accumulated traffic across this link; that is, from the load description we compute the sum of the frequencies of all operations that incur the traversal represented by the edge.

The current version of DBDesigner has the following restriction. In general, multiple connections may be present between two types in the conceptual schema; for example, several functions might connect the same two types. This is not permitted in schemas that are to be optimized by DBDesigner.
As a consequence, if the conceptual schema does contain such cases, the range type of such functions has to be specialized into subtypes, such that the functions map objects of the domain type to different range types.

Footnote 5: Accumulation for supertypes is done by the tool, based on the assumption that different subtypes of one type have disjoint extents.
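The construction of a weighted transaction graph from a load description can be sketched as follows. This Python sketch is an illustration, not DBDesigner's PROLOG implementation: the function names are hypothetical, and the traversals of each entry are supplied by hand as (from type, to type) pairs instead of being derived from the operation specification.

```python
from collections import defaultdict


def parse_entry(line):
    """Split an entry of the form '/f/operation-specification'."""
    _, freq, spec = line.split("/", 2)
    return int(freq), spec


def build_transaction_graph(load):
    """Sum, per directed edge, the frequencies of all operations using it.

    load is a list of (entry_line, traversals) pairs; the traversals are
    hand-annotated assumptions in this sketch.
    """
    weights = defaultdict(int)
    for line, traversals in load:
        freq, _spec = parse_entry(line)
        for edge in set(traversals):  # count an operation once per edge
            weights[edge] += freq
    return dict(weights)


load = [
    ("/30/select/150/[address rel location(works_for)](employee)",
     [("Employee", "Company")]),
    ("/50/extract[cname,budget,extract[name,salary](staff)] (Company)",
     [("Company", "Employee")]),
    ("/4/update[works_for := newemployer](employee)",
     [("Employee", "Company")]),
]
tg = build_transaction_graph(load)
print(sorted(tg.items(), key=lambda item: -item[1]))
```

Sorting the edges by descending weight yields the order in which an optimizer of this kind would consider them.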

After the construction of the TG, the optimization can actually begin. Starting from the default physical design (see above), the optimizer selects the most promising (that is, heaviest) edge from the TG and tries to improve performance by materializing the corresponding function (physical clustering) or by repeating the inherited attributes in the subtype's object-table. The total cost of the new design is compared with the old cost. The next step depends on how these costs compare. If the transformation of the physical design led to an improvement, we proceed with this design; otherwise we try the second-heaviest edge, and so on. The search is always continued at the currently best alternative, as long as it still contains some immediate potential for improvement. An alternative has no immediate potential if either no more transformations can be applied, or all possible transformations have already been tested and none of them has led to an improvement. Notice that an alternative without immediate potential could still lead to the optimal solution. So, immediate potential is only used as the criterion for where to continue the search next. If no alternatives with immediate potential are left, we continue with the best alternative that allows transformations until no more transformations are possible. The search can further be limited by the user by giving a maximum number of final designs to evaluate. A design is final if it has no immediate potential.

1.4.5 Experiences and Extensions

We have tested DBDesigner with a couple of (rather small) sample databases, with only a modest complexity in the transaction load. Larger-scale experiments are planned, but have not yet been carried out. With the small test cases, as expected, the performance was good, and PROLOG has not (yet?) turned out to be a big penalty. The first prototype has already been extended to allow dynamic modifications in either the transaction load or the database schema.
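The core of the search strategy of Section 1.4.4 can be illustrated by a much simplified greedy variant. The Python sketch below is not the DBDesigner algorithm: it omits the branch-and-bound bookkeeping, the continuation at alternatives without immediate potential, and the limit on final designs. A design is modelled as the set of TG edges that have been transformed, and the cost function is a hypothetical stand-in for the cost model.

```python
def greedy_design(edges_by_weight, cost, start=frozenset()):
    """Apply cost-reducing transformations, trying the heaviest edge first.

    edges_by_weight: TG edges sorted by descending weight.
    cost: hypothetical function mapping a design (a frozenset of
          transformed edges) to its total estimated cost.
    """
    best, best_cost = start, cost(start)
    improved = True
    while improved:
        improved = False
        for edge in edges_by_weight:
            if edge in best:
                continue  # transformation already applied
            candidate = best | {edge}
            candidate_cost = cost(candidate)
            if candidate_cost < best_cost:
                # Keep the transformation and restart from the heaviest edge.
                best, best_cost = candidate, candidate_cost
                improved = True
                break
    return best, best_cost


staff = ("Company", "Employee")
works_for = ("Employee", "Company")


def toy_cost(design):
    """Made-up costs: clustering along staff helps, along works_for hurts."""
    total = 100
    if staff in design:
        total -= 30
    if works_for in design:
        total += 10
    return total


design, total = greedy_design([staff, works_for], toy_cost)
print(sorted(design), total)
```

The loop stops when no remaining transformation lowers the cost, which corresponds to a design without immediate potential in the terminology used above.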
The objective of this extension is to avoid a complete re-iteration of the optimization process, for two reasons. One is to avoid duplicate work. The other, more challenging one is that the new physical design should not be too different from the old one in case we already have a big populated database; otherwise we would have to reorganize the existing data. Particular emphasis was put on the inclusion of view definitions in the schema description. For this specific case we

have added redundant storage strategies to the optimizer's repertoire: a view can be materialized in a separate internal relation or just kept as a virtual (computed) class.

1.5 Query Optimization

In this section we briefly discuss the transformation and optimization of queries that are given to the system in terms of the conceptual database schema. It is the task of the query optimizer to map these COCOON queries down to the physical level by (i) transforming them to the nested relational model and algebra as available at the DASDBS kernel interface, and (ii) selecting a good (if not the best) execution strategy. Because COCOON's query language, COOL, is quite similar to a nested relational algebra, a straightforward transformation from COOL expressions down to a nested relational algebra expression against a fixed implementation at the internal level (e.g., the default physical design) is done rather easily. Complications arise from the fact that the mapping of data structures is quite flexible and that, depending on the chosen design, operations have to be optimized substantially.

Originally, we planned to investigate two competing approaches to query transformation and optimization. The first would have been a purely algebraic one, comparable to what we did with the relational to nested relational mapping [Scholl et al., 1987; Scholl, 1986]: COCOON classes would be defined as "views" over the stored nested relations, COOL queries would be transformed to the nested relational level by "view substitution", and finally, algebraic transformations within the nested relational algebra could be applied, so as to eliminate redundant subexpressions. Quite a few redundant joins would have to be removed in case we materialized functions (hierarchical clustering). This is exactly the problem addressed in [Scholl, 1986].
The second approach is to transform the given COOL query into a nested algebra query using a class connection graph, similar to the one used in [Lanzelotte et al., 1991]. The class connection graph is somewhat similar to the transaction graph used in DBDesigner; its edges are labelled by the implementation strategy. For example, corresponding labels represent whether a pointer has to be followed (in case of a physical reference), whether a join has to be performed (in case of a logical reference), or whether a subrelation has to be accessed (in case of a materialized function). Each label

Physical Design in OODBMS

Physical Design in OODBMS 1 Physical Design in OODBMS Dieter Gluche and Marc H. Scholl University of Konstanz Department of Mathematics and Computer Science P.O.Box 5560/D188, D-78434 Konstanz, Germany E-mail: {Dieter.Gluche, Marc.Scholl}@Uni-Konstanz.de

More information

The COCOON Object Model. M.H. Scholl, C. Laasch, C. Rich, H.-J. Schek, M. Tresch

The COCOON Object Model. M.H. Scholl, C. Laasch, C. Rich, H.-J. Schek, M. Tresch The COCOON Object Model M.H. Scholl, C. Laasch, C. Rich, H.-J. Schek, M. Tresch Foreword This report has long been awaited primarilyby us, the authors, but also by anumber of colleagues who kept asking

More information

14 Databases. Source: Foundations of Computer Science Cengage Learning. Objectives After studying this chapter, the student should be able to:

14 Databases. Source: Foundations of Computer Science Cengage Learning. Objectives After studying this chapter, the student should be able to: 14 Databases 14.1 Source: Foundations of Computer Science Cengage Learning Objectives After studying this chapter, the student should be able to: Define a database and a database management system (DBMS)

More information

A Review of Database Schemas

A Review of Database Schemas A Review of Database Schemas Introduction The purpose of this note is to review the traditional set of schemas used in databases, particularly as regards how the conceptual schemas affect the design of

More information

Managing large sound databases using Mpeg7

Managing large sound databases using Mpeg7 Max Jacob 1 1 Institut de Recherche et Coordination Acoustique/Musique (IRCAM), place Igor Stravinsky 1, 75003, Paris, France Correspondence should be addressed to Max Jacob (max.jacob@ircam.fr) ABSTRACT

More information

www.gr8ambitionz.com

www.gr8ambitionz.com Data Base Management Systems (DBMS) Study Material (Objective Type questions with Answers) Shared by Akhil Arora Powered by www. your A to Z competitive exam guide Database Objective type questions Q.1

More information

Introduction to Object-Oriented and Object-Relational Database Systems

Introduction to Object-Oriented and Object-Relational Database Systems , Professor Uppsala DataBase Laboratory Dept. of Information Technology http://www.csd.uu.se/~udbl Extended ER schema Introduction to Object-Oriented and Object-Relational Database Systems 1 Database Design

More information

Object-Oriented Databases

Object-Oriented Databases Object-Oriented Databases based on Fundamentals of Database Systems Elmasri and Navathe Acknowledgement: Fariborz Farahmand Minor corrections/modifications made by H. Hakimzadeh, 2005 1 Outline Overview

More information

Department of Computer and Information Science, Ohio State University. In Section 3, the concepts and structure of signature

Department of Computer and Information Science, Ohio State University. In Section 3, the concepts and structure of signature Proceedings of the 2nd International Computer Science Conference, Hong Kong, Dec. 1992, 616-622. 616 SIGNATURE FILE METHODS FOR INDEXING OBJECT-ORIENTED DATABASE SYSTEMS Wang-chien Lee and Dik L. Lee Department

More information

Graph Visualization U. Dogrusoz and G. Sander Tom Sawyer Software, 804 Hearst Avenue, Berkeley, CA 94710, USA info@tomsawyer.com Graph drawing, or layout, is the positioning of nodes (objects) and the

More information

CSC 742 Database Management Systems

CSC 742 Database Management Systems CSC 742 Database Management Systems Topic #4: Data Modeling Spring 2002 CSC 742: DBMS by Dr. Peng Ning 1 Phases of Database Design Requirement Collection/Analysis Functional Requirements Functional Analysis

More information

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Physical Design. Phases of database design. Physical design: Inputs.

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Physical Design. Phases of database design. Physical design: Inputs. Phases of database design Application requirements Conceptual design Database Management Systems Conceptual schema Logical design ER or UML Physical Design Relational tables Logical schema Physical design

More information

Department of Computer Science, Information Systems { Databases. tions are the generalized abstraction of attributes

Department of Computer Science, Information Systems { Databases. tions are the generalized abstraction of attributes Implementing an Object Model on Top of Commercial Database Systems (Extended Abstract) Markus Tresch and Marc H. Scholl Department of Computer Science, Information Systems { Databases ETH Zurich, CH-8092

More information

Glossary of Object Oriented Terms

Glossary of Object Oriented Terms Appendix E Glossary of Object Oriented Terms abstract class: A class primarily intended to define an instance, but can not be instantiated without additional methods. abstract data type: An abstraction

More information

Object Oriented Databases (OODBs) Relational and OO data models. Advantages and Disadvantages of OO as compared with relational

Object Oriented Databases (OODBs) Relational and OO data models. Advantages and Disadvantages of OO as compared with relational Object Oriented Databases (OODBs) Relational and OO data models. Advantages and Disadvantages of OO as compared with relational databases. 1 A Database of Students and Modules Student Student Number {PK}

More information

Lesson 8: Introduction to Databases E-R Data Modeling

Lesson 8: Introduction to Databases E-R Data Modeling Lesson 8: Introduction to Databases E-R Data Modeling Contents Introduction to Databases Abstraction, Schemas, and Views Data Models Database Management System (DBMS) Components Entity Relationship Data

More information

Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation

Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation Objectives Distributed Databases and Client/Server Architecture IT354 @ Peter Lo 2005 1 Understand the advantages and disadvantages of distributed databases Know the design issues involved in distributed

More information

Modern Databases. Database Systems Lecture 18 Natasha Alechina

Modern Databases. Database Systems Lecture 18 Natasha Alechina Modern Databases Database Systems Lecture 18 Natasha Alechina In This Lecture Distributed DBs Web-based DBs Object Oriented DBs Semistructured Data and XML Multimedia DBs For more information Connolly

More information

COMP 378 Database Systems Notes for Chapter 7 of Database System Concepts Database Design and the Entity-Relationship Model

COMP 378 Database Systems Notes for Chapter 7 of Database System Concepts Database Design and the Entity-Relationship Model COMP 378 Database Systems Notes for Chapter 7 of Database System Concepts Database Design and the Entity-Relationship Model The entity-relationship (E-R) model is a a data model in which information stored

More information

Chapter 4 Multi-Stage Interconnection Networks The general concept of the multi-stage interconnection network, together with its routing properties, have been used in the preceding chapter to describe

More information

The ObjectStore Database System. Charles Lamb Gordon Landis Jack Orenstein Dan Weinreb Slides based on those by Clint Morgan

The ObjectStore Database System. Charles Lamb Gordon Landis Jack Orenstein Dan Weinreb Slides based on those by Clint Morgan The ObjectStore Database System Charles Lamb Gordon Landis Jack Orenstein Dan Weinreb Slides based on those by Clint Morgan Overall Problem Impedance mismatch between application code and database code

More information

æ A collection of interrelated and persistent data èusually referred to as the database èdbèè.

æ A collection of interrelated and persistent data èusually referred to as the database èdbèè. CMPT-354-Han-95.3 Lecture Notes September 10, 1995 Chapter 1 Introduction 1.0 Database Management Systems 1. A database management system èdbmsè, or simply a database system èdbsè, consists of æ A collection

More information

Distributed Databases in a Nutshell

Distributed Databases in a Nutshell Distributed Databases in a Nutshell Marc Pouly Marc.Pouly@unifr.ch Department of Informatics University of Fribourg, Switzerland Priciples of Distributed Database Systems M. T. Özsu, P. Valduriez Prentice

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

Part VI. Object-relational Data Models

Part VI. Object-relational Data Models Part VI Overview Object-relational Database Models Concepts of Object-relational Database Models Object-relational Features in Oracle10g Object-relational Database Models Object-relational Database Models

More information

The Relational Model. Ramakrishnan&Gehrke, Chapter 3 CS4320 1

The Relational Model. Ramakrishnan&Gehrke, Chapter 3 CS4320 1 The Relational Model Ramakrishnan&Gehrke, Chapter 3 CS4320 1 Why Study the Relational Model? Most widely used model. Vendors: IBM, Informix, Microsoft, Oracle, Sybase, etc. Legacy systems in older models

More information

Clustering and scheduling maintenance tasks over time

Clustering and scheduling maintenance tasks over time Clustering and scheduling maintenance tasks over time Per Kreuger 2008-04-29 SICS Technical Report T2008:09 Abstract We report results on a maintenance scheduling problem. The problem consists of allocating

More information

Markus Tresch. Swiss Federal Institute of Technology (ETH) CH - 8092 Zurich, Switzerland. Email tresch@inf.ethz.ch. Abstract

Markus Tresch. Swiss Federal Institute of Technology (ETH) CH - 8092 Zurich, Switzerland. Email tresch@inf.ethz.ch. Abstract Technical Report No. 248, Department of Computer Science, ETH Zurich, July 1996. Principles of Distributed Object Languages Markus Tresch Institute for Information Systems Swiss Federal Institute of Technology

More information

Chapter 3: Distributed Database Design

Chapter 3: Distributed Database Design Chapter 3: Distributed Database Design Design problem Design strategies(top-down, bottom-up) Fragmentation Allocation and replication of fragments, optimality, heuristics Acknowledgements: I am indebted

More information
