Modelling and Systems Development
Lecture 2: Data management layer design
Mapping problem-domain objects to object-persistence formats

Object-persistence formats
  - Files (sequential and random-access)
  - Relational databases
  - Object-oriented databases
  - Object-relational databases

Files
  - Directly supported by the programming language
      Example: the java.io package
  - Sequential access
      Optimized for operations on the entire file
      Access to specific objects is not efficient
      Example: the java.io.InputStream class
  - Random access
      Optimized for finding specific objects
      Entire-file operations are not efficient
      Example: java.io.RandomAccessFile

Relational databases
  - Primary key: an attribute that uniquely identifies a row
  - Foreign key: an attribute that is the primary key of a different table, used to simulate associations between objects
  - Referential integrity: guarantees that the foreign-key links are valid
  - Structured Query Language (SQL): the standard language for accessing relational database tables

Relational database example
  [Figure 3-3 goes here]
  Primary keys are underlined. Foreign keys are double underlined. Referential integrity is suggested by the red ovals.

Object-oriented databases
  - Two approaches
      Adding persistence extensions to OO languages
      Separate DBMS
  - Standards from the ODMG (Object Data Management Group): ODL, OML, OQL
  - ODMG finished its work in 2001
Object-oriented databases
  - Support complex data: multimedia (MM), CAD/CAM, GIS
  - Used in finance, healthcare, telecom
  - Each object is assigned an object ID
  - Some support for inheritance
  - Steep learning curve

Object-relational databases
  - Relational databases extended to handle the storage of objects
  - Use of user-defined data types
  - Extended SQL to handle complex object data
  - Inheritance tends to be language dependent

Comparing object storage formats

Mapping to object-persistence formats
  - Storing the problem-domain (PD) objects introduces conversion requirements
  - Put the functionality for storing and retrieving in the data management (DM) layer
  - DM classes depend on PD classes, not vice versa
  - Add primary and foreign keys, unless they add too much overhead

Appointment System: PD and DM layers
  [Diagram: a PD layer and a DM layer. Each DM- class contains methods for storing and retrieving objects.]

Mapping to an OODB
  [Diagram: Class inherits from Superclass (-attribute) and Superclass2; the multiple inheritance (MI) is factored out in 2 different ways.]
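The slide does not spell out the two ways of factoring out multiple inheritance. One common reading, which is an assumption here and not taken from the slides, is (a) to keep one superclass and replace the second inheritance link by an association, or (b) to flatten the hierarchy by copying the second superclass's attributes into the subclass. A minimal Python sketch of both follows; `Superclass` and `Superclass2` come from the diagram, while `ClassViaAssociation`, `ClassFlattened`, and `attribute2` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Superclass:
    attribute: str


@dataclass
class Superclass2:
    attribute2: str  # hypothetical attribute; the slide shows none for Superclass2


# Way (a): single inheritance from Superclass, plus an association to a
# Superclass2 object that stands in for the second inheritance link.
@dataclass
class ClassViaAssociation(Superclass):
    superclass2: Superclass2


# Way (b): single inheritance from Superclass, with Superclass2's attribute
# copied down into the subclass (the hierarchy is flattened).
@dataclass
class ClassFlattened(Superclass):
    attribute2: str


a = ClassViaAssociation("x", Superclass2("y"))
b = ClassFlattened("x", "y")
```

In both variants the class has exactly one superclass, so it can be stored in a system that supports single inheritance only. Variant (a) keeps `Superclass2` as a separate stored object; variant (b) trades that for duplicated attribute definitions.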
Mapping PD objects to ORDBMS
  1. Map PD classes to DM classes and tables
  2. Map attributes to columns; distinguish:
       1. elementary types
       2. set types
       3. class types (object IDs)
       4. derived attributes
  [Diagram: DM- class and _Table with -attr: elem_type, -attr2: set_type, -attr3: class_type. These types of attributes are supported by the ORDBMS tables.]

Mapping PD objects to ORDBMS
  3. Map inheritance to associations
  4. Map associations to columns (object IDs)
  [Diagram: Class with superclass Superclass and an association (..*) to Class2; Class_Table with -Superclass_table[..] and -Class2_table[..*]; tables Superclass_tbl, Class_Table, Class2_Table.]

Mapping PD objects to ORDBMS
  [Diagram: Person ..* Symptom (-name: String); _Table with -Person_Table[..], -s[], -Symptoms[..*]; Symptom_Table with -name[..], -s[]; Person_Table with -name[..]; _Table with -date[..].]
  - Also add read/write methods in DM_ and other DM classes

Mapping PD objects to RDBMS
  - Get rid of multivalued attributes
  - Replace object IDs by foreign keys
  - Navigation is not preserved
  [Diagram: Person ..* Symptom (-name: String); tables _Table, Symp_Table (-name, -Symp_ID), Person_Table (-name), _Table (-date), Pat_Symp_Table (-Symp_ID).]

Building a DM layer
  - Mapping objects to an RDBMS is very complicated
  - Several design patterns can be used (see link on MSO website)
  - Many tools are now available
  [Diagram: Album * Track, mapped to «table» Albums (key) and «table» Tracks (key, albumkey).]
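As a rough illustration of the Album/Track mapping above, the sketch below stores PD objects in the two tables from the diagram, using Python's standard sqlite3 module as a stand-in RDBMS. The table names and the `key` and `albumkey` columns follow the slide; the `title` columns, the `AlbumDM` class, and its `save`/`load` methods are assumptions made for this example, not part of the lecture.

```python
import sqlite3
from dataclasses import dataclass, field


# PD layer: plain objects with no knowledge of storage.
@dataclass
class Track:
    title: str


@dataclass
class Album:
    title: str
    tracks: list = field(default_factory=list)


# DM layer: depends on the PD classes, not the other way round.
class AlbumDM:
    def __init__(self, conn):
        self.conn = conn
        # SQLite only enforces referential integrity when this is switched on.
        conn.execute("PRAGMA foreign_keys = ON")
        conn.execute(
            'CREATE TABLE IF NOT EXISTS Albums ('
            ' "key" INTEGER PRIMARY KEY, title TEXT NOT NULL)'
        )
        conn.execute(
            'CREATE TABLE IF NOT EXISTS Tracks ('
            ' "key" INTEGER PRIMARY KEY, title TEXT NOT NULL,'
            ' albumkey INTEGER NOT NULL REFERENCES Albums("key"))'
        )

    def save(self, album):
        """Insert the album and its tracks; return the album's primary key."""
        cur = self.conn.execute(
            "INSERT INTO Albums (title) VALUES (?)", (album.title,)
        )
        album_key = cur.lastrowid
        # The object reference Album -> Track becomes a foreign key Track -> Album.
        self.conn.executemany(
            "INSERT INTO Tracks (title, albumkey) VALUES (?, ?)",
            [(t.title, album_key) for t in album.tracks],
        )
        self.conn.commit()
        return album_key

    def load(self, album_key):
        """Rebuild the Album object, including its tracks, from the two tables."""
        row = self.conn.execute(
            'SELECT title FROM Albums WHERE "key" = ?', (album_key,)
        ).fetchone()
        if row is None:
            return None
        tracks = [
            Track(title)
            for (title,) in self.conn.execute(
                'SELECT title FROM Tracks WHERE albumkey = ? ORDER BY "key"',
                (album_key,),
            )
        ]
        return Album(row[0], tracks)
```

A possible use is `dm = AlbumDM(sqlite3.connect(":memory:"))` followed by `dm.save(...)` and `dm.load(...)`. Note that the tables only record which album a track belongs to; getting from an album to its tracks takes a query in `load`, which is what "navigation is not preserved" refers to.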
Optimizing RDBMS-based object storage
  - Two dimensions for optimization:
      Storage efficiency (minimizing storage space)
      Speed of access (minimizing the time to retrieve desired information)

Optimizing storage efficiency
  - Reduce redundant data
  - Limit null values: multiple possible interpretations can lead to mistakes
  - A well-formed logical data model does not contain redundancy or many null values
  - A model is in first normal form (1NF) if it does not lead to multi-valued fields or repeating fields
      Every row has the same number of columns
      All fields contain precisely one value
  - 1NF: the repeating product fields of the original table are now in a separate table. However, there are still partial dependencies.
    [Diagram: Prod_ ..*]
  - 2NF: with the partial dependencies removed, only a transitive dependency remains: the tax rate depends only on the state.
    [Diagram: Customer .. Prod_ ..* Product]
  - 3NF: with OO modelling you would obtain a similar model right away!
    [Diagram: Customer .... State; Prod_ ..* Product]
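The step from 2NF to 3NF can be made concrete with a small sketch. The tax rate depends only on the state, so it moves out of the customer data into a table of its own. The `State` and `Customer` tables come from the slide diagrams; the column names, state codes, customer names, and rates below are made up for illustration, and SQLite (via Python's sqlite3 module) is only a convenient stand-in for an RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- 3NF: the tax rate is a fact about the state, so it is stored once per state.
    CREATE TABLE State (
        state    TEXT PRIMARY KEY,
        tax_rate REAL NOT NULL
    );
    CREATE TABLE Customer (
        cust_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        state   TEXT NOT NULL REFERENCES State(state)
    );
""")
conn.executemany("INSERT INTO State VALUES (?, ?)", [("AA", 0.07), ("BB", 0.0)])
conn.executemany(
    "INSERT INTO Customer (name, state) VALUES (?, ?)",
    [("Ann", "AA"), ("Bob", "AA"), ("Cy", "BB")],
)


def tax_rate_for(conn, name):
    """Look up a customer's tax rate through the State table."""
    row = conn.execute(
        "SELECT s.tax_rate FROM Customer c JOIN State s ON s.state = c.state"
        " WHERE c.name = ?",
        (name,),
    ).fetchone()
    return None if row is None else row[0]


# One UPDATE changes the rate for every customer in that state; in the 2NF
# design the same change would have to be repeated in each customer row.
conn.execute("UPDATE State SET tax_rate = 0.08 WHERE state = 'AA'")
```

The price of the 3NF design is the join in `tax_rate_for`; the benefit is that the rate cannot become inconsistent between customers of the same state.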
Optimizing data access speed
  - Denormalize by adding fields:
      of look-up tables
      of tables in a 1-1 relationship
    [Diagram: look-up table Payment_type with -Payment_type and -Payment_desc; both fields are also added to the table that refers to it.]
  - Clustering: put similar records close together on the hard disk
  - Indexing
      More space (additional tables)
      More speed when retrieving data (especially when the index is kept in memory)
      Less speed when updating data
    [Figure: Payment Type Index]

Guidelines for creating indexes
  - Use indexes sparingly for transaction systems
  - Use many indexes to improve response times in decision support systems
  - For each table:
      Create a unique index based on the primary key
      Create an index based on the foreign key
      Create an index for fields used frequently for grouping or sorting
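The effect of an index on retrieval can be observed directly in a small experiment. The sketch below uses SQLite through Python's sqlite3 module and asks the database for its query plan before and after an index is created. The `Payment` table, its columns other than `payment_type`, the index name `idx_payment_type`, and the helper `query_plan` are assumptions made for this example. The exact plan wording differs between SQLite versions, so the code only captures the plan text and prints nothing.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as one string.

    The human-readable description is the last column of each row that
    EXPLAIN QUERY PLAN returns.
    """
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Payment ("
    " payment_id INTEGER PRIMARY KEY,"  # the primary key is indexed automatically
    " payment_type TEXT NOT NULL,"
    " amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO Payment (payment_type, amount) VALUES (?, ?)",
    [("CASH", 10.0), ("CARD", 25.0), ("CASH", 5.0)],
)

sql = "SELECT amount FROM Payment WHERE payment_type = ?"

# Without an index, the only option is to scan the whole table.
plan_before = query_plan(conn, sql, ("CASH",))

# Index the field used for look-ups; this costs extra space and slows updates.
conn.execute("CREATE INDEX idx_payment_type ON Payment (payment_type)")
plan_after = query_plan(conn, sql, ("CASH",))
```

Here `plan_before` describes a full table scan, while `plan_after` names `idx_payment_type`. Every later INSERT or UPDATE on `Payment` must also maintain the index, which is why the guidelines recommend few indexes for transaction systems and many for decision support systems.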