Re-design an Operational Database

Introduction

In today's world, many organizations go for a complete re-design of their database. Let's look at why business needs make it technically necessary to re-design a database. Over time the database undergoes unplanned changes, and many new requirements come in to support the business. This is one of the reasons the database structure needs to be re-designed. Other contributing factors are the performance of the database and the repetition of the same data structures (data redundancy and data anomalies).

There are two approaches to re-designing an operational database. The first approach is to collect the present requirements (those the present application supports) and the new requirements, and then completely re-design the database in the traditional way, as for a new application. The second approach is to reverse engineer the existing database, take the result and the new requirements as input, and re-design the database. This article discusses the second approach.

Flow of Data Models

Let's first look at how the different types of data models are defined in two different scenarios. Defining requirements using a data model is a common practice.

In the first scenario, creating a new application for an organization, the traditional way to define the requirements using a data model is shown in Figure 1. The requirements come from the clients and the contextual model is defined. The conceptual, logical and physical data models then evolve step by step.

In the second scenario, re-designing an operational database, a reverse engineering methodology is used; the flow of data models is also shown in Figure 1.

The contextual model is basically used to define the terms, definitions and scope of the terminology; we can skip this layer for the present topic.

Figure 1. Flow of data models (Contextual Model, Conceptual Data Model, Logical Data Model, Physical Data Model, Physical Database) for requirements definition using a data model (traditional way) and for reverse engineering.

Operational Database Data Structures

We should understand what the data structures of an operational database consist of:
1) Master tables or reference tables
2) Transaction tables
3) Intersection tables (associative entities in logical terms)
4) Relationships between the tables

Characteristics of a Master table or Reference table

Master or reference tables are usually lookup tables (Figure 2) that group or categorize the business-rule data used by a particular organization. Transaction tables use master table data values for reference.

Some of the characteristics of a Master table:
1) Usually consists of at least a code attribute and a description attribute.
2) Data is almost static; changes are infrequent.
3) A master table contains few records or rows.

In the example shown below (Figure 2), two tables, Employee and Job, are displayed as master tables.

Figure 2. (Logical View)

Characteristics of a Transaction table

Transaction tables come into existence after the master tables. They are usually created from an intersection of master tables together with some transaction data (for example, a numerical value).

Some of the characteristics of a Transaction table:
1) Usually contains the code attribute (primary key) of a master table as a foreign key.
2) Usually contains some transaction values (measures or metrics).
3) Data changes are frequent.
4) A transaction table contains a very large number of records or rows.

In the example shown below (Figure 3), a Customer places an ORDER for multiple products. The transaction is stored in the ORDER transaction table.

Figure 3. (Logical View)

Characteristics of an Intersection Table or Associative Table

An intersection table or associative table is mainly used to resolve many-to-many relationships between entities.

Some of the characteristics of an Associative table:
1) Usually contains the primary keys of two or more tables as foreign keys.
2) The number of records or rows depends entirely on the combinations of key instances.
3) May contain other columns whose values are specific to each combination.

In the example shown below (Figure 4), an Employee can do many Jobs and a Job can be done by many Employees. The many-to-many relationship between the two tables is therefore resolved using an associative table (Employee Job).
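
As a rough illustration of the Employee, Job and Employee Job structure described above, the following Python sketch builds the three tables in an in-memory SQLite database using the standard sqlite3 module. The column names (employee_code, job_description, start_date and so on) and the sample rows are assumptions made for this sketch; the article's figures may define different attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
-- Master tables: a code attribute plus a descriptive attribute (assumed names).
CREATE TABLE employee (
    employee_code TEXT PRIMARY KEY,
    employee_name TEXT NOT NULL
);
CREATE TABLE job (
    job_code        TEXT PRIMARY KEY,
    job_description TEXT NOT NULL
);
-- Associative table: primary keys of both masters as foreign keys,
-- plus one column whose value belongs to the combination.
CREATE TABLE employee_job (
    employee_code TEXT NOT NULL REFERENCES employee(employee_code),
    job_code      TEXT NOT NULL REFERENCES job(job_code),
    start_date    TEXT,
    PRIMARY KEY (employee_code, job_code)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [("E1", "Ann"), ("E2", "Raj")])
conn.executemany("INSERT INTO job VALUES (?, ?)",
                 [("J1", "Analyst"), ("J2", "Tester")])
conn.executemany("INSERT INTO employee_job VALUES (?, ?, ?)",
                 [("E1", "J1", "2020-01-01"),
                  ("E1", "J2", "2021-01-01"),
                  ("E2", "J1", "2022-01-01")])

# One employee can do many jobs ...
jobs_of_e1 = [row[0] for row in conn.execute(
    "SELECT job_code FROM employee_job "
    "WHERE employee_code = 'E1' ORDER BY job_code")]

# ... and one job can be done by many employees.
employees_on_j1 = [row[0] for row in conn.execute(
    "SELECT employee_code FROM employee_job "
    "WHERE job_code = 'J1' ORDER BY employee_code")]

print(jobs_of_e1, employees_on_j1)
```

The composite primary key on employee_job keeps each employee and job combination unique, which is why the row count of the associative table depends on the combinations of key instances.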

Figure 4. (Logical View)

Characteristics of Relationships

A relationship is one of the main components of a data model. It defines the relation between the tables.

Example:
One to Many (1:M): Department and Project
Many to Many (M:M): Employee and Project
One to One (1:1): Employee and Workstation

Figure 5.

Development Approach

As mentioned above, we will re-design the operational database using the second approach (reverse engineering the existing database and adding the new requirements). Before re-designing the operational database, we should know which professionals are required and what advantages the existing database offers.

Professionals required for re-designing the operational database:
1) Subject matter experts or domain experts: provide insight into the domain definitions and terms, and are the best judges of whether all the business terms are defined properly.
2) Business analyst: confirms that all the new requirements are satisfied by the data models.
3) Data architects and data modelers: create the different types of models for the different users of the data models.
4) Data source owners: confirm the real meaning of the data in the existing database.
5) Application users or professionals: specify how the data is used in the application.

Advantages of an existing database:
1) Existing database structure: gives an idea of the present requirements and how the data is structured in the database.
2) Application: gives an idea of the business rules and how the data flows through a business process.
3) Sample data: helps in data analysis and confirms the structure of the database.

How to go about the Modeling

Reverse Engineer the Existing Database

Reverse engineer the existing operational database using a data modeling tool (almost all such tools support this). The output of this process is the physical data structures with their relationships (if defined).

Physical Data Model (As-is)

The output of reverse engineering the existing database is the as-is physical data model. Now follow the steps below.

Identify the Master tables
Analyze the table structures and, using the characteristics of a master table (mentioned above), identify the master tables. Validate the findings with data source owners and application users using sample data from the existing database.

Identify the Intersection tables or Associative tables
Further analysis of the existing physical data model will reveal the associative tables that were used to resolve many-to-many relationships. Make a note of the findings.

Identify the Transaction tables
Further analysis of the data model, matching tables against the characteristics of a transaction table, will identify the transaction tables. Validate the findings with data source owners and application users using sample data from the existing database.
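
The identification steps above can be given a first pass in code before the findings are validated with people. The sketch below is only a heuristic, assuming the existing database is SQLite and that foreign keys are declared in the schema: a table with no foreign keys is guessed to be a master table, a table whose composite primary key consists entirely of foreign key columns is guessed to be associative, and any other table with foreign keys is guessed to be a transaction table. The function name classify_tables and the demo schema are hypothetical, and real databases will need extra rules (for example, master tables that reference other master tables).

```python
import sqlite3


def classify_tables(conn):
    """Guess 'master', 'associative' or 'transaction' for each table (heuristic)."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    result = {}
    for name in names:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        fks = conn.execute(f'PRAGMA foreign_key_list("{name}")').fetchall()
        pk_cols = {col[1] for col in columns if col[5] > 0}
        fk_cols = {fk[3] for fk in fks}
        if not fk_cols:
            result[name] = "master"
        elif len(pk_cols) >= 2 and pk_cols <= fk_cols:
            result[name] = "associative"
        else:
            result[name] = "transaction"
    return result


# Hypothetical schema loosely based on the article's examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_code TEXT PRIMARY KEY, customer_name TEXT);
CREATE TABLE product  (product_code  TEXT PRIMARY KEY, product_description TEXT);
CREATE TABLE employee (employee_code TEXT PRIMARY KEY, employee_name TEXT);
CREATE TABLE job      (job_code      TEXT PRIMARY KEY, job_description TEXT);
CREATE TABLE employee_job (
    employee_code TEXT REFERENCES employee(employee_code),
    job_code      TEXT REFERENCES job(job_code),
    PRIMARY KEY (employee_code, job_code)
);
CREATE TABLE customer_order (
    order_id      INTEGER PRIMARY KEY,
    customer_code TEXT REFERENCES customer(customer_code),
    product_code  TEXT REFERENCES product(product_code),
    quantity      INTEGER
);
""")

classification = classify_tables(conn)
for table, kind in sorted(classification.items()):
    print(table, kind)
```

A classification like this is a starting list for discussion only; as the steps above say, the findings still need to be validated with data source owners and application users using sample data.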

Logical Data Model (As-is)

Build the Logical Data Model (As-is) from the Physical Data Model (As-is) by defining proper business or logical names for the entities (tables) and attributes (columns). Remove the audit columns and data types from the database structure. Consult the subject matter experts or domain experts, data source owners and application users to define the entities and attributes of the logical data model, as well as the relationships between the entities. Validate this model with the business analyst: does it satisfy all the present requirements?

Conceptual Data Model (As-is)

Group the entities logically to define group entities (conceptual entities) and the relationships between these group entities. Validate with the stakeholders that this model satisfies the business concepts of the business process.

Conceptual Data Model (To-be)

The input to this data model is the Conceptual Data Model (As-is). Define the new requirements in the data model, adding conceptual entities if needed. Validate the data model with the stakeholders and the business analyst. Technically, the model has to be approved by the data architect.

Logical Data Model (To-be)

In this data model the details of the conceptual entities are defined and all the many-to-many relationships are resolved. Conceptual entities are usually broken down into logical entities, and the attributes of each entity are defined here. Define all the primary keys and foreign keys as required. Resolve the data redundancy and data anomalies present in the existing database by creating associative entities. Define the logical naming convention for all the logical entities and attributes in this layer. The subject matter experts and the business analyst validate the definitions of the existing logical entities and attributes and define those of the new ones.

Physical Data Model (To-be)

Define the standard physical naming convention for all the entities (tables) and attributes (columns).
The glossary of physical names can also be reused in the future. Build the physical data model from the logical data model. Define the data types and sizes of all the columns with reference to the existing data types in the present database. Define the audit columns, tablespaces, indexes, bufferpools and so on in consultation with the database administrator. Then generate the scripts, and the new database is ready to use.

Manage all the data models (conceptual, logical and physical) as described in the article Managing your Data Models (see References), so that less time, fewer resources and lower cost are required the next time a database is re-designed.
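
To make the naming-convention and script-generation steps concrete, here is a minimal Python sketch. It assumes a small glossary that maps logical words to physical abbreviations and a simple word-by-word convention joined with underscores. The glossary entries, the function names physical_name and create_table_ddl, and the data types are all hypothetical; in practice the convention comes from your own standards and the types from the existing database.

```python
# Hypothetical glossary: logical word -> physical abbreviation.
GLOSSARY = {
    "employee": "EMP",
    "job": "JOB",
    "code": "CD",
    "name": "NM",
}


def physical_name(logical_name, glossary=GLOSSARY):
    """Translate a logical name word by word; unknown words are upper-cased."""
    words = logical_name.split()
    return "_".join(glossary.get(word.lower(), word.upper()) for word in words)


def create_table_ddl(entity, attributes):
    """Build a CREATE TABLE script.

    attributes: list of (logical attribute name, data type, is_primary_key).
    """
    column_lines = [f"    {physical_name(name)} {data_type}"
                    for name, data_type, _ in attributes]
    key_columns = [physical_name(name)
                   for name, _, is_key in attributes if is_key]
    if key_columns:
        column_lines.append(f"    PRIMARY KEY ({', '.join(key_columns)})")
    body = ",\n".join(column_lines)
    return f"CREATE TABLE {physical_name(entity)} (\n{body}\n);"


# Logical entity "Employee Job" with assumed attributes and data types.
ddl = create_table_ddl("Employee Job", [
    ("Employee Code", "CHAR(6)", True),
    ("Job Code", "CHAR(4)", True),
    ("Start Date", "DATE", False),
])
print(ddl)
```

Keeping the glossary in one place means the same logical word always receives the same physical abbreviation, which is what allows the glossary to be reused for later models.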

Summary

Re-designing an operational database is easier when a step-by-step approach is taken. The approach helps resolve performance problems and avoid repeated data structures. It also ensures that the existing requirements are captured and that the new requirements of the application are satisfied.

References

Managing your Data Models
http://www.ibm.com/developerworks/architecture/library/ar-dmgov/

What is Master Data?
http://www.b-eye-network.com/view/6758