Reusable Conceptual Models as a Support for Higher Information Quality

Tatjana Welzer, Bruno Stiglic, Ivan Rozman, Marjan Družovec
University of Maribor
Maribor, Slovenia

ABSTRACT

Today we are faced with an increasing demand for more and more complex database applications. This rapid growth has stimulated the need for high-level concepts, tools and techniques for database design, development and retrieval, with a final goal: better information quality. One of the new possibilities is using a meta data model repository with reusable components (MetaBase). Just as important as reusability in such a concept is the quality dimension of the reusable components, which enables a more expedient and efficient design of high-quality data models. As a consequence, the data quality as well as the information quality of any information system improves. In this paper, the influence of reusability on information quality, the quality dimensions of the meta data model repository, and the influence of TQM as well as some of Deming's fourteen points are presented and discussed in more detail.

1 INTRODUCTION

Over the years, many approaches to improving information quality have been developed and employed in various situations. It is important to recognize that most of these approaches are only vaguely aware that the prerequisite for information quality is data quality, despite the fact that in information technology aggressive steps are being taken to improve just the data quality.
In the last few years, information technology has been spectacularly successful in automating many operations and making data available to more people. The advances of information technology have, however, also had an impact on poor data quality. Unfortunately, just as it is natural to assume that computerized data are correct, so too it is natural to blame the information technology when the data are incorrect. These problems can grow out of proportion, especially in data warehouse environments as well as on the Internet. Further, data (information) users usually use data in their applications without giving the conceptual view much thought. On the contrary, a high-quality conceptual view is of great importance in avoiding the above mentioned problems and in improving data as well as information quality.

Conceptual modelling, as the process of producing a conceptual view, is a complex and hard job. A conceptual model, as the result of this process, is used only once, i.e. within a single project. This seems not very practical, because a model (or parts of a model) derived for one enterprise could also be used in similar projects in the future for more or less similar enterprises. This argument results in the meta data model repository MetaBase, which enables the use of models or submodels of previous projects in the actual design. One of the most important characteristics of the MetaBase is the high quality level of the saved models. Components (models and submodels) saved in the repository are of at least basic quality. This is confirmed by D. C. Rine (Rine, 1997), since through object technology (the technology of the MetaBase) as well as reuse methodology, high-quality tested models (software) have already been developed (of the 15 quality characteristics, the basic quality of the MetaBase components already fulfils 10). But for an additional increase of data quality, the information (data) should be managed too, just as products are managed. Total Data Quality Management (TDQM) and Deming's Fourteen Points are used for this purpose and result in high-quality information (data) products. The MetaBase already fulfils some of these points as well.

After a short presentation of the terms data quality and information quality in chapter 2, an overview of reusability in conceptual modelling, with a short presentation of the MetaBase, is given in chapter 3. The quality of reusable components and the influence of Deming's Fourteen Points as well as Total Data Quality Management on higher information quality in the frame of the MetaBase are the main topics of chapter 4. We finally conclude with a summary of the proposed concepts and future research.

2 DATA AND INFORMATION QUALITY

The quality paradigm is difficult to describe because of its amorphous nature. Therefore different authors tend to emphasize different aspects (Fox, 1997). When the quality paradigm was formed, emphasis was given only to inspection to achieve quality - conformance to a standard or a specification. Rapid changes in recent years have led to new definitions of quality. One of the most important is the IEEE standard definition (IEEE, 1998), in which quality is defined as the totality of features and characteristics of a product or service that bear on its ability to satisfy given needs. Among the above mentioned aspects of quality, the most important for our further
discussion are data quality and information quality, again as presented by different authors.

Thomas C. Redman, in Data Quality for the Information Age (Redman, 1996), defines data quality in its broadest sense. It refers to data that are relevant to their intended uses and of sufficient detail and quality, with a high degree of accuracy and completeness, consistent with other sources, and presented in appropriate ways. Giri Kumar Tayi and Donald P. Ballou, as guest editors of Examining Data Quality in the Communications of the ACM, have defined the term data quality as fitness for use, which implies that the concept of data quality is relative (Tayi, 1998). Data appropriate for one use may not possess sufficient quality for another use. Conversely, data already in use comply with some level of quality. A related problem with multiple users of data is that of semantics. The data designer and/or gatherer as well as the initial user may fully agree on the definitions regarding the meaning of the various data items, but this will probably not be the view of the other users. Such problems are becoming increasingly critical as organizations implement data warehouses. At the same time, the conceptual view of data is becoming more and more important as a possible solution to the mentioned problems.

The data quality definition of Ken Orr (Orr, 1998) introduces a measurement view of the term. Data quality is defined as a measure of the agreement between the data views presented by an information system and that same data in the real world. Of course, no serious information system has a data quality of 100%, but it tries to ensure that the data are accurate, timely and consistent enough for the enterprise to survive and make reasonable decisions. Actually, the real problem with data quality is change. Data in any IS database are static, but in the real world they are changing - one more reason to have a conceptual view.

If defining and understanding data and data quality is difficult and differs from source to source, then defining and understanding information is a hornet's nest. In some environments the term information refers to both data and information (Strong, 1997). Data usually refers to information at the early stages of processing, and information to the product at a later stage. Rather than switching between the terms, information is used here to refer to data or information values at any point in the process. Still, we must bear in mind that different definitions of information depend upon different points of view. For example:

The information management point of view - information is processed data (Redman, 1996).
The information theory point of view - information is the non-redundant part of a message (Redman, 1996).
The information technology for management point of view - information is data that have been organized so that they have meaning to the user (Turban, 1996).

However, once a point of view is fixed, no conflict should arise, and once again it is important to recognize that the prerequisite for information quality is data quality.
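Orr's measurement view can be pictured with a small computation. The following minimal Python sketch is our illustration, not part of the cited work; the record keys and values are hypothetical. It scores data quality as the share of stored values that agree with a real-world reference:

    def data_quality(stored, real_world):
        """Agreement between the data view of an IS and the real world (Orr's view)."""
        keys = stored.keys() & real_world.keys()
        if not keys:
            return 0.0
        agreeing = sum(1 for k in keys if stored[k] == real_world[k])
        return agreeing / len(keys)

    # Hypothetical example: two customer addresses, one out of date.
    stored = {"cust-1": "Main St 5", "cust-2": "Oak Rd 12"}
    real = {"cust-1": "Main St 5", "cust-2": "Elm Ave 3"}  # cust-2 has moved
    print(data_quality(stored, real))  # 0.5 - no real system reaches 1.0 for long

The sketch also makes the change problem visible: the stored view stays static while the real-world reference moves, so the score decays unless the data are actively maintained.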
3 REUSABILITY IN CONCEPTUAL MODELLING

The database is the main building block of any modern information system. As an essential component, the database must be designed very carefully to contain all the information required by the user. To achieve greater efficiency, the database is designed in a progressive process decomposed into conceptual, logical and physical design. The aim of the conceptual design, which results in a conceptual model, is to describe the information content of the database in an abstract and general way, independent of a particular logical model or DBMS. The conceptual model is thus an essential, important component that is in principle used only once, i.e. within a single database design. This is at least very time consuming, since the design process for each database requires a lot of time. Countless conceptual models have been designed once and never used again, not even parts of them (Welzer, 1997). Why? Is each database really so unique that no common and hence reusable data components of conceptual models are to be found?

In software engineering, reusable components have been well known and practiced almost since the first programs were written. Reuse has been considered a way of overcoming the software crisis. It was never restricted only to programs (Krueger, 1992) and was also applied to the results of different phases of development as well as to human resources. In essence, reuse means to use previously acquired concepts and objects in a new situation. Reuse is such a basic activity that it is not even listed in most dictionaries, or its explanation is very simple (Merriam-Webster's Collegiate Dictionary: reuse means to use again, especially after reclaiming or reprocessing, or further repeated use). In the software environment, reuse is defined as a process of implementing or updating software systems using existing software assets (Reifer, 1997). Reuse can occur within a system, across similar systems or in widely different systems. Reusable software assets include more than just code. Requirements, design models, algorithms, tests, documents and many other products of the software process can be reused.

According to the above definitions, we will try to give a detailed explanation of the terms database reuse and reusable database components (Welzer, 1995): Database reuse is a process of using existing database conceptual models, or parts of them, rather than building them from scratch. Typically, reuse involves abstraction, selection, specialization and integration of reusable parts, although different techniques (when they are defined) may emphasize or de-emphasize some of them (see the sketch below). The primary motivation for reusing database components (conceptual models or parts of them) is to reduce the time and effort required to build a conceptual model, actually a database. Because the quality of software systems is enhanced by reusing quality software artifacts, which also reduces the time and effort required to maintain software, reusable database components can similarly influence, first of all, logical and physical database design and, not least, also database maintenance.

Further, considering the above definitions, we should also try to answer the following question: is software reuse easy and database reuse difficult? Actually, every reuse in the field of computer science is difficult, since useful abstractions for large, complex reusable software components or reusable database components will typically themselves be complex (Welzer, 1995).
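To picture the selection and specialization steps named in the definition above, consider the following Python sketch. It is purely illustrative; the repository contents, the domain tags and the function names are our assumptions, not part of the cited definitions:

    # Hypothetical sketch of database reuse: select a stored conceptual
    # model by application domain, then specialize it for a new project.
    repository = [
        {"name": "retail-customer", "domain": "retail",
         "entities": ["Customer", "Order", "Product"]},
        {"name": "bank-client", "domain": "banking",
         "entities": ["Client", "Account", "Transaction"]},
    ]

    def select(repo, domain):
        """Selection: retrieve candidate models with a matching domain."""
        return [m for m in repo if m["domain"] == domain]

    def specialize(model, extra_entities):
        """Specialization: extend a reused model instead of starting from scratch."""
        return dict(model, entities=model["entities"] + list(extra_entities))

    candidates = select(repository, "retail")
    project_model = specialize(candidates[0], ["LoyaltyCard"])
    print(project_model["entities"])  # ['Customer', 'Order', 'Product', 'LoyaltyCard']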
Because of this complexity, an appropriate way of presenting reusable concepts, whether software artifacts or conceptual models or parts of them, must be found. Our suggestion is named MetaBase and presents a meta models repository.

3.1 MetaBase

Our approach to applying the concepts of reuse also to database design is based on an object-oriented meta data model. We decided on the object-oriented paradigm in order to take advantage of its concepts (reuse, inheritance, methods) in representing our meta data model. The MetaBase model (Figure 1) is introduced as a three-level model distinguishing an object level, an enterprise level and a function level.

The enterprise level is central in the MetaBase model. It contains the conceptual models and external models (submodels) that describe a particular enterprise. This central block of the MetaBase is topped with a function level into which business and functional domains are integrated. An application domain links the function level and the enterprise level. On the other hand, the enterprise level is also related to a subordinated object level across objects. The object level contains the representation of the semantic objects which constitute the conceptual and external models. The application domain is a very important part of our structure, because reuse of conceptual models (database components) is more promising if their application domains are the same or similar (Freeman, 1987): a domain model can serve as a terminological framework for discussing commonalities and differences between conceptual models of an enterprise within the application domain.

According to the MetaBase structure, the conceptual models or parts of them are saved in the repository and reused in database design. They should be reused when designing more or less similar databases. To enable this process, a successful search for the appropriate components as well as a modified database design process is defined (Welzer, 1997).

Figure 1. MetaBase concept (function level: business domain, functional domain; enterprise level: application domain, enterprise, external model, conceptual model; object level: object, method, property)
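The three levels and the linking role of the application domain can also be sketched in code. The following is a minimal Python illustration of the structure in Figure 1; the class and attribute names are our assumption, not the actual MetaBase implementation:

    # Object level: semantic objects with properties and methods.
    class SemanticObject:
        def __init__(self, name, properties=(), methods=()):
            self.name = name
            self.properties = list(properties)
            self.methods = list(methods)

    # Enterprise level: conceptual and external models (submodels)
    # describing a particular enterprise, built from semantic objects.
    class ConceptualModel:
        def __init__(self, enterprise, objects):
            self.enterprise = enterprise
            self.objects = objects        # relation down to the object level
            self.external_models = []     # external models (submodels)

    # Function level: the application domain links business and functional
    # domains to the conceptual models of the enterprise level.
    class ApplicationDomain:
        def __init__(self, name, business_domain, functional_domain):
            self.name = name
            self.business_domain = business_domain
            self.functional_domain = functional_domain
            self.models = []              # conceptual models grouped by domain

    customer = SemanticObject("Customer", properties=["name", "address"])
    model = ConceptualModel("RetailCo", [customer])
    domain = ApplicationDomain("retail sales", "commerce", "sales")
    domain.models.append(model)  # a reuse search would start from the domain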
4 QUALITY OF REUSABLE COMPONENTS

It is obvious already from the presentation of the MetaBase repository (meta data model) that the components saved in the repository are of at least basic quality (Welzer, 1998). The intention is to reach an "ideal" conceptual model. For this purpose, 15 characteristics should be fulfilled: relevance, obtainability, clarity of definition, comprehensiveness, essentialness, attribute granularity, domain precision, naturalness, occurrence identifiability, homogeneity, minimum redundancy, semantic consistency, structural consistency, robustness and flexibility (Redman, 1996); according to the user's needs, some characteristics of his or her own could be added. Almost all the characteristics are subjective, but of great importance for the conceptual data model and the process of modelling. Additionally, we should mention that the characteristics are not independent of each other. Some of them are even in direct conflict.

To confirm the before mentioned basic quality of the MetaBase components, the components are checked against the list:

Relevance - objects needed by the applications are included in the conceptual models. Relevance is therefore user-driven or application-related. This means that a new application may require objects which are similar to objects in an existing database, so the reusability of the MetaBase is additionally supported.

Clarity of definition - all the terms used in the conceptual model are clearly defined. Such definitions are needed by existing and potential users.

Comprehensiveness - each attribute needed is included. Ideally, a view should be broad enough to satisfy the needs of all applications, but nothing else.

Occurrence identifiability - identification of the individual objects is made easy.

Homogeneity, structural consistency - the object level enables the minimization of unnecessary attributes.

Minimum redundancy - only checked models are included.

Semantic consistency - models are clear and organized according to the application domains.

Robustness, flexibility - through reuse, both characteristics are fulfilled. They work together to increase the useful lifetime of components. Robustness refers to the ability of the view to accommodate changes in the enterprise without changing the basic structure of the component. Flexibility refers to the capacity to change components to accommodate new demands.
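This check can be pictured as a simple per-component checklist. The following Python sketch is our illustration only: in practice the marks come from reviewing each component, not from code, and the five characteristics not named in the check above are marked here as not (yet) fulfilled:

    # The 15 characteristics of an "ideal" conceptual model (Redman, 1996),
    # marked True where the basic quality of MetaBase components fulfils them.
    CHARACTERISTICS = {
        "relevance": True, "obtainability": False,
        "clarity of definition": True, "comprehensiveness": True,
        "essentialness": False, "attribute granularity": False,
        "domain precision": False, "naturalness": False,
        "occurrence identifiability": True, "homogeneity": True,
        "minimum redundancy": True, "semantic consistency": True,
        "structural consistency": True, "robustness": True,
        "flexibility": True,
    }

    fulfilled = [name for name, ok in CHARACTERISTICS.items() if ok]
    print(len(fulfilled), "of", len(CHARACTERISTICS))  # 10 of 15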
But for an additional increase of data quality, the information (data) should be managed too, just as products are managed. Total Data Quality Management (TDQM) is used for this purpose and results in high-quality information (data) products that satisfy internal and external customers (Turban, 1996). Namely, TQM is a focused management philosophy for providing the leadership, training and motivation to continuously improve an organization's management and product (information) oriented processes. Further, TQM is a seven-step process (Turban, 1996) which has some similarities to the MetaBase approach:

Step 1 - Establish the management and cultural environment. With the MetaBase repository the environment is established.

Step 2 - Define the mission for each component. The roles of the reusable MetaBase components are well known.

Step 6, Step 7 - Evaluate performance, review and repeat. The MetaBase ensures high-quality components for building new conceptual models.

An additional influence on the methodology comes from Deming's Fourteen Points for quality management, adapted for information (data) (Redman, 1996). Some of these points are already fulfilled by the MetaBase:

Point 1 - Recognize the importance of data and information to the enterprise. MetaBase components support this recognition and already enable some solutions.

Point 2 - Adopt the new philosophy. The enterprise can no longer live with currently accepted levels of information (data) quality. The MetaBase enables at least the basic quality of its components.

Point 6 - Institute job training. Through the existing models, individuals and the organization learn how to solve similar problems.

Point 9 - Break down barriers between organizations. The application, functional and business domains ensure a free flow of high-quality models across organizational boundaries.

Point 11 - Eliminate production quotas and management by objectives. Reusable models teach how to manage and improve the processes that create and use data and information.

Point 12 - Remove barriers standing between data workers and their right to pride in their work. The MetaBase motivates designers to quickly develop new data models of high quality.

Point 13 - Institute training on data and information, their roles in the enterprise and how they may be improved. Reusable models and submodels support this training.

Point 14 - Create a structure in top management that recognizes the importance of data and information and their relationship to the rest of the business. The business domain supports this recognition, and top management can always find support for understanding data in the existing models.
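Steps 6 and 7 in particular suggest a continuous cycle over the repository. As a purely illustrative Python sketch (the scores, the threshold and the function names are our assumptions, not part of TQM or the MetaBase), one evaluate-review-repeat pass might look like this:

    # Hypothetical evaluate-review-repeat pass (TQM steps 6 and 7):
    # components scoring below the basic quality level are flagged for review.
    components = {"retail-customer": 12, "bank-client": 9}  # fulfilled characteristics

    BASIC_QUALITY = 10  # e.g. the 10 of 15 characteristics discussed above

    def review_pass(components, threshold):
        """One evaluation step; repeated with every design cycle."""
        return [name for name, score in components.items() if score < threshold]

    for name in review_pass(components, BASIC_QUALITY):
        print("component", repr(name), "needs review before reuse")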
5 CONCLUSION

Conceptual models of different databases are neither right nor wrong; they are more or less useful. Similarly, reusable database components from the MetaBase repository are more or less useful: they present a starting point for conceptual database design supported by database reuse and, according to the MetaBase view, they are more or less reusable. Just as important as reusability, or maybe even more so, is the quality dimension of the reusable components. Of the 15 quality characteristics, the basic quality of the MetaBase components already fulfils 10. But the final goal on the way to information quality is not the 15 characteristics, but Deming's fourteen points focused on data and information (Redman, 1996); at present 7 points are already fulfilled. In further research on the meta data model repository, they should be considered in a way that assures better quality of conceptual models and reusable components. Great support on this way is also given by TQM and its 7 steps, of which 4 have a lot in common with the MetaBase research. Further research is planned on problems of TQM and the remaining Deming's points in connection with the meta data model repository MetaBase.

REFERENCES

Reifer, D. J. (1997). Practical Software Reuse. Toronto: John Wiley & Sons.
Ermer, S. D. (1991). An analytical analysis of Pre-Control. ASQC Quality Congress Transactions, Milwaukee, pp. 522-527.
Grant, E. L. and Leavenworth, R. S. (1996). Statistical Quality Control. New York: McGraw-Hill.
Ledolter, J. and Swersey, A. (1997). An Evaluation of Pre-Control. Journal of Quality Technology, vol. 29, no. 2, pp. 163-171.
Mackertich, A. N. (1990). Precontrol vs. control charting: a critical comparison. Quality Engineering, vol. 2, no. 3, pp. 253-260.
Masing, W. (1994). Handbuch Qualitätsmanagement. München, Wien: Carl Hanser Verlag.
Montgomery, D. C. (1996). Introduction to Statistical Quality Control. New York: John Wiley & Sons.
Pfeifer, T. (1996). Praxishandbuch Qualitätsmanagement. München, Wien: Carl Hanser Verlag.
Reinhart, G., Lindemann, U. and Heinzl, J. (1996). Qualitätsmanagement. Berlin, Heidelberg: Springer-Verlag.
Salvia, A. A. (1988). AOQ and AOQL for Stoplight Acceptance Sampling. Journal of Quality Technology, vol. 20, no. 3, pp. 157-161.
Shainin, D. and Shainin, P. (1989). Pre-Control versus x̄ & R charting: continuous or immediate quality improvement? Quality Engineering, vol. 1, no. 4, pp. 419-429.
Steiner, S. H. (1997). Pre-Control and some simple alternatives. Quality Engineering, vol. 10, no. 1, pp. 65-74.