The following is an excerpt from a draft chapter of a new enterprise architecture textbook currently under development, entitled Enterprise Architecture: Principles and Practice, by Brian Cameron and Sandeep Purao.

Databases in Organizations

For many organizations, transaction processing and warehousing are crucial to the survival of the business. Most information crucial to an organization's operations is stored in databases. Databases can store huge amounts of information that can be conveniently manipulated and searched. The operations of many organizations come to a halt when corporate databases go off-line, which makes database management and maintenance a vital function for most organizations.

Types of Databases

Analytic Databases

Analytic databases, also known as On Line Analytical Processing (OLAP) databases, are primarily historical archives used for analysis. The system should make such analyses easy to perform for all types of users. Security and performance are major considerations with analytic databases. Data in analytic databases is typically static and often read-only. Most analytic databases are multidimensional and allow users to view the data in several different ways and dimensions. Data warehouses are the most common form of analytic database.

Operational Databases

Operational databases, also known as On Line Transaction Processing (OLTP) databases, are used to manage dynamic data typically associated with the operation of a business. These databases typically allow users to add, change, or delete data. Operational databases are at the core of most organizations today, and they are responsible for the data and transactions that are processed daily. Operational databases feed the data warehouses, where the data can later be queried and analyzed.

Database Models

Hierarchical and Network Databases

Databases are often differentiated according to their function.
The structure of a database is also a differentiating feature. Data models typically describe the container for data and how the data is stored in and retrieved from the container. The network and hierarchical database models were the most prevalent prior to the widespread adoption of the relational database model in the 1980s.

Copyright 2010 Brian H. Cameron 1
With hierarchical databases, data is stored in tree-like structures made up of parent and child tables. The tables are formed top down, where each child table is dependent on a single parent (upper) table. The graphic above shows a hierarchical system, where the root table is a parent with two children and those children have two children each.

The network database model is similar to the hierarchical model but without a strictly top-down approach. Child tables are still dependent upon a parent table; however, a child table can have several parent tables. This forms a matrix design instead of a vertical layout. The network database model solves some of the problems inherent in the hierarchical model, such as data redundancy, by using sets to represent relationships rather than hierarchy.

An example of a network database model.
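
The difference between the two models can be illustrated with a minimal Python sketch. The class names (TreeRecord, NetworkRecord) and the sample records are hypothetical and chosen only for illustration; they do not correspond to any particular database product. In the hierarchical structure every record has exactly one parent, so there is a single path to it from the root. In the network structure a record can belong to sets owned by several parents, so shared data does not need to be duplicated.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare records by identity, not by following links
class TreeRecord:
    """Hierarchical model: every record has at most one parent."""
    name: str
    parent: Optional["TreeRecord"] = None
    children: List["TreeRecord"] = field(default_factory=list)

    def add_child(self, name: str) -> "TreeRecord":
        child = TreeRecord(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        """Names from the root down to this record (the only access path)."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]


@dataclass(eq=False)
class NetworkRecord:
    """Network model: a record may be a member of sets owned by several parents."""
    name: str
    owners: List["NetworkRecord"] = field(default_factory=list)
    members: List["NetworkRecord"] = field(default_factory=list)

    def own(self, member: "NetworkRecord") -> None:
        # Record the relationship in both directions.
        self.members.append(member)
        member.owners.append(self)


# Hierarchical: a root with two children, each with two children of its own.
root = TreeRecord("Company")
for dept_name in ("Sales", "Engineering"):
    dept = root.add_child(dept_name)
    dept.add_child(f"{dept_name} Team A")
    dept.add_child(f"{dept_name} Team B")

leaf = root.children[1].children[0]
print(leaf.path())

# Network: one "Part" record is shared by two owners instead of being duplicated.
supplier = NetworkRecord("Supplier")
warehouse = NetworkRecord("Warehouse")
part = NetworkRecord("Part")
supplier.own(part)
warehouse.own(part)
print([owner.name for owner in part.owners])
```

In the hierarchical case, representing a part that belongs to both a supplier and a warehouse would require storing it twice, once under each parent; the network structure avoids that duplication.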
Relational Databases

The relational database model was developed by Dr. E. F. Codd at IBM in the late 1960s in response to problems associated with the existing database models. The concept of a relation (or table) in which all data is stored is the heart of the relational model. A table consists of horizontal rows called tuples (or records) and vertical columns called attributes (or fields). Relational databases are currently the most commonly used type of database.

The relational model connects tables through relationships expressed by keys (primary and foreign). The primary key is a unique identifier for a tuple, made up of one or more attributes. A foreign key is an attribute (or set of attributes) in one table that refers to the primary key of another table. Essentially, the foreign key is used to reduce data redundancy. A good relational database has almost zero redundancy, and all tables are connected by relationships to other tables. The figure above shows all the tables connected through a series of relationships to other tables by primary and foreign keys.

Object Oriented Databases

Object Oriented (OO) databases store information in the form of objects. Essentially, OO databases combine the capabilities of an object-oriented programming language with those of a database management system (DBMS). Object-oriented database management systems (ODBMS) are no longer thought of as replacing relational database management systems (RDBMS), but as complementing them. ODBMS make it easier to integrate with object-oriented programming languages such as Java and the .NET languages.
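
The primary and foreign key relationships described for the relational model can be sketched with Python's built-in sqlite3 module. The table and column names (customer, purchase, customer_id) and the sample rows are assumptions made for this illustration only. The sketch shows a customer's name being stored once and referenced by key from each purchase, and the database rejecting a purchase that refers to a customer that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,   -- primary key: unique per tuple
    name        TEXT NOT NULL
);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
                REFERENCES customer(customer_id),  -- foreign key
    item        TEXT NOT NULL
);
""")

# The customer's name is stored once; purchases refer to it by key.
conn.execute("INSERT INTO customer VALUES (1, 'Acme Corp')")
conn.execute("INSERT INTO purchase VALUES (10, 1, 'Widget')")
conn.execute("INSERT INTO purchase VALUES (11, 1, 'Gadget')")

# Joining on the key reconnects the two tables.
rows = conn.execute(
    """
    SELECT c.name, p.item
    FROM purchase AS p
    JOIN customer AS c ON c.customer_id = p.customer_id
    ORDER BY p.purchase_id
    """
).fetchall()
print(rows)

# A purchase that points at a nonexistent customer violates the foreign key.
rejected = False
try:
    conn.execute("INSERT INTO purchase VALUES (12, 99, 'Orphan')")
except sqlite3.IntegrityError:
    rejected = True
print("orphan purchase rejected:", rejected)
```

Because the customer name lives only in the customer table, renaming the customer is a single-row update rather than a change to every purchase record.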
XML Databases

XML databases allow data to be accessed, imported, and exported in XML format. There are two major types of XML databases:

XML-enabled databases map XML to traditional databases; they take XML input and produce XML output.
Native XML databases use XML for the internal model of the database, and XML documents are the unit of data storage.

Current Trends in Enterprise Database Applications

Data Warehouses & Knowledge Management

A data warehouse is a repository for stored information. Data mining is used to extract this information for analysis and reporting. One advantage of data warehousing is that complex queries can be run easily and efficiently on large sets of data.

Knowledge management (KM) is an emerging business practice for handling knowledge creation, codification, sharing, and innovation that often involves the use of a data warehouse. KM encompasses both the technical and organizational aspects of business. Key components of KM include:

Generating new knowledge assets
Accessing the value of knowledge assets
Making knowledge assets understandable and usable
Using knowledge assets for decision making
Facilitating the growth and transfer of knowledge assets throughout the organization

Master Data Management

Often, knowledge management, data mining, and other applications need a consolidated view of data in order to perform their functions. Master data management (MDM) takes data from across the enterprise that may reside in many systems and creates a single view of the business entities and data in the organization. Though the data needed for this process varies, examples common to most organizations include suppliers, products, employees, finances, and customers. Often MDM starts with understanding and handling customer data because of the potential impact on marketing and sales. Customer Data Integration (CDI) is a relatively new term for MDM applied to customer data.
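
The idea of building a single view from records held in several systems can be sketched in a few lines of Python. Everything in this example is an assumption made for illustration: the two source systems (a CRM and a billing system), the field names, the sample data, the choice of a normalized email address as the matching key, and the rule that the first source to supply a field wins. Real MDM products use far more elaborate matching and survivorship rules.

```python
# Hypothetical customer records from two separate systems.
crm_records = [
    {"email": "Ann.Lee@example.com", "name": "Ann Lee", "phone": "555-0100"},
    {"email": "bob@example.com", "name": "Bob Ray", "phone": None},
]
billing_records = [
    {"email": "ann.lee@example.com ", "name": "A. Lee", "address": "1 Main St"},
    {"email": "cy@example.com", "name": "Cy Dunn", "address": "9 Oak Ave"},
]


def match_key(record):
    """Normalize the email so the same customer matches across systems."""
    return record["email"].strip().lower()


def build_master(*sources):
    """Merge records into one master record per customer.

    Sources are processed in order; the first non-empty value seen
    for a field is kept (a simple, assumed survivorship rule).
    """
    master = {}
    for source in sources:
        for record in source:
            key = match_key(record)
            golden = master.setdefault(key, {"email": key})
            for field_name, value in record.items():
                if field_name != "email" and value is not None:
                    golden.setdefault(field_name, value)
    return master


master = build_master(crm_records, billing_records)
for key in sorted(master):
    print(master[key])
```

Here the two records for Ann Lee are recognized as the same customer and combined, so the master record carries the phone number known only to the CRM and the address known only to billing.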
Online Analytical Processing

Online analytical processing (OLAP) applications allow users to easily extract data from several points of view. For example, users can extract data for analysis showing all of a company's products sold in a particular area in a given time period. This information can then be compared with sales figures for another time period or another geographic area.

OLAP applications typically store data in a multidimensional database. A relational database is typically considered a two-dimensional database consisting of rows and columns. Each data attribute in a multidimensional database is considered a separate dimension (such as geographic area, time period, and products). OLAP applications display the desired intersection of these dimensions, such as all products sold in an area, during a particular time period, in a certain price range.

Integration Opportunities and Problems

Data Formats

Enterprise application integration (EAI) systems typically utilize a common data format to avoid having every adapter convert data to and from the data formats of every other application. EAI systems also provide data transformation capabilities that convert between common and application-specific data formats. Data transformation is typically performed in two steps. First, data in the application's format is converted to the common data format. Next, semantic transformations may be performed on the data. Examples include merging or splitting data objects from one application into the data objects needed by another application, or converting zip codes to city names.

Semantics is a term used to describe something that provides meaning. Information is data with meaning or context. Without context or meaning, data is just a collection of bytes. Data consistency is the resolution across systems of the semantics (or meaning) of the data. Data in one system are represented by formats and labels that are relevant to that particular system.
For this data to be used by other systems, some type of data correlation is typically required. Beginning with the metadata, a model of an organization's information can be constructed. This model may contain information on the rules and relationships that represent the data semantics as well as the interactions with other processes, systems, and data.

Metadata

Metadata is essentially data about data: information about documents, music files, photos, and other forms of data. It is used to find information faster. Metadata is a key factor in the future of the semantic web (often associated with Web 3.0). It enables data to be related and easily integrated. Metadata can be generated automatically; however, human intervention allows for more precision. Some characteristics of metadata include:
Integrated - Integrating all of the disparate data sources and deriving meaningful, relevant information is the greatest challenge in building data warehouses. The same challenges exist for metadata repositories. These repositories need to integrate different sources and types of metadata to produce accessible, relevant, and meaningful metadata.
Scalable - A metadata repository that is not built to expand substantially over time will soon become obsolete. The growth of decision support systems and the increased use of knowledge management systems are factors driving the current proliferation of metadata repositories.
Robust - Metadata repositories must be able to support the needs of both technical and business users, and they should meet the performance and functionality expectations of the organization.
Customizable - Metadata should be customizable into any type of relationship that the user wishes to create. This allows users to establish their own relationships in order to easily integrate different types of media.
Open - The metadata technology selected should not be tied to proprietary technology or standards. For example, the metadata technology architecture should allow an easy switch from one relational database to another.

Metadata and XML

XML performs an important role in the description and processing of metadata, and several XML-based standards have been developed that relate to metadata. The Normalized Metadata Format (NMF) describes XML Schema standards for representing metadata. The open standard is designed to provide a simple and flexible means for the definition and interchange of metadata using XML. NMF also provides an easy way to map to relational databases. The XML Metadata Interchange (XMI) specification is an Object Management Group (OMG) standard for exchanging metadata information using XML. Meta Object Facility (MOF) is an OMG standard for metadata management and distributed repositories.
There are also other emerging XML standards for the representation of metadata. One of the most recent of these standards is the Web Ontology Language (OWL). OWL was designed to provide a standard means for processing web content, rather than just displaying it. The standard was designed for computer-to-computer processing. OWL is built upon the Resource Description Framework (RDF), an XML-based framework used to describe resources on the web. RDF offers a syntax and data model so that independent parties can exchange and use resource descriptions.
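
To make the RDF data model more concrete, the following Python sketch reads a small RDF/XML description and lists the statements it contains as (subject, property, value) triples. The resource URL and the metadata values are invented for illustration; the two namespaces are the standard RDF syntax namespace and the Dublin Core element set. The sketch uses only the standard library's XML parser and handles just this simple shape of document; it is not a general RDF parser, and a real application would more likely use a dedicated RDF library.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A hypothetical description of one document, using Dublin Core properties.
rdf_xml = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/reports/q1.pdf">
    <dc:title>Quarterly Sales Report</dc:title>
    <dc:creator>Finance Department</dc:creator>
  </rdf:Description>
</rdf:RDF>
""".strip()


def extract_triples(document):
    """Return (subject, property URI, literal value) for each simple statement."""
    root = ET.fromstring(document)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree writes tags as "{namespace}local"; join them into a URI.
            namespace, local_name = prop.tag[1:].split("}")
            triples.append((subject, namespace + local_name, prop.text))
    return triples


triples = extract_triples(rdf_xml)
for triple in triples:
    print(triple)
```

Each triple is a self-contained statement about a resource identified by a URI, which is what allows independent systems to exchange and combine such descriptions.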