WHITE PAPER
Aperture VISTA and the CMDB: An Enterprise Best Practices Approach
Seth Rachlin and John Kneiling
TechPar Group
INTRODUCTION

The last few years have seen significant interest and effort in integrating the many tools large enterprises use to manage their IT infrastructure and applications. Central to this integration effort is the concept of a Configuration Management Database (CMDB), a single "source of record" that provides a logical model for and detailed information about all of the elements (the Configuration Items, or CIs) in the infrastructure. At its core the CMDB seeks to redress the information fragmentation characterized by a distributed environment where siloed tools, processes and functional areas manage enterprise assets such as networks, servers, applications, and data centers. It promises a more complete, holistic, and accurate information source and reference point to be leveraged by the entire IT organization as they plan, build and manage IT services.

Much of the momentum behind the CMDB is driven by the significant value it can deliver across these key functions. For the planner, it provides a more accurate and complete picture of the current environment as well as its associated costs, enabling more comprehensive "what if" analyses and scenario evaluation. For the builder of new services, it offers a more robust way of assessing the impact of the new service on the current environment from both a change management and capacity planning perspective. And for those who manage IT services, it provides an invaluable resource in diagnosing the root cause of incidents and problems and in speeding their resolution. Better information, as promised by the CMDB, is foundational to process improvement efforts in areas as seemingly diverse as Service Portfolio Management, Capacity Management, Change Management and Problem Management. And it is increasingly the place many enterprises are choosing to start as they move to tackle these key initiatives.

Despite its name, the CMDB - as it is typically implemented - is not a database in the traditional sense.
Best practices strongly advise against a classical "data warehousing" approach in which information is extracted from operational data sources and consolidated in a physical database for the purpose of aggregation, interrogation, and analysis. Rather they suggest what is called a "federated" approach where a central database holds a set of common reference information about a particular element or CI and links to other repositories where other domain-specific information may live. In this approach, the central database - which is typically built up from the foundation of an asset management system - might be the authoritative source for a subset of information regarding a server (Make, Model, Serial Number, Asset Name and Number, Business Unit, etc.). Information about the configuration of this server would reside in a server configuration management tool; information about services deployed on the server in an application mapping tool; information about its physical location and impact on the power grid in a data center management tool. These tools and the information they manage are connected through links - shared database keys and indexes -- such that a query or set of queries can span the multiple repositories to produce a result set or visualization that looks as if it came from a single data source. The CMDB is a logical construct that gets built, not a database you buy and then populate. Successful CMDB projects, in this context, are about progressively iterating toward greater levels of information integration, consistency, and coverage, not about installing and loading a particular package or system. Successful implementation of the federated CMDB will require the vendors of the applications and tools which comprise it to provide open, standards-based interfaces to support this type of query-on-demand capability. 
Customers should be wary of solutions which provide only limited or proprietary interfaces as they are unlikely to meet the demands of the project particularly as it progresses.
The schema below provides a logical data model view of such a federated CMDB. In this simplified view, four infrastructure tools (a Server Configuration Management System, an Asset Management System, a Data Center Management System, and an Application Mapping System) share four common attributes (Make, Model, Serial Number, and Host Name) that enable linkages between all the information relating to a specific CI. The Asset Management System serves as the core repository, maintaining a central CI ID as well as a CI Type that can enable rule-based queries to identify the location of relevant information: for a CI Type of Server, we would expect corresponding information in the Server Configuration Management System; for a CI Type of Router, in the Network Configuration Management System; and so forth.
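The rule-based lookup described above can be illustrated with a minimal sketch. It assumes each repository can be queried by a shared key (serial number here); the repository contents, the dictionary names, and the `federated_view` function are hypothetical stand-ins rather than the interface of any particular product.

```python
# Hypothetical in-memory stand-ins for the federated repositories,
# each keyed by serial number (one of the shared attributes).
ASSET_MGMT = {
    "SN-1001": {"ci_id": "CI-1", "ci_type": "Server", "make": "Acme",
                "model": "X200", "host_name": "app01"},
}
SERVER_CONFIG = {"SN-1001": {"os": "Linux", "cpu_count": 8}}
DATA_CENTER = {"SN-1001": {"rack": "R12", "power_watts": 450}}
NETWORK_CONFIG = {}

# Rule-based routing: which repositories hold detail for each CI Type.
REPOSITORIES_BY_TYPE = {
    "Server": {"server_config": SERVER_CONFIG, "data_center": DATA_CENTER},
    "Router": {"network_config": NETWORK_CONFIG, "data_center": DATA_CENTER},
}


def federated_view(serial_number):
    """Assemble one result set for a CI from the repositories its type maps to."""
    core = ASSET_MGMT[serial_number]          # core repository: CI ID and CI Type
    view = {"core": dict(core)}
    for name, repo in REPOSITORIES_BY_TYPE.get(core["ci_type"], {}).items():
        view[name] = repo.get(serial_number)  # None marks a gap to reconcile
    return view


view = federated_view("SN-1001")
```

The caller sees a single merged result, while each repository remains the owner of its own data; only the routing table and the shared key are held in common.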
Though federation may seem at first more complex than a single database, its advantages are significant. Federation allows the day-to-day management of information regarding the environment to remain within the tools most suited to the CIs in question, and the organization's investment in these best-of-breed tools is preserved. Federation also acknowledges that, given the rate of change in typical IT environments, the task of synchronizing all configuration information between a set of management tools and a central database would be inordinately costly and difficult to implement. CMDB information is only useful if it is accurate and consistent. The best way to ensure this accuracy is to store the most volatile information at its source.

KEY ASPECTS

The above should not be taken to mean that creating a CMDB from a set of disconnected information repositories is easy. Federated CMDBs by their very nature have to evolve as disparate and previously unconnected systems and tools are brought into the framework. That said, it is typically very dangerous to approach a CMDB project without an architecture that depicts, at least at a high level, what the end-state will look like. This architecture needs to take into account three key aspects:

1. Reconciliation: In all but green-field environments, configuration information, prior to the CMDB, has been managed by disparate and unconnected systems. It is almost a certainty that some information (a server's host name, for example) is maintained in multiple repositories and that in some instances this information is inconsistent across these repositories. Organizations looking to create the linkages that make the federated CMDB possible must reconcile information inconsistencies or lacunae between disparate systems such that they contain enough common information to make the links effective.
An asset management system may track a server based on its serial number while another system may track it based on host name, for example. This initial upfront effort - within the context of a large enterprise - may be significant and is an important part of any CMDB project.

2. Synchronization: Once information is reconciled, it needs to be kept consistent on an ongoing basis. Enterprises undertaking CMDB projects need to build synchronization processes to keep the information consistent across the relevant systems and tools that share it. If the server's host name changes, this change needs to be reflected across all connected repositories. This process can be challenging. It is likely that there are significant redundancies in the information collected and maintained by the various infrastructure management tools. When such redundancies occur, organizations need to identify and agree on an "authoritative" source for every CI attribute within the CMDB and ensure that all of their management tools rely on this source rather than a local and potentially inaccurate copy. This will likely involve changing the behavior of particular tools as well as the processes in which they are used: it is unlikely that the promise of a CMDB can be delivered without some level of process and information reengineering. It also places a burden on the tools themselves. Customers using toolsets that are incapable of being addressed through standard interfacing methods, that cannot establish links to federated data, and that cannot make updates to or be updated with core information will have difficulty with this critical aspect of their CMDB project.

3. Visualization: Reconciliation and Synchronization are critical to ensure the accuracy and completeness of the information within the CMDB. Successfully leveraging this information requires efforts in the area of visualization. CMDB information needs to be made available to existing infrastructure tools so that the dashboards, monitors, process managers and reporting engines access and rely upon CMDB data. The information brought together within the CMDB will also suggest new views and ways of navigating data that go beyond existing capabilities. Development efforts to integrate existing views and provide new ones need to be part of the CMDB effort. In approaching this set of tasks, it is important to take into account the multidimensional quality of CMDB data. The view of a server is radically different depending on whether the viewer is planning a server consolidation project, deploying a new application on it, or troubleshooting an outage. Multiple views expressed in multiple tools will be required for the CMDB to fulfill its potential. CMDB data - wherever possible - should be exposed in the tools people use to do their job and not in an ancillary reporting or visualization solution. Enterprises should be wary of "silver bullet" solutions that promise a visualization solution that is disconnected from operational systems.
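The reconciliation and synchronization aspects described above can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed implementation: the record layouts, the `AUTHORITATIVE_SOURCE` mapping, and both function names are hypothetical, and a real project would read from the tools' own interfaces.

```python
# Hypothetical extracts from two previously unconnected tools.
asset_records = [
    {"serial": "SN-1001", "host_name": "app01"},
    {"serial": "SN-1002", "host_name": "db01"},
    {"serial": "SN-1003", "host_name": "web01"},
]
config_records = [
    {"serial": "SN-1001", "host_name": "app01"},
    {"serial": "SN-1002", "host_name": "db-01"},   # inconsistent host name
    # SN-1003 is absent: a lacuna in the configuration tool
]

# Assumed agreement on which tool owns each shared attribute.
AUTHORITATIVE_SOURCE = {"serial": "asset", "host_name": "config"}


def reconcile(assets, configs, key="serial", attribute="host_name"):
    """Report CIs whose shared attribute is missing or differs between tools."""
    configs_by_key = {record[key]: record for record in configs}
    issues = []
    for asset in assets:
        other = configs_by_key.get(asset[key])
        if other is None:
            issues.append((asset[key], "missing in config tool"))
        elif asset[attribute] != other[attribute]:
            issues.append((asset[key], f"{attribute} mismatch"))
    return issues


def synchronize(asset, config):
    """Copy each shared attribute from its authoritative source to both records."""
    sources = {"asset": asset, "config": config}
    for attribute, owner in AUTHORITATIVE_SOURCE.items():
        value = sources[owner][attribute]
        asset[attribute] = value
        config[attribute] = value


issues = reconcile(asset_records, config_records)
synchronize(asset_records[1], config_records[1])
```

Reconciliation is the one-time report of gaps and mismatches; synchronization is the recurring step that pushes the authoritative value to every copy.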
As should be clear from the above discussion, a holistic data and systems architecture is a critical prerequisite for any successful CMDB project. The section that follows provides some thoughts as to the key "building blocks" of that architecture.

BUILDING BLOCKS

Domains

As in any data integration project, proper definition of scope is a critical first step to creating a successful CMDB. Most larger organizations will likely want to take a phased approach, with phases scoped to deliver value within acceptable budget and timing parameters. There are many acceptable ways to phase a CMDB project. Organizations can choose to focus on systems, assets and services that serve a particular business unit; that reside in a particular location; or that are part of production environments as distinct from development or test. Equally common, however, is the decision to phase CMDB development according to what are referred to as domains.

Like any database, the CMDB is based on a model of the real world, or that part of it that is interesting to the beholder. In the Configuration Management world, each CI belongs to a different area, such as Software or Device. Each of these areas, or domains, has responsibility for its data. Although the domains are autonomous, they share data by participating in a federated database.

There are a number of domains in a CMDB. In our first example we will define ten: Location, Device, Software, Application, Data Store, People, Document, Change, and Incident, plus the special Collection domain. These domains are federated building blocks that establish the scope of the CMDB. The CIs in these domains have a number of relationships that we can predict, and that are designed into the CMDB. For instance, Devices are in a Location. A special kind of domain, Collection, can be used to group together arbitrary CIs across domains for project management, security, or other purposes.
A Location CI is a physical location where People and Devices reside; it could also contain racks (which hold devices), or other CIs. The Location could be a building, the floor of a building, or even a room, and might be linked to Document CIs describing the location.

A Device CI is a piece of computer hardware, such as a PC, printer, router, switch, or server. The Device is assigned to a Location, and might reside in a rack. The Device itself could contain Applications, Software Products and Data Stores. The Device would be supported by or otherwise associated with People and Documents.

Although the words "software" and "application" are often used interchangeably, in the CMDB these two domains have distinct characteristics. A Software Product CI is a commercial product that doesn't directly perform a business function, or provide a business service, but does support an application. Common Software Products include operating systems, development tools, database software, middleware, and web application servers. These Software Products reside on Devices, and may use Data Stores. Like most CIs, they can also be associated with People and Documents.

An Application CI is a software component that supports a business process. Applications usually rely on Software Products and Data Stores, are associated with People and Documents, and reside on Devices.

A Data Store CI is an information repository or database. Data Stores are often confused with Software Products - for example, "Oracle" and the "Personnel" database. "Oracle" is a Software Product which manages the "Personnel" Data Store. Applications are supported by Software Products, which rely on a Data Store. A Data Store resides on one or more Devices, and usually supports a single Software Product. Data Stores are associated with People and Documents.

A People CI is a Person involved in an IT operation, or a Person who can read or write to the CMDB. Types of people might include "Contractor" or "Administrator".

A Document is a file containing information relevant to another CI, such as a user guide for an Application or a build document for a Device. Documents can be linked to any number of CIs, forming an information repository (but this is not a Data Store).

A Change is a modification of the configuration, and is associated with one or more CIs. Changes can be linked to other Changes, to change approvers, support groups, or other People, and to Incidents.

An Incident records a fault within a CI, and could be linked to all CIs affected by the Incident. A single Incident or a set of Incidents could be flagged as a problem, and can be assigned to support groups. An Incident often results in a new Change.

Sometimes it helps to group arbitrary CIs together in a Collection. For example, we might need to associate all of the Locations, Devices, Software Products, and People in a project. Collections can also be used to define support groups, used by Changes and Incidents. Security administrators can use Collections to restrict access to specific CIs. Servers can be grouped together in load balancing clusters, virtual LANs, or for any other purpose that benefits configuration management.
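The idea that relationships between domains are predictable and designed into the CMDB can be sketched as a small rule table. This is only an illustration: the relationship names (`located_in`, `resides_on`, `relies_on`), the `CI` and `Collection` classes, and the `relate` function are hypothetical, and the rules cover just a few of the associations described above.

```python
from dataclasses import dataclass, field

# A subset of the designed-in relationships, as
# (source domain, relationship, target domain). Names are illustrative.
ALLOWED = {
    ("Device", "located_in", "Location"),
    ("Software Product", "resides_on", "Device"),
    ("Application", "resides_on", "Device"),
    ("Application", "relies_on", "Software Product"),
    ("Application", "relies_on", "Data Store"),
    ("Data Store", "resides_on", "Device"),
}


@dataclass(frozen=True)
class CI:
    ci_id: str
    domain: str
    name: str


@dataclass
class Collection:
    """Groups arbitrary CIs across domains, e.g. for a project or support group."""
    name: str
    members: set = field(default_factory=set)


def relate(source, relationship, target):
    """Return a relationship triple, rejecting links the model does not allow."""
    if (source.domain, relationship, target.domain) not in ALLOWED:
        raise ValueError(
            f"{source.domain} cannot be '{relationship}' {target.domain}")
    return (source.ci_id, relationship, target.ci_id)


server = CI("d1", "Device", "ServerX")
room = CI("l1", "Location", "Room 4")
link = relate(server, "located_in", room)

project = Collection("Project Alpha")
project.members.update({server, room})   # a Collection may mix domains freely
```

Rejecting undefined relationships at write time keeps the model predictable, while a Collection deliberately imposes no domain rules on its members.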
PROVIDERS AND CONSUMERS

The process of creating a CMDB requires the integration of disparate systems which have typically evolved independently to meet the needs of particular groups of users working in traditionally distinct domains. Informal processes of information integration - typically conference calls, e-mail, and Project Management Offices - need to be replaced by formal and defined interactions among information systems. A critical step in this process is to document the desired flow of information between systems in the form of context diagrams that reflect both the "as is" and "to be" states of the tools and technologies that together manage the critical configuration information required to run the infrastructure.

A context diagram is a "data flow diagram showing data flows between a generalized application within the domain and the other entities and abstractions with which it communicates." (www.sei.cmu.edu/domainengineering/context_diag.html) If the CMDB is viewed as the generalized application, then the context diagram that is produced documents the flows of information to and from the CMDB to other systems and applications in the environment. Every tool and repository used to manage the infrastructure can, in this way, be viewed as either an information producer or an information consumer with respect to particular domains or subdomains of configuration information.

The diagram below presents a prototypical context diagram adapted from an actual CMDB deployment. Note how the domains in this CMDB vary slightly from those in our first example to accommodate the data center's unique set of requirements:
Establishing an appropriate context diagram is critical to the success of a CMDB project because it provides an authoritative reference to the systems that actually provide and consume data. The users of the CMDB read and write data using applications called Providers and Consumers.

An Information Provider creates CIs in the CMDB. This application could be a configuration modeling tool that allows a user to create a visual picture of the data center itself - either from scratch, or while modifying the configuration. Information might also be provided by change management applications, or even programs running from within the Configuration Items themselves. An application that scans the network might find a Device and as a result create a CI in the CMDB.

Information Consumers view and update CIs in the CMDB. Asset management software, for example, views and modifies CI instances. A server (which itself is recorded in the CMDB) can report a change in status directly to a Device or Software Product table.

Sometimes changes propagate from one system to another until they reach the CMDB. An electrical contractor (Person) assigned to maintaining a Data Center (Location) might report a change in telephone number to management, which results in an update to the enterprise's directory. A program associated with People can be designed to read the directory, detect changes, and propagate them to the relevant People table.

BASE AND EXTENDED ATTRIBUTES

In a federated database, such as the CMDB, individual pieces of data may be viewed by many, but each is owned by a different part of the infrastructure. For instance, data about Devices and People might be controlled by different software packages, each using its own distinct and autonomous database. Sometimes it is important to place some of the data in a common space to allow all Providers and Consumers to access it easily.
The attributes describing general information about the function of an Application might be made available to all in a centralized database. These are called Base data, or Base Configuration Attributes, because they supply basic information about a CI. One drawback to centralizing this federated data is that the centralized data is just a copy of the data owned by the domain-specific provider. This redundancy introduces the risk that the centralized and provider data will differ as a result of an application error, software failure, recovery/restart problem, or other incident. To avoid these inconsistencies, the CMDB should be wrapped in synchronization processes that protect an asset from the moment it is first recorded. Because some errors are unavoidable, a reconciliation process can compare both copies and flag any differences. If they are inconsistent, it is important to determine which copy is authoritative, and therefore "correct." This is usually the domain-specific data.
Often, additional detailed or specialized data is stored only by its owner, and is not copied to the centralized database. Although an Application's general information might be made available as Base data, the specific configuration for a particular type of server, such as a load balancing algorithm, might be stored by a specialized application (perhaps within the load balancer's management software itself). This specialized data is referred to as Extended Attributes, and is managed solely by the provider of that domain's data. This does not mean that the data is unavailable outside the domain. In addition to Base data, the centralized part of the CMDB contains reference links to the Extended Attributes, and information on how to access them. This concept of reference links as it relates to Extended Attributes is fundamental to the concept of federation, as explored earlier.

Dividing CMDB data into these two categories provides a great deal of flexibility, and allows us to optimize the CMDB to our advantage. The Base data often forms the working set - the proverbial 20% that is used 80% of the time. Limiting the data held in the centralized part of the CMDB makes it highly efficient, highly available, and, because only the working set is accessed, highly responsive.

Extended Attributes have a number of development and operational benefits. If the centralized CMDB has already been designed, and a new piece of CI-related management software is introduced, it may introduce additional attributes. If these attributes remain Extended, there is no immediate impact on the CMDB. The options at this point would be to do nothing to the existing design, to add a link pointing to the new Extended Attribute, or to copy it to the centralized design as a Base Attribute. In fact, all three could be done in series if needed, to accommodate incremental change.
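The split between Base data and reference links to Extended Attributes can be sketched as follows. This is a minimal illustration under assumptions: the `ReferenceLink` and `CentralRecord` classes, the `RESOLVERS` registry, and the sample load balancer data are all hypothetical, and a real resolver would call the owning tool's interface rather than read a dictionary.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ReferenceLink:
    repository: str   # name of the tool that owns the Extended Attribute
    key: str          # identifier of the CI within that tool


@dataclass
class CentralRecord:
    ci_id: str
    base: dict                                  # Base data, copied centrally
    links: dict = field(default_factory=dict)   # attribute name -> ReferenceLink


# Hypothetical stand-in for the load balancer's own management software.
LOAD_BALANCER_TOOL = {"lb-7": {"algorithm": "round-robin"}}

# One resolver per owning repository; each fetches data on demand.
RESOLVERS = {
    "load_balancer_tool": lambda key: LOAD_BALANCER_TOOL[key],
}


def get_attribute(record, name):
    """Serve Base data locally; follow a reference link for Extended Attributes."""
    if name in record.base:
        return record.base[name]
    link = record.links[name]
    return RESOLVERS[link.repository](link.key)[name]


record = CentralRecord(
    ci_id="CI-9",
    base={"name": "lb-east", "function": "load balancer"},
    links={"algorithm": ReferenceLink("load_balancer_tool", "lb-7")},
)
```

Promoting an attribute from Extended to Base in this sketch means copying its value into `base`; callers of `get_attribute` are unaffected, which mirrors the incremental options described above.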
Extended Attributes often require specialized treatment, and benefit from being stored in specific DBMS products or structures. Transactional data can be stored in a database optimized for high volumes. Applications that only need Extended Attributes will not need to access the Base CMDB, which will improve its performance and will, with proper design, eliminate it as a bottleneck. The implementation of Base and Extended Attributes exploits the benefits of federated distributed database performance.

RELATIONSHIPS

One of the benefits of storing configuration data in a database is that we can create and discover relationships between configuration items. In fact, a great deal of the information in a CMDB is contained not in the CI tables themselves, but in the relationships between CIs. For instance, if we know that the CI AppA "depends on" ("depends on" is a relationship) CI ServerX, we also know that if ServerX fails, so will AppA. If ServerX "depends on" PowerStripY, the implications for AppA are clear unless AppA is "managed" by LoadBalancerZ and can "fall back" to ServerB, etc. These relationships can become more complex with multiple dependencies. Servers, for instance, often use two power strips, which in turn should be connected to two different breakers. But what if they use the same breaker? How can we discover this?

One of the most useful views of CMDB data we can get is that of dependencies, either Upstream (what a CI depends on) or Downstream (what depends on this CI). A similar view helps us determine the composition of a complex CI. A Rack may contain many CIs - servers, power strips, blades, etc. A Decomposition view tells us which CIs are contained within the Rack CI. Conversely, a Rollup can tell us which Rack holds a given CI, such as a Blade. These views help us plan for capacity, performance, and change management, and in general give us all of the power of data mining and analysis available to modern database systems and business intelligence tools.
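The Upstream and Downstream views, and the shared-breaker question raised above, can be sketched as a traversal over "depends on" relationships. The sample data, the CI names beyond AppA and ServerX, and the function names are assumptions made for illustration; the sketch also assumes every direct dependency of a server is a power strip.

```python
from collections import defaultdict

# Hypothetical "depends on" relationships: (dependent, dependency).
DEPENDS_ON = [
    ("AppA", "ServerX"),
    ("ServerX", "PowerStripY1"),
    ("ServerX", "PowerStripY2"),
    ("PowerStripY1", "Breaker1"),
    ("PowerStripY2", "Breaker1"),   # both strips share one breaker
]

upstream_of = defaultdict(set)      # CI -> what it depends on directly
downstream_of = defaultdict(set)    # CI -> what depends on it directly
for dependent, dependency in DEPENDS_ON:
    upstream_of[dependent].add(dependency)
    downstream_of[dependency].add(dependent)


def closure(start, edges):
    """All CIs reachable from start by following edges transitively."""
    seen, stack = set(), [start]
    while stack:
        for nxt in edges.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def upstream(ci):
    """Everything this CI depends on, directly or indirectly."""
    return closure(ci, upstream_of)


def downstream(ci):
    """Everything that depends on this CI, directly or indirectly."""
    return closure(ci, downstream_of)


def shared_breakers(server):
    """Breakers that feed more than one of the server's power strips."""
    counts = defaultdict(int)
    for strip in upstream_of.get(server, ()):
        for breaker in upstream_of.get(strip, ()):
            counts[breaker] += 1
    return {breaker for breaker, n in counts.items() if n > 1}
```

Here `downstream("Breaker1")` gives the impact set of a breaker failure, and a non-empty result from `shared_breakers("ServerX")` flags the hidden single point of failure. Decomposition and Rollup views follow the same pattern with a "contains" relationship in place of "depends on".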
This section, while laying out a framework for a federated CMDB architecture, also suggests some key strategies for properly phasing CMDB projects. Recognizing that building a CMDB is hard work and is best done over a series of progressive phases, enterprises should:

1. Prioritize domains, focusing on the most critical first and building outward so that CMDB coverage grows over time.

2. Limit the number of providers and consumers that are integrated in each successive phase. Projects will succeed when the number of integration points in a particular release is manageable and when new systems are included in a phased approach.
3. Limit the number of extended attributes included in the scope of the CMDB to those with cross-functional importance. If the particulars of a router configuration or a server load-balancing scheme are relevant only to the group administering it, they do not necessarily need to be made available through the CMDB.

APERTURE VISTA IN CMDB

Like other critical management tools, the Aperture VISTA data center management system plays a critical role in a properly crafted CMDB strategy. We can, based on the framework outlined above, articulate a set of principles that can serve as the foundation for any integration effort:

1. Aperture VISTA is the system of record for the Location CI domain as well as the principal information provider to other systems for this CI. The VISTA application and its underlying repository are uniquely suited to modeling, visualizing, and managing (and propagating) changes within this domain.

2. Aperture VISTA is the system of record as well as the principal information provider for extended attributes related to physical configuration (cards, slots, power supplies) as well as power and cooling requirements and consumption within the Device domain. The VISTA data model is optimized for the collection, management and visualization of these attributes, and VISTA is the principal tool used by those who make changes that impact their values.

3. Aperture VISTA is the system of record for managing the physical relationships between and among CIs within the Device and Location domains. This includes relationships within devices (Card X is in Slot Y within Server Z) as well as between devices as they are manifest in the context of physical connectivity (power, network). These relationships are key to being able to show how object relationships are manifested physically - to achieve powerful visualizations such as seeing all the locations of servers used by a business unit on a top-down floor plan.

4. Aperture VISTA is a consumer of extended attributes and relationships within the Software and Application domains. As the preferred tool of data center operations, it should be the mechanism by which its users get critical information about the Software and Application configurations of the devices they are working with.

Properly integrated within a CMDB, therefore, Aperture VISTA has the following three core properties:

1. It shares the common set of base attributes for all CIs related to Location, Device, and Change. These attributes are synchronized between the Data Stores that are constitutive of the CMDB, such that information is consistent and shareable between them. Changes to base attributes that occur outside of VISTA must be propagated to it. VISTA must publish changes to other systems when it changes base attributes.

2. Aperture VISTA supports the dynamic querying of the VISTA repository for CI extended attributes and relationships to be presented through other Operations tools and processes. As the system of record for the physical data center configuration, information managed through VISTA must be available to other systems that need it.

3. The VISTA application supports the dynamic querying and subsequent presentation of information from other repositories within the CMDB. VISTA can and should be the source of all relevant configuration information its core group of users need to do their jobs.
CONCLUSION

Developing and deploying a functional CMDB that delivers value across the IT organization is a significant effort which requires planning, consensus building, and architectural excellence. While there is some truth in the vendor claims that the CMDB is a new system, this paper has argued that the CMDB is better understood as the result of the integration of existing systems and should be approached as one would any such project. For this integration to be possible, vendors of the myriad tools and technologies used to manage an increasingly complex IT infrastructure should be encouraged to "open" their systems to such integration, at a minimum through web services APIs and ideally through shared standards such as DMTF's Common Information Model (CIM). Without such an approach, customers will be challenged to upgrade software releases of one tool or another without breaking the integrations on which their CMDB sits.

This white paper was sponsored by Aperture Technologies, Inc.
About Aperture

Aperture is the leading provider of enterprise software solutions that enable organizations worldwide to manage the physical infrastructure of their data centers. Aperture solutions automate and facilitate standardized best practice processes to manage the complexity and ever-changing conditions in today's enterprise data centers and to deliver world-class performance in IT operations. With Aperture, organizations worldwide have reduced operational risks, increased efficiency and generated actionable information to make better business decisions.

Aperture's flagship product, Aperture VISTA, is an enterprise software solution which reduces operational risk and improves efficiency through the visual management of the data center. Aperture VISTA delivers the key processes that enable organizations to take control of an increasingly complex physical environment including equipment, space, power, cooling, network and storage. The world's largest companies use Aperture VISTA to achieve substantial improvements in the quality, reliability and cost effectiveness of their infrastructure planning, design, provisioning, troubleshooting and analysis.

2008. All rights reserved. Aperture and Aperture VISTA are registered trademarks and the Aperture logo mark is a trademark of Aperture Technologies, Inc. All other trademarks are the property of their respective owners.