How To Clean Data In A Configuration Management Database




DATA QUALITY ISSUES IN THE CMDB e-book

Contents

Chapter 1: Overview
  Research Highlights: Five Data Problems in Every CMDB
  Why Clean Data Is Essential for your CMDB
  What the Numbers Tell Us About Data in the CMDB
Chapter 2: The 5 issues with data quality
  Problem #1: Inconsistent Data
  Problem #2: Duplicate or Conflicting Data
  Problem #3: Irrelevant Data
  Problem #4: Incomplete Data
  Problem #5: Outdated Data
Chapter 3: Summary
  Data-as-a-Service
  About BDNA
  About This Report

Chapter 1: Overview

Research Highlights: FIVE Data Problems in Every CMDB

A Configuration Management Database (CMDB) is only as good as the data inside it. Unfortunately, most CMDBs are filled with data that's outdated, inconsistent, or incomplete. How do we know? BDNA took a close look at the data sources and patterns among its extensive customer base and in the IT hardware and software market. This report highlights the findings.

What a careful look at the numbers tells us is this: data in most CMDBs is not clean, and without clean data you cannot get the results you want and need from your CMDB.

It's not your fault

The problem isn't the CMDB software or the processes you use to populate and manage the CMDB. Nor is it the fault of the vendors of the various assets. It's simply an unfortunate side effect of the complex, dynamic IT world that we live in today.

Why Clean Data Is Essential for your CMDB

A configuration management database (CMDB) supports critical roles in your organization, including decisions and initiatives such as:
- IT or Software Asset Management (ITAM/SAM)
- Service Desk in IT Service Management (ITSM)
- Licensing compliance, procurement, audits
- Enterprise architecture planning & governance

These functions all depend on the CMDB having clean data. Without clean data, asset managers won't accurately know who owns a particular piece of software, where it is deployed and which version is deployed. Similarly, without comprehensive data about a piece of hardware, support engineers might not be able to quickly troubleshoot issues on it.

The CMDB also supports a number of big IT decisions and projects, including:
- Data center consolidation/transformation
- Application rationalization/modernization
- Windows migration

Inconsistent, incomplete or out-of-date data in the CMDB will compromise these projects. In fact, according to Gartner, poor data quality is a primary reason for 40% of all business initiatives failing to achieve their targeted benefits.

Clean Data

What does clean data mean in the context of the CMDB? It means that the data is consistent, authoritative, complete, and up-to-date.
- Consistent: As a central repository for data from many sources, the CMDB must clean up any inconsistencies in how data is reported.
- Authoritative: All data should be de-duplicated and any conflicts resolved. The CMDB also needs change management to ensure that only authorized individuals can update it.
- Complete: The CMDB should store all relevant information for configuration items, including relationships.
- Up-to-date: The data in the CMDB must remain up-to-date and accurate even as the environment and external conditions change.

What the Numbers Tell Us About Data in the CMDB

Only a careful analysis of your own CMDB can tell you the exact state of your data. However, by looking at the data sources that feed the CMDB, we can draw some conclusions about the likelihood that the data is clean.

Figure 1: Data in the CMDB from multiple sources.

Bad news: the numbers aren't good. Several factors affect every CMDB, no matter how well designed and implemented:
- IT data sources: The data sources are IT systems that do not themselves have information about every aspect of the asset.
- Multiple sources: The typical IT organization collects asset data from at least three sources for the CMDB, and often many more.
- Lack of standardization: These different data sources have many different ways of referring to the same software or hardware.
- Vendor-induced complexity: Vendors themselves are constantly adding to the complexity by acquiring companies, updating versions and renaming products.

Taken together, these factors contribute to four essential problems with CMDB data:
- Inconsistent data
- Duplicate or conflicting data
- Incomplete data
- Out-of-date data

Chapter 2: The 5 issues with data quality

Problem #1: Inconsistent Data

We've already seen that data in a CMDB comes from multiple sources, including discovery tools, procurement and provisioning systems, etc. Here's the problem: each of these tools represents vendors and products in different ways. HP Universal Discovery or DDMI might represent the same product differently than Microsoft SCCM does. The lack of standardization makes data consistency a real problem.

Examples of inconsistent data

Example 1: Vendors

Major vendors may go by many names in the CMDB. For example, you might find HP listed as a vendor in a variety of ways, including:

Company Hewlett Packard Hewlett-Packard Company Hewlett Packard Hewlett Packard Development Group, L.P. Hewlett-Packard Hewlett-Packard (Peregrine Systems) Hewlett-Packard Company Development Company, L.P. Hewlett-Packard, Co. HP Software LightScribe Motive Communications, Inc. Opsware Inc. Hewlett-Packard Development Company, L.P. Hewlett Packard Development Company, L.P. Hewlett Packard, Inc. Hewlett-Packard (Mercury Interactive) Hewlett-Packard Company Compaq HP http://www.lightscribe.com Mercury Interactive Opsware Palm

The average duplication ratio in vendor names due to inconsistent naming is 10:1.

Example 2: Product Names

Product naming can be even more complex, with variation in product names, versions, and terminology. Different discovery tools will report the same data in different ways. Consider Adobe Acrobat: in some form or another, it's on an awful lot of computers. Here is a sample of the many different ways that the various versions of Acrobat might appear:

"Adobe Acrobat 8 Standard English "Adobe Acrobat 6.0 Standard English "Adobe Acrobat 7.0 Standard English Acrobat Software Adobe Acrobat Acrobat Standard 6.0.1 R1 Acrobat Standard 8.1.2 (R1) Acrobat Standard 8.1.5 (R1) Acrobat X Pro Acrobat 05 Acrobat X Professional 10.0.0 Acrobat 06 Acrobat X Professional 10.0.0 Acrobat 4.0 Acrobat 5 Acrobat 6.0 Pro Acrobat 6.0 Standard Acrobat 6.0.2 Professional.app Acrobat 7 Pro Acrobat X Standard Acrobat.com.app Acrobat_com.app Acrobat7.0.7_01 AcrobatProfessional AcrobatProfessional [AIS] Acrobat 7 Standard AcrobatProfessional [AIS] 08.00.0000.0101 Acrobat 8 Pro AcrobatProfessional [AIS] 09.00.0000.0101 Acrobat 8 Professional AcrobatProfessional 08 Acrobat 8 Standard AcrobatProfessionalExtended [AIS] Acrobat 9 Pro AcrobatProfessionalExtended [AIS] 09.00.0000.0101 Acrobat 9 Standard AcrobatProfessionalExtended 09 Acrobat Professional 11.0.02 (R1) Acrobat Professional 6.0.1 (R1) Acrobat Professional 6.0.1 R1 Acrobat Professional 8.1.1 (R1) Acrobat Professional 8.1.5 (R1) Acrobat Professional 9.0 R1 AcrobatReader_705 AcrobatReader_708

The average duplication ratio in software product names due to inconsistent naming is 20:1.
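To make the naming problem concrete, here is a minimal Python sketch of how name variants like those above could be collapsed onto one canonical vendor. It is an illustration only, not BDNA's method: the alias table is a tiny hand-written stand-in for a real reference catalog, and the names `VENDOR_ALIASES`, `_key` and `reconcile_vendor` are hypothetical.

```python
import re
from typing import Optional

# Hypothetical, hand-written stand-in for a reference catalog.
# The aliases are taken from the vendor examples in this report.
VENDOR_ALIASES = {
    "Hewlett-Packard": [
        "HP",
        "Hewlett Packard",
        "Hewlett-Packard Company",
        "Hewlett-Packard, Co.",
        "Hewlett Packard, Inc.",
        "Compaq",
        "Opsware Inc.",
        "Mercury Interactive",
    ],
}


def _key(name: str) -> str:
    """Lowercase and drop punctuation/whitespace so trivial variants compare equal."""
    return re.sub(r"[^a-z0-9]+", "", name.lower())


# Reverse index: normalized alias -> canonical vendor name.
ALIAS_INDEX = {
    _key(alias): canonical
    for canonical, aliases in VENDOR_ALIASES.items()
    for alias in aliases + [canonical]
}


def reconcile_vendor(raw: str) -> Optional[str]:
    """Return the canonical vendor for a raw name, or None if it is not in the catalog."""
    return ALIAS_INDEX.get(_key(raw))


print(reconcile_vendor("hewlett packard"))       # Hewlett-Packard
print(reconcile_vendor("Hewlett-Packard, Co."))  # Hewlett-Packard
print(reconcile_vendor("Unknown Corp"))          # None
```

Unknown names return `None` rather than a guess, so they can be queued for review instead of silently creating yet another variant.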

Solving the data consistency problem

You don't want to try to solve this problem yourself: it's too big and varied. Most of the data originates from outside your business, and you shouldn't spend your internal resources on external data quality problems. Instead, connect your CMDB with a reference catalog that maps CIs from various sources to consistent terminology. This process is called identity reconciliation. It creates a common language of IT and ensures consistency. All of your various Acrobats will be represented consistently in the CMDB, so you can do meaningful analysis.

Problem #2: Duplicate or Conflicting Data

Because CMDBs combine data from multiple different sources, they often contain duplicate data and conflicting data. If the CMDB is to be the authoritative system of record supporting IT decisions and processes, it needs to resolve those conflicts and filter out duplicates. The problem can be significant: research shows that 40 percent of data collected across different sources is duplicate.

Figure 2: 40% of data from multiple sources is duplicate.

Examples of duplicate or conflicting data

Different inventory and asset management tools may report conflicting or duplicate data, for many reasons. Procurement systems may report bundles of licensing, while discovery systems report the various components of the bundles. For example, a procurement system might report that a system has Microsoft Office, while a discovery tool might report the individual software components, including MS Word, Excel, PowerPoint and Outlook. And the same software may appear in different systems, so the CMDB must be able to determine whether multiple listings reflect multiple instances or inconsistencies in how data is reported (as above). When the problem scales to millions of lines of data, finding and fixing the duplicates and conflicts can be a huge task.
Solving the duplication and conflict problem

The first step in solving this problem is making the data consistent, using a common language of IT (see Problem #1). Once you have done this, you can start finding and filtering out duplicate data, and noting the conflicts when they occur. This process is called data normalization.

One important aspect of data normalization is authority reconciliation, i.e. being able to define the hierarchy of authority among various sources. This will help you identify problems with conflicting data and choose authoritative sources.

In most large organizations, the scale of the problem is too large for manual processes. You will need automated processes for filtering and de-duplicating data.
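As a rough illustration of de-duplication with authority reconciliation, the Python sketch below groups records that describe the same product on the same host, keeps the record from the most authoritative source, and reports version conflicts. The source ranking, the field names (`host`, `product`, `version`, `source`) and the function name are assumptions made for the example; it also assumes product names have already been made consistent.

```python
from collections import defaultdict

# Hypothetical hierarchy of authority: lower number = more authoritative.
SOURCE_PRIORITY = {"procurement": 0, "sccm": 1, "discovery": 2}


def deduplicate(records):
    """Collapse records per (host, product) and report version conflicts.

    Returns (merged, conflicts): one winning record per group, plus a list
    of (group key, sorted versions) for groups whose sources disagree.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["host"], rec["product"])].append(rec)

    merged, conflicts = [], []
    for key, group in groups.items():
        # Unknown sources rank below every known source.
        group.sort(key=lambda r: SOURCE_PRIORITY.get(r["source"], len(SOURCE_PRIORITY)))
        merged.append(group[0])
        versions = {r["version"] for r in group}
        if len(versions) > 1:
            conflicts.append((key, sorted(versions)))
    return merged, conflicts


records = [
    {"host": "pc1", "product": "Adobe Acrobat", "version": "9.0", "source": "discovery"},
    {"host": "pc1", "product": "Adobe Acrobat", "version": "9.1", "source": "sccm"},
    {"host": "pc2", "product": "Adobe Acrobat", "version": "8.0", "source": "discovery"},
]
merged, conflicts = deduplicate(records)
print(len(merged))   # 2
print(conflicts)     # [(('pc1', 'Adobe Acrobat'), ['9.0', '9.1'])]
```

Keeping the conflicts as a separate output, rather than silently discarding the losing values, matches the advice above to note conflicts when they occur.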

Problem #3: Irrelevant Data

Another issue is data relevance. Research shows that 95% of data gathered from various discovery sources is irrelevant. Removing the irrelevant data can reduce the data footprint significantly.

Figure 3: 95% of data from discovery sources is irrelevant.

For example, when looking at data for an audit, you do not need to look at files such as .dlls, knowledge base files, patches, and .exe files that are individual components of a larger bundle. Ignoring those files helps to eliminate the noise, so that you can focus on the remaining 5% of data that is relevant.

Solving the irrelevancy problem

Many discovery tools will collect only data that they deem relevant. This can help reduce the amount of data somewhat. But relying solely on the discovery tools to determine relevance will still result in inaccuracies and noise in the data. Instead, organizations should look for a catalog that is updated regularly with both the relevant and the irrelevant pieces of information, so that the irrelevant pieces can be filtered out easily.

Problem #4: Incomplete Data

Let's say you've addressed the first two problems: your data is consistent and authoritative. The data in the CMDB still has a serious limitation: it's missing data that you need to make decisions or for your ITIL processes to be effective.

Data is only as good as its source, and for the CMDB, the source is your IT systems. A big part of what you really need to know comes from outside IT. IT systems lack critical information you need to know about all of your assets, such as end-of-life or end-of-support date, licensing/packaging options, current version, etc. These external data points (market data) are essential for many day-to-day processes. For example:
- Asset managers can reduce the cost of procuring new software if they know the upgrade status of the software.
- Service desk staff can troubleshoot problems faster if they know the temperature ratings for a piece of equipment that is overheating.
- Data center consolidation projects need information about the dimensions of various servers.
- Similarly, for transformation projects, you need to know what software can be virtualized, what is compliant with the latest version of Windows, etc.
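The two ideas in this part of the report, filtering out irrelevant discovery records and spotting CIs that lack needed attributes, can be sketched in a few lines of Python. The rules below are deliberately crude and hypothetical (a real catalog would mark irrelevant entries explicitly and would not treat every .exe as noise), and the field names `name`, `kind`, `end_of_support` and `current_version` are assumptions for the example.

```python
# Hypothetical relevance rules and required attributes, for illustration only.
IRRELEVANT_SUFFIXES = (".dll", ".exe")
IRRELEVANT_KINDS = {"patch", "knowledge_base"}
REQUIRED_FIELDS = ("end_of_support", "current_version")


def is_relevant(rec):
    """Treat component files, patches and knowledge base files as noise."""
    name = rec["name"].lower()
    return not name.endswith(IRRELEVANT_SUFFIXES) and rec.get("kind") not in IRRELEVANT_KINDS


def missing_fields(rec):
    """List the required attributes that are absent or empty on a record."""
    return [field for field in REQUIRED_FIELDS if not rec.get(field)]


discovered = [
    {"name": "Adobe Acrobat 9 Pro", "kind": "application", "current_version": "9.0"},
    {"name": "acrord32.dll", "kind": "file"},
    {"name": "KB958644", "kind": "patch"},
]
relevant = [rec for rec in discovered if is_relevant(rec)]
print([rec["name"] for rec in relevant])   # ['Adobe Acrobat 9 Pro']
print(missing_fields(relevant[0]))         # ['end_of_support']
```

The second result shows the gap described under Problem #4: the record survives the relevance filter, yet it still lacks the end-of-support date that only market data can supply.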

Example: the data center consolidation scenario

Let's say you've got a data center consolidation project. These are the standard CIs (Configuration Items) your CMDB probably has today, listed in the original table under four headings (Taxonomy, Manufacturers, Software Products, Hardware Products):

Category 1 Entity Unique Product Identifier Product Name Category 2 Tier Product Name Model Name Taxonomy Description Contact Information Brand/Family Source Name Revenue Version Source Category Stock Symbol Edition Source Website Owner Component Software or Hardware Date Acquired Platform Suite Suite Components Licensable Vendor Category

But to make decisions, you'll need more information, including:

CPUs
- Manufacturer
- Model
- Number of Cores
- Flag to indicate if CPU supports multi-threading
- Maximum CPU slots in the machine
- ISA (Bit Mode)
- Clock Rate

SW Published Support Dates
- Support Policy
- Support Levels
- Support End Dates

SW Standardized Support Lifecycle Dates
- General Availability Date
- End of Life
- Obsolete
- De-Supported Flag
- Discontinued Flag

HW Physical Dimensions
- Profile (Tower, Desktop, Rack, Floor etc.)
- Weight (Minimum/Maximum)
- Height (Minimum/Maximum)
- Weight (Minimum/Maximum)
- Width (Minimum/Maximum)

HW Lifecycle Dates
- Date Introduced
- Date Available
- End of Life
- End of Sales
- Last Support Date
- De-Supported Flag

HW Power Consumption
- AC Power (Unit, Maximum, Average/Typical)
- DC Power (Unit, Maximum, Average/Typical)
- AC Heat Dissipation (Unit, Maximum, Average/Typical)
- DC Heat Dissipation (Unit, Maximum, Average/Typical)

In addition, you might need the following:

CPUs
- IBM PVU Core

HW Power Consumption
- Energy Star Compatibility
- EPEAT
- Operating Temperature (Minimum, Maximum)
- Non-Operating Temperature (Minimum, Maximum)
- Operating Humidity (Minimum, Maximum)
- Non-Operating Humidity (Minimum, Maximum)
- AC Input Single/Low Setting Voltage (Minimum, Maximum)
- AC Input Single/Low Setting Current (Minimum, Maximum)
- AC Input High Setting Voltage (Minimum, Maximum)
- AC Input High Setting Current (Minimum, Maximum)
- DC Input Single/Low Setting Voltage
- DC Input Single/Low Setting Current
- DC Input High Setting Voltage
- DC Input High Setting Current
- AC Frequency (Minimum, Maximum)

Compatibility
- 32 Bit Windows 7 Application Compatibility
- 32 Bit Windows 7 Application Compatibility Upgrade Path
- 32 Bit Windows 7 Application Compatibility Date
- 64 Bit Windows 7 Application Compatibility
- 64 Bit Windows 7 Application Compatibility Upgrade Path
- 64 Bit Windows 7 Application Compatibility Date
- 32 Bit Windows 8 Application Compatibility
- 32 Bit Windows 8 Application Compatibility Upgrade Path
- 32 Bit Windows 8 Application Compatibility Date
- 64 Bit Windows 8 Application Compatibility
- 64 Bit Windows 8 Application Compatibility Upgrade Path
- 64 Bit Windows 8 Application Compatibility Date

This adds up to 30 pieces of market intelligence data required for a data center consolidation scenario, with an additional 28 market intelligence data points that you might need.

Data needed for other scenarios

Data center consolidation is just one big process for which you use your CMDB. Here is the number of items you might need for other IT projects and processes involving the CMDB:

Figure 4: Number of data pieces required for various IT projects.
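To show how market data of this kind feeds a consolidation decision, here is a small Python sketch that joins CMDB records to an external catalog and totals the maximum AC power for a set of servers. Everything in it is invented for illustration: the catalog contents, the vendor and model names, the figures, and the helper names `enrich` and `power_budget`.

```python
# Made-up catalog entries keyed by (vendor, model); the numbers are not real specifications.
MARKET_DATA = {
    ("ExampleCo", "Server X1"): {
        "height_mm": 87,
        "max_ac_power_w": 750,
        "end_of_life": "2015-06-30",
    },
}


def enrich(ci, catalog=MARKET_DATA):
    """Return a copy of the CI with any catalog attributes for its vendor/model added."""
    extra = catalog.get((ci["vendor"], ci["model"]), {})
    return {**ci, **extra}


def power_budget(cis, catalog=MARKET_DATA):
    """Sum maximum AC power across CIs; also return the CIs the catalog cannot cover."""
    total, unknown = 0, []
    for ci in cis:
        enriched = enrich(ci, catalog)
        if "max_ac_power_w" in enriched:
            total += enriched["max_ac_power_w"]
        else:
            unknown.append(ci["name"])
    return total, unknown


servers = [
    {"name": "srv01", "vendor": "ExampleCo", "model": "Server X1"},
    {"name": "srv02", "vendor": "ExampleCo", "model": "Server X1"},
    {"name": "srv03", "vendor": "OtherCo", "model": "Z9"},
]
print(power_budget(servers))   # (1500, ['srv03'])
```

Returning the uncovered CIs alongside the total makes the gap visible: a power budget that silently skips servers with no market data would understate what the target data center needs.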

Solving the incomplete data problem

Rather than chasing down external market intelligence for every new decision and initiative, you should integrate the relevant external data (extended specifications, support status, etc.) with the CMDB data. To do this most effectively, instead of trying to add more data in the CMDB, you'll need detailed, up-to-date data from all of the vendors appended to the items in your catalog. This involves getting the additional data from the vendors, from market research, etc., and is not an easy task to do manually. This information should ideally be provided by the catalog vendor.

Problem #5: Outdated Data

It's a virtual certainty that some of the data in your CMDB is out of date. 25 percent of the Fortune 500 use software that is past its support date. Networks are changing all the time, with new devices and software. And the vendors are constantly changing, with new software versions, patch updates, product names, mergers and acquisitions, and support changes.

For each asset in the IT environment, there are multiple data points. Some of these are static, e.g. the release date of a piece of software, or its country of origin. Others are dynamic, e.g. version, supported languages, etc., since these can change in the future. A large organization may have hundreds of CI updates each week.

Figure 5: The complexity of the IT Landscape.

Example of the outdated data problem

Let's say you've got a data center consolidation project coming up. You put a team to work updating the CMDB and adding the necessary market intelligence (see Problem #4). The process takes three months. By the time you're done, the target has already moved. The industry keeps changing, issuing new releases and sending new patches, while you're gathering the data.

Organizations often spend a lot of time and money on the initial setup of the CMDB. But then they find that the CIs change very frequently. Failure to maintain the CIs makes data in the CMDB outdated very quickly.

Solving the outdated data problem

To get the most from your CMDB investment, instead of trying to fix the problem in the CMDB, you need to update your catalog frequently, as often as once a week or more. Having an updated catalog that the CMDB data can sync with will resolve many issues.
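A minimal Python sketch of two checks implied here: flagging CIs whose software is past its support date, and detecting when the catalog has not been synced within a chosen window. The one-week window follows the suggestion above; the field name `end_of_support` and both function names are assumptions for the example.

```python
from datetime import date, timedelta


def past_support(cis, today):
    """Return the CIs whose end-of-support date is known and already in the past."""
    return [
        ci for ci in cis
        if ci.get("end_of_support") is not None and ci["end_of_support"] < today
    ]


def catalog_is_stale(last_sync, today, max_age=timedelta(days=7)):
    """True if the catalog was last synced more than max_age ago."""
    return today - last_sync > max_age


today = date(2013, 6, 1)
cis = [
    {"name": "app-a", "end_of_support": date(2012, 12, 31)},
    {"name": "app-b", "end_of_support": date(2014, 1, 1)},
    {"name": "app-c", "end_of_support": None},  # unknown: a completeness gap, not a support gap
]
print([ci["name"] for ci in past_support(cis, today)])   # ['app-a']
print(catalog_is_stale(date(2013, 5, 20), today))        # True
```

CIs with no known support date are deliberately left out of the first result; they point to incomplete data rather than outdated software and are better handled by enrichment.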

Chapter 3: Summary

Because many factors affect data quality, only a very small percentage of the overall IT data is really clean. When you analyze the data to make decisions, you'll need to discard much of the data. Then enrich the remaining clean data with market information to get to the data that really matters.

Figure 6: A very small subset of all data is CLEAN.

Data-as-a-Service

Managing the four key data problems in the CMDB is a task greater than any individual or even team of individuals can easily undertake. Happily, there are other ways. Rather than manually cleaning the CMDB data, you can keep it clean and updated with a data-as-a-service solution focused on IT data.

Data-as-a-service (or DaaS) does for data what SaaS does for software: it delivers it efficiently to the point of business need. In this case, that point is the CMDB. A DaaS offering delivers the vendor-specific data that originates outside the organization and integrates it with the discovered data in the CMDB. It provides clean data in the CMDB by using an external catalog to make data consistent, normalizing and de-duplicating the data and enriching it with market intelligence. And, because an external organization is responsible for maintaining the data, you get constant data updates without manual effort.

As the volume of assets under management continues to grow in IT organizations, maintaining clean data in the CMDB is nearly impossible using ad hoc or manual processes. DaaS is the path to maintaining value in the CMDB as the IT environment itself constantly changes.

About BDNA

BDNA Data-as-a-Service helps businesses solve the problems of inconsistent, duplicate, incomplete and outdated data by:
- Normalizing, reconciling, filtering and deduplicating data from multiple sources
- Enriching data with market intelligence
- Updating the market intelligence daily
- Curating the content for IT organizations, integrating it with the CMDB data to supply the clean data you need to make smart decisions

BDNA's Data as a Service for Clean Data is available on-premise as well as in the Cloud.

Figure 7: Clean Data with BDNA.

About This Report

Most of the data in this report comes from BDNA's Technopedia, the world's largest categorized repository of information on enterprise software and hardware. Technopedia:
- Categorizes and aligns over 35 million total data points about:
  - Over 260,000 software releases on 260,000+ platforms
  - Over 360,000 hardware products
  - Over 14,000 vendors
- Updates 2,000 data points on a daily basis

Some of the data also comes from the network of BDNA's customers, who represent many of the world's largest organizations.

BDNA Corporation
339 North Bernardo Avenue, Suite 206
Mountain View, CA 94043
Tel: (650) 625-9530
Fax: (650) 625-9533
www.bdna.com
info@bdna.com

Copyright 2013. All rights reserved. IT Genome, IT Genome Center, The IT Genome Company, BDNA Technopedia, BDNA Discover, BDNA Normalize, BDNA Enrich, and BDNA Publish are trademarks of BDNA Corporation. Other trademarks, registered trademarks or service marks are property of their respective owners.