IBM Software The fundamentals of data lifecycle management in the era of big data




How data lifecycle management complements a big data strategy

Contents

1. Introduction
2. Big data, big impact: Dealing with the data lifecycle
3. Best practices: Putting data lifecycle management into action
4. The power of enterprise-scale data lifecycle management
5. Enhance data warehouse agility with data lifecycle management
6. Why InfoSphere?

Introduction

Organizations are eager to harness the power of big data. But as new big data opportunities emerge, ensuring that information is trusted and protected becomes exponentially more difficult. If these challenges are not addressed directly, end users may lose confidence in the insights generated from their data, which can leave them unable to act on new opportunities or address threats. The tremendous volume, variety and velocity of big data means that the old manual methods of discovering, governing and correcting data are no longer feasible. Organizations need to automate information integration and governance from the start. By automating information integration and governance and employing it at the point of data creation and throughout its lifecycle, organizations can help protect information and improve the accuracy of big data insights.

Information integration and governance solutions must become a natural part of big data projects. They must support automated discovery and profiling, and they must facilitate an understanding of diverse data sets to provide the complete context required to make informed decisions. They must be agile enough to accommodate a wide variety of data and seamlessly integrate with diverse technologies, from data marts to Apache Hadoop systems. Plus, they must discover, protect and monitor sensitive information across its lifecycle as part of big data applications. Understanding the context of data and being able to extract the precise information necessary to meet a business objective is key to utilizing big data to the fullest. Managing the data lifecycle so that data is accurate, is appropriately used and is correctly stored to meet the required service levels and retention needs has wide-ranging benefits. These benefits include risk reduction, performance improvements and preventing an overload of useless information. This e-book explores the challenges of managing big data, best practices for enterprise-scale data lifecycle management and how IBM InfoSphere Optim solutions incorporate a comprehensive range of information integration and governance capabilities that enable companies to properly manage data over its lifetime.

Big data, big impact

Without effective data lifecycle management, the increasing volume, variety and velocity of big data can reduce performance, shrink margins and amplify risks.

Performance and time-to-market

As more users execute more queries on larger data volumes, slow response times and degraded application performance become major issues. If left unchecked, continued data growth will stretch resources beyond capacity and negatively impact response time for critical queries and reporting processes. These problems can affect production environments and hamper upgrades, migrations and disaster recovery efforts. Implementing intelligent data lifecycle management of historical, dormant data is essential for avoiding these potentially business-halting issues.

Rapid data growth also makes testing more difficult. As data warehouses and big data environments grow to petabytes or more, testing processes are taxed by having to cull data for their specific needs. The results include longer test cycles, slower time-to-market and fewer defects identified in advance of release. Speeding up testing workflows and delivery of data warehouses requires organizations to automate the creation of realistic, rightsized test data while keeping appropriate security measures in place.

Margins

Exponential data growth also can drive up infrastructure and operational costs, often consuming most of an organization's data warehousing or big data budget. Rising data volumes require more capacity, and organizations often must buy more hardware and spend more money to maintain, monitor and administer their expanding infrastructure. Large data warehouses and big data environments generally require bigger servers, appliances and testing environments, which can also increase software licensing costs for the database and database tooling, not to mention labor, power and legal costs.

Risks

Following the "let's keep it in case someone needs it later" mandate, many organizations already keep too much historical data. According to the CGOC 2012 Summit Survey, 69 percent of data has no value. Opening the doors to excessive storage and retention only exacerbates the situation. At the same time, organizations must ensure the privacy and security of the growing volumes of confidential information. Government and industry regulations from around the world, such as the Health Insurance Portability and Accountability Act (HIPAA), the Personal Information Protection and Electronic Documents Act (PIPEDA) and the Payment Card Industry Data Security Standard (PCI DSS), require organizations to protect personal information no matter where it lives, even in test and development environments.

[Figure: Data breaches and attacks risk negative consumer sentiment. 75% of IT risks impact customer satisfaction and brand reputation; 43% of organizations are increasing focus on reputational risk because of growth in emerging technologies such as social media.]

Maintaining compliance with data retention regulations, protecting privacy and archiving data are not just legal matters; they are essential for sustaining customer satisfaction and brand reputation. In recent IBM surveys, respondents indicate that data theft/cybercrime is the number-one threat to a company's reputation, a greater threat than system failures. Sixty-four percent of respondents say their company will be focusing more on managing and protecting their reputation than they did five years ago.1 (Source: Insights from the 2012 Global Reputational Risk and IT Study.)

The danger of treating a backup as an archive

Many organizations are confused about the difference between archiving and backing up data. Archiving preserves data, providing a long-term repository of information that can be used by litigation and audit teams. By contrast, backing up data involves copying production data and moving it to another environment to enable disaster recovery and the restoration of deleted files. Backups are often retained for a short time, until a fresh backup replaces the existing backup. Archiving complements backups by removing old, redundant and infrequently accessed data from a system and by reducing the size of databases and their backups. Approximately 75 percent of the data stored is typically inactive, rarely accessed by any user, process or application. An estimated 90 percent of all data access requests are serviced by new data, usually data that is less than a year old.2 With an effective archiving strategy, organizations can protect old data and comply with data retention rules while reducing costs and enhancing system performance.

In an attempt to meet archiving needs, some organizations simply back up data to a Hadoop environment. But this kind of backup will not ensure that data will be fully protected or remain query-able, the way a true archive would. With an effective data lifecycle solution, companies can create an archive that protects data, meets compliance standards, and supports queries and reporting. An emerging trend is for organizations to use Hadoop as a lower-cost storage alternative for archives.

Best practices: Putting data lifecycle management into action

The data lifecycle stretches through multiple phases as data is created, used, shared, updated, stored and eventually archived or defensively disposed. Data lifecycle management plays an especially key role in three of these phases of data's existence: archiving, test data management and data masking. The entire data lifecycle (shown as the grey circle) benefits from good governance, but capabilities that focus on the use, share and archive steps have wide-ranging benefits for cost reduction and efficiency gains.

[Figure: Where data lifecycle management tasks fall in the data lifecycle — create, use, share, update, store/retain, archive, dispose — with archiving, test data management and data masking highlighted.]

Archiving

Retention policies are designed to keep important data elements for reference and for future use while deleting data that is no longer necessary to support the legal needs of an organization. Effective data lifecycle management includes the intelligence not only to archive data in its full context, which may include information across dozens of databases, but also to archive it based on specific parameters or business rules, such as the age of the data. It can also help storage administrators develop a tiered and automated storage strategy to archive dormant data in a data warehouse, thereby improving overall warehouse performance.
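To make rule-based archiving concrete, here is a minimal, illustrative sketch (not InfoSphere Optim itself) that moves rows older than a retention threshold into a separate archive table. The database, table and column names are hypothetical, and a real deployment would also handle tiered storage, retention holds and restore paths.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical business rule: archive orders older than seven years.
RETENTION_DAYS = 7 * 365

conn = sqlite3.connect("warehouse.db")  # hypothetical production database
cur = conn.cursor()

# Create an archive table with the same columns as the production table,
# so archived rows keep their full context.
cur.execute("CREATE TABLE IF NOT EXISTS orders_archive AS SELECT * FROM orders WHERE 0")

cutoff = (datetime.now() - timedelta(days=RETENTION_DAYS)).strftime("%Y-%m-%d")

# Move dormant rows: copy them into the archive, then remove them from production.
cur.execute("INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?", (cutoff,))
cur.execute("DELETE FROM orders WHERE order_date < ?", (cutoff,))
conn.commit()
conn.close()
```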

[Figure: Breakdown of enterprise information — 69% "everything else" (no value); 31% has value: 25% has business utility, 5% is kept for regulatory record keeping and 1% is subject to legal hold.]

[Callout: Many organizations hope that big data will provide a large, centralized lake of data, but in many cases, it becomes a data swamp full of unreliable information.]

Many organizations envision big data as a large, pristine, centralized data lake. But a data lake can quickly turn into a data swamp when data is poorly managed and controlled. By setting up an intelligent data lifecycle strategy and archiving to inexpensive storage, you can avoid turning your big data environment into a dumping ground.

Test data management

In development, testers must automate the creation of realistic, rightsized data sources that mirror the behaviors of existing production databases. To ensure that queries can be run easily and accurately, they must create a subset of actual production data and reproduce actual conditions to help identify defects or problems as early as possible in the testing cycle. The tremendous size of big data systems creates challenges for testers. There is a greater need to speed delivery of big data applications, requiring organizations to create realistic, rightsized, masked test data for testing those applications for performance and functionality. Testers also need ways to generate test data sets that facilitate realistic functional and performance testing. Because production data contains information that may identify customers, organizations must mask that information in test environments to maintain compliance and privacy.
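As an illustration of the subsetting idea described above (a sketch, not the InfoSphere Optim implementation), the following pulls a small random sample of customers plus only the orders that belong to them, so the test copy stays referentially intact. The databases, tables and columns are hypothetical.

```python
import sqlite3

SAMPLE_SIZE = 100  # hypothetical: number of customers to carry into the test subset

src = sqlite3.connect("production.db")   # hypothetical source database
dst = sqlite3.connect("test_subset.db")  # rightsized test database

# Recreate the two related tables in the test database (hypothetical schema).
dst.execute("CREATE TABLE customers (cust_id TEXT PRIMARY KEY, name TEXT, street TEXT)")
dst.execute("CREATE TABLE orders (cust_id TEXT, item TEXT, order_date TEXT)")

# Sample parent rows first, then fetch only the child rows that reference them,
# preserving referential integrity in the subset.
customers = src.execute(
    "SELECT cust_id, name, street FROM customers ORDER BY RANDOM() LIMIT ?",
    (SAMPLE_SIZE,)).fetchall()
ids = [row[0] for row in customers]
placeholders = ",".join("?" for _ in ids)
orders = src.execute(
    f"SELECT cust_id, item, order_date FROM orders WHERE cust_id IN ({placeholders})",
    ids).fetchall()

dst.executemany("INSERT INTO customers VALUES (?, ?, ?)", customers)
dst.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
dst.commit()
```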

Applying data masking techniques to the test data means testers use realistic-looking but fictional data, so no actual sensitive data is revealed. Application developers can also use test data technologies to easily access and refresh test data, which speeds the testing and delivery of the new data source. Organizations also need ways to mask certain sensitive data, such as credit card and phone numbers. While testing their big data environments, they must mask sensitive data from unauthorized users, even though those users might be authorized to see the data in aggregate. For example, a pharmaceutical company that is testing its data warehouse environment might mask Social Security numbers and dates of birth but not patients' ages and other demographic information. Masking certain data this way satisfies corporate and industry regulations by removing identifiable information, while still maintaining business context and referential integrity for testing in nonproduction environments.

Original data

Customers table:
Cust ID | Name          | Street
08054   | Alice Bennett | 2 Park Blvd
19101   | Carl Davis    | 258 Main
27645   | Elliot Flynn  | 96 Avenue

Orders table:
Cust ID | Item #  | Order date
27645   | 80-2382 | 20 June 2004
27645   | 86-4538 | 10 October 2005

De-identified data

Customers table:
Cust ID | Name           | Street
10000   | Auguste Renoir | 23 Mars
10001   | Claude Monet   | 24 Venus
10002   | Pablo Picasso  | 25 Saturn

Orders table:
Cust ID | Item #  | Order date
10002   | 80-2382 | 20 June 2004
10002   | 86-4538 | 10 October 2005

Data masking techniques protect the confidentiality of private information.
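A minimal sketch of the consistent masking shown above (illustrative Python, not the InfoSphere Optim data masking engine): each original customer ID is replaced by the same surrogate everywhere it appears, so the link between the Customers and Orders tables survives masking while names and addresses become fictional. The replacement values are the ones from the example tables.

```python
from itertools import count

# Rows from the "Original data" tables above: (Cust ID, Name, Street) and
# (Cust ID, Item #, Order date).
customers = [("08054", "Alice Bennett", "2 Park Blvd"),
             ("19101", "Carl Davis", "258 Main"),
             ("27645", "Elliot Flynn", "96 Avenue")]
orders = [("27645", "80-2382", "20 June 2004"),
          ("27645", "86-4538", "10 October 2005")]

# Fictional replacements; a real masking tool would draw these from lookup
# tables or format-preserving generators.
fake_names = ["Auguste Renoir", "Claude Monet", "Pablo Picasso"]
fake_streets = ["23 Mars", "24 Venus", "25 Saturn"]

surrogates = {}        # original Cust ID -> masked Cust ID
next_id = count(10000)

def mask_id(cust_id):
    # Reuse the same surrogate for a repeated ID so parent and child rows stay linked.
    if cust_id not in surrogates:
        surrogates[cust_id] = str(next(next_id))
    return surrogates[cust_id]

masked_customers = [(mask_id(cid), fake_names[i], fake_streets[i])
                    for i, (cid, _name, _street) in enumerate(customers)]
masked_orders = [(mask_id(cid), item, date) for cid, item, date in orders]

print(masked_customers)  # 08054 -> 10000, 19101 -> 10001, 27645 -> 10002
print(masked_orders)     # both orders now reference 10002, matching the masked customer
```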

[Figure: Complex IT landscapes make setting up test labs extremely costly — a test environment may have to span private and public clouds, third-party and routing services, collaboration tools, web/Internet portals, EJB, content providers, messaging services, archives, business partners, shared services, data warehouses, directory and identity services, mainframes, an enterprise service bus, file systems and other heterogeneous environments.]

As volume, variety and velocity impact the complexity of data infrastructures, scaling test environments becomes a significant problem. It isn't unusual for Fortune 500 companies to spend up to USD30 million building a single test lab, and many of these organizations have dozens of labs. Add in rising wages, and testing costs begin to spiral out of control.

The power of enterprise-scale data lifecycle management

Effective data lifecycle management benefits both IT and business stakeholders.

Increasing margin: Lower infrastructure and capital costs, improved productivity and reduced application defects during the development lifecycle.

Reducing risks: Reduced application downtime, minimized service and performance disruptions, and adherence to data retention requirements.

Promoting business agility: Improved time-to-market, increased application performance and improved quality of applications through realistic test data.

With InfoSphere Optim, organizations gain a single solution that can scale to meet enterprise needs. Whether they implement InfoSphere Optim for a single application, data warehouse or big data environment, organizations can streamline data lifecycle management with a consistent strategy. The unique relationship engine in InfoSphere Optim provides a single point of control to guide data processing activities such as archiving, subsetting and retrieving data.

Enhance data warehouse agility with data lifecycle management

InfoSphere Optim solutions help organizations meet requirements for information integration and governance and address challenges exacerbated by the increasing volume, variety and velocity of data. By archiving old data from huge data warehouse environments, businesses can improve response times and reduce costs by reclaiming valuable storage capacity. By creating realistic, rightsized data sources for testing, they can enhance the accuracy of testing and identify problems early in the testing cycle. And by implementing data masking capabilities, they can protect sensitive data and help ensure compliance with privacy regulations. As a result, organizations gain more control of their IT budget while simultaneously helping their big data and data warehouse environments run more efficiently and reducing the risk of exposure of sensitive data.

InfoSphere Optim supports major big data and data warehouse environments, including IBM PureData for Analytics, IBM PureData for Transactions, BigInsights, Teradata, Oracle and popular Hadoop distributions. It also supports enterprise databases and operating systems, including IBM DB2, Oracle Database, Sybase, Microsoft SQL Server, IBM Informix, IBM IMS, IBM Virtual Storage Access Method (VSAM), Microsoft Windows, UNIX, Linux and IBM z/OS. In addition, InfoSphere Optim supports key enterprise resource planning (ERP) and customer relationship management (CRM) applications such as Oracle E-Business Suite, PeopleSoft Enterprise, JD Edwards EnterpriseOne, Siebel, Amdocs CRM and the SAP ERP and CRM applications, as well as many custom applications.

The value of test data management at a US insurance company

With 42 high-volume back-end systems needed to generate a full end-to-end system test, a US insurance company could not confidently launch new features. Testing in production was becoming the norm. In fact, claims could not be processed in certain states because of application defects that the teams skipped over during the testing process. IT was consuming an increasing number of resources, yet application quality was declining rapidly. After implementing a process to govern test data, the insurance company reduced the costs of testing by USD400,000 per year. Today, the company can easily refresh 42 test systems from across the organization in record time while finding defects in advance.

The business value from implementing test data management included:

44 percent fewer untested scenarios
Cost savings of approximately USD500,000 per year
41 percent less labor required over 12 months

Why InfoSphere?

As the foundation of the IBM big data platform, InfoSphere provides market-leading functionality across all the capabilities of information integration and governance. It is designed to handle the challenges of big data by providing optimal scale and performance for massive data volumes, agile and rightsized integration and governance for the increasing velocity of data, and support for a wide variety of data types and big data systems. InfoSphere helps make big data and analytics projects successful by delivering the confidence to act on insight.

InfoSphere capabilities include:

Metadata, business glossary and policy management: Define metadata, business terminology and governance policies with Business Information Exchange.
Data integration: Handle all integration requirements, including batch data transformation and movement (InfoSphere Information Server), real-time replication (InfoSphere Data Replication) and data federation (InfoSphere Federation Server).
Data quality: Parse, standardize, validate and match enterprise data with InfoSphere Information Server for Data Quality.
Master data management: Act on a trusted view of your customers, products, suppliers, locations and accounts with InfoSphere MDM.
Data lifecycle management: Manage data throughout its lifecycle, from requirements through retirement, with InfoSphere Optim test data automation and database archiving capabilities.
Data security and privacy: Continuously monitor data access and protect repositories from data breaches, and support compliance with IBM InfoSphere Guardium. Ensure sensitive data is masked and protected with InfoSphere Optim.

Additional resources

Ready to get started? Take a self-service InfoSphere Optim Business Value Assessment and show the ROI results to your big data project owner. To learn more about InfoSphere Optim, check out these resources:

Manage the Data Lifecycle of Big Data Environments
Optim solutions for data warehouses
Demo: Optim Data Growth Solution
Demo: Optim Test Data Management Solution

To learn more about the IBM approach to information integration and governance for big data, please contact your IBM representative or IBM Business Partner, or visit: ibm.com/software/data/information-integration-governance

© Copyright IBM Corporation 2013

IBM Corporation, Software Group, Route 100, Somers, NY 10589

Produced in the United States of America, August 2013

IBM, the IBM logo, ibm.com, BigInsights, DB2, Guardium, IMS, Informix, InfoSphere, Optim, PureData, and z/OS are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at "Copyright and trademark information" at ibm.com/legal/copytrade.shtml

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries.

This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates. THE INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS" WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.

1 Yuhanna, Noel. "Your Enterprise Data Archiving Strategy." Forrester, February 2011. ftp://ftp.boulder.ibm.com/software/data/sw-library/data-/optim/papers/your-enterprise-data-archiving-strategy.pdf

2 IBM 2012 Global Reputational Risk and IT Study. ibm.com/services/us/gbs/bus/html/risk_study-2012-infographic.html

The client is responsible for ensuring compliance with laws and regulations applicable to it. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the client is in compliance with any law or regulation.

Please Recycle

IMM14126-USEN-00