IBM Software Wrangling big data: Fundamentals of data lifecycle management



Similar documents
Business-driven governance: Managing policies for data retention

IBM Software Five steps to successful application consolidation and retirement

IBM InfoSphere Optim Test Data Management

IBM Software The fundamentals of data lifecycle management in the era of big data

IBM InfoSphere Optim Test Data Management Solution

IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems

IBM InfoSphere Optim Test Data Management solution for Oracle E-Business Suite

Real-time asset location visibility improves operational efficiencies

IBM Software Understanding big data so you can act with confidence

Embracing SaaS: A Blueprint for IT Success

Better planning and forecasting with IBM Predictive Analytics

IBM Unstructured Data Identification and Management

Optimizing government and insurance claims management with IBM Case Manager

For healthcare, change is in the air and in the cloud

IBM InfoSphere Optim Data Masking solution

Data virtualization: Delivering on-demand access to information throughout the enterprise

Easily deploy and move enterprise applications in the cloud

IBM SmartCloud Monitoring

Continuing the MDM journey

IBM System x reference architecture solutions for big data

IBM Analytics. Just the facts: Four critical concepts for planning the logical data warehouse

How To Use Social Media To Improve Your Business

Solve your toughest challenges with data mining

IBM Cognos Controller

IBM Analytics The fluid data layer: The future of data management

The case for cloud-based data backup

Simplify security management in the cloud

Bunzl Distribution. Solving problems for sales and purchasing teams by revealing new insights with analytics. Overview

IBM Cognos Analysis for Microsoft Excel

IBM Software Integrating and governing big data

IBM DB2 Near-Line Storage Solution for SAP NetWeaver BW

Taking control of the virtual image lifecycle process

Predictive analytics with System z

How To Use Big Data To Help A Retailer

The Smart Archive strategy from IBM

Stella-Jones takes pole position with IBM Business Analytics

Solve your toughest challenges with data mining

How To Transform Customer Service With Business Analytics

Achieving customer loyalty with customer analytics

Reducing the cost and complexity of endpoint management

Jabil builds momentum for business analytics

Getting the most out of big data

Setting smar ter sales per formance management goals

IBM Tivoli Netcool Configuration Manager

IBM Analytical Decision Management

IBM Cognos Controller

Premier. Helping healthcare providers deliver the best possible care to their patients. Smart is...

IBM Software Group IBM Marketing Center

IBM Cognos Enterprise: Powerful and scalable business intelligence and performance management

Test Data Management in the New Era of Computing

IBM Cognos TM1 on Cloud Solution scalability with rapid time to value

Empowering intelligent utility networks with visibility and control

5 tips to engage your customers with event-based marketing

Addressing IT governance, risk and compliance (GRC) to meet regulatory requirements and reduce operational risk in financial services organizations

IBM Software Four steps to a proactive big data security and privacy strategy

The business value of improved backup and recovery

Addressing government challenges with big data analytics

IBM Security Privileged Identity Manager helps prevent insider threats

Strengthen security with intelligent identity and access management

How To Create An Insight Analysis For Cyber Security

IBM Maximo Asset Management Essentials

IBM Analytics Make sense of your data

The top 10 secrets to using data mining to succeed at CRM

IBM Tivoli Netcool network management solutions for enterprise

IBM Endpoint Manager for Server Automation

IBM Security QRadar Risk Manager

Beyond listening Driving better decisions with business intelligence from social sources

IBM Tivoli Storage Manager Suite for Unified Recovery

IBM asset management solutions White paper. Using IBM Maximo Asset Management to manage all assets for hospitals and healthcare organizations.

IBM Software Integrated Service Management: Visibility. Control. Automation.

IBM InfoSphere Information Server Ready to Launch for SAP Applications

The IBM Cognos Platform

A European manufacturer

Balance and maximise your Oracle EBS investment with IBM Optim A Priceline and Travel Industry Case Study Philip McBride

IBM Content Analytics with Enterprise Search, Version 3.0

IBM System x and VMware solutions

Bytemobile, IBM and Datatrend speed optimization and monetization

Delivering information you can trust. IBM InfoSphere Master Data Management Server 9.0. Producing better business outcomes with trusted data

Making critical connections: predictive analytics in government

IBM Security QRadar Risk Manager

IBM Software Top tips for securing big data environments

IBM Tivoli Storage FlashCopy Manager

Afni deploys predictive analytics to drive milliondollar financial benefits

Optimize workloads to achieve success with cloud and big data

The IBM Cognos family

IBM Software Making the case for data lifecycle management

IBM Information Archive for , Files and ediscovery

Transcription:

IBM Software Wrangling big data: Fundamentals of data management How to maintain data integrity across production and archived data

Wrangling big data: Fundamentals of data management 1 2 3 4 5 6 Introduction Quality data, quality Managing the data Benefits across the enterprise Evaluating data management solutions Resources

Wrangling big data: Fundamentals of data management Introduction: Big data is a big deal Everywhere you turn in enterprise IT, you hear the buzz about big data: billboards, commercials and likely your own conference rooms and inboxes. The variety, volume and velocity of data streaming through enterprise systems is on the rise, and so is the amount of discussion about the best way to handle it. But while the big data phenomenon is giving organizations a broad range of new options for data analysis, it also compounds the business challenges associated with collecting, managing, organizing and protecting data. Successfully leveraging big data for analytics demands that companies develop strategies to reduce the cost of managing data and reduce the risk involved in organizing and protecting that data. In addition to these business challenges, enterprises face technical challenges in managing rapidly expanding volumes of all types of data. The sheer number of continually proliferating data sources introduces complexity into data management processes. If IT departments cannot manage information appropriately from the moment it is created to the point when it can be archived or defensibly disposed they risk violating legal and regulatory compliance requirements. To efficiently manage data throughout its entire, IT leaders must keep three objectives in mind: 1. Data veracity (trustworthiness) is critical for both analytics and regulatory compliance. 2. Both structured and unstructured data must be managed effectively. 3. Data privacy and security must be protected at all times. This e-book will explore how these objectives can help you corral big data into a manageable, dependable and eminently valuable enterprise resource. 3 1 Introduction 2 Quality data, quality 5 Evaluating data

Wrangling big data: Fundamentals of data management Quality data, quality It s obvious but important: the better the data, the better the. When the New Intelligent Enterprise a joint partnership between the MIT Sloan Management Review and the IBM Institute of Business Value conducted a survey on analytics, it found that organizations that used analytics for competitive advantage were 2.2 times more likely to substantially outperform their industry peers. 1 Ensuring the accuracy of the data used for analytics is a key factor in enhancing overall organizational productivity. But achieving that accuracy requires a strong commitment to data management the process of managing business information throughout its existence, regardless of where that information is stored. That means ensuring that large data volumes do not inhibit application performance in production environments, and acquiring the ability to both offload the data into archived systems and maintain accessibility to that archived data for retention and business continuity. 4 5 Evaluating data

Wrangling big data: Fundamentals of data management The emergence of big data only reinforces the need for effective data management. Big data is more than just information stored in an Apache Hadoopbased framework; it is also the structured data within data warehouses, databases and standalone applications. The size of the data source whether a single database, an entire data warehouse or a Hadoop framework doesn t matter. In the world of big data, all data sources are crucial and must be managed throughout their s. Data management principles come into play whenever data is moved or changed, such as: When new databases and data sources are first created Any time governance stewards need to track audits, assess compliance or protect personally identifiable data When line-of-business owners need to conduct real-time analysis When data is archived, restored or defensibly disposed But for data management to be most effective, it must be addressed when data sources are first created, at the beginning of any deployment, to maintain proper structure as new data is introduced. By applying these basic governance techniques, it becomes easier to add and accommodate new information in a data deluge. 5 5 Evaluating data

Wrangling big data: Fundamentals of data management Managing the data The data stretches through multiple phases as it is created, used, shared, updated, stored and eventually archived or defensively disposed. Three elements of data management play an especially key role in several phases of data s existence: 1. Test data management: During the process of developing new data sources, whether databases or data warehouses, test technicians must automate the creation of realistic, rightsized data sources that mirror the behaviors of existing production databases. To ensure that queries can be run easily and accurately, they must create a subset of actual production data and reproduce actual conditions to help identify defects or problems as early as possible in the testing cycle. But actual production data contains information that may identify customers a clear violation of privacy policies. That s why organizations must mask sensitive information in test environments for compliance and privacy. Applying data masking techniques to the test data means testers use realistic-looking but fictional data; no actual sensitive data is revealed. Leveraging test data management also means that technicians can easily access and refresh test data, speeding the testing and delivery of the new data source. Where management tasks fall in the data Archiving Store /Retain Archive Dispose Update Test data management Create Use Share Data masking The entire data (shown as the grey circle) benefits from good governance, but management capabilities that focus on the use, share and archive steps have wide-ranging benefits for cost reduction and efficiency gains. 6 5 Evaluating data

Wrangling big data: Fundamentals of data management 2. Data masking and privacy: The ability to mask data and protect privacy has implications beyond test data, extending into production data. Whether information is stored in Hadoop or a data warehouse, certain facets can be masked from unauthorized users, even though they may be authorized to see the data in aggregate. For example, a pharmaceutical company that is submitting drug-testing to the U.S. Food and Drug Administration may mask social security numbers and dates of birth, but not patients ages and other demographic information. Masking certain data this way satisfies corporate and industry regulations by removing identifiable information, while still maintaining business context and referential integrity in production environments. Data management: An industry view Not sure where data management fits into your organization? Here is a sample of datadriven applications in various industries that could benefit from more streamlined, accurate data: Energy and utilities Metering Utility rates and billing Healthcare Electronic medical records Insurance claims and payment systems Financial services Mergers and acquisitions Customer service actions and preferences Telecommunications Call-data records Customer relationship management systems Transportation Online travel booking and ticket information Insurance Claims processing and billing Pension and benefits accrual Retail Purchase records and customer profiles Sales, distribution and inventory management 7 5 Evaluating data

Wrangling big data: Fundamentals of data management 3. Archiving: Effective data management includes the intelligence not only to archive data, but archive it based on specific parameters or business rules, such as the age of the data. It can also help storage administrators develop a tiered and automated storage strategy to archive dormant data in a data warehouse, thereby improving overall warehouse performance. Intelligent data archiving supports data retention needs based on such factors as the age of the data, any legal hold requirements within data sources and defensible disposal of data. In addition, effective data management enables IT to query archived data in Hadoop-based systems, providing flexible access to that data. The effect is to reduce the total cost of ownership (TCO) of data sources by intelligently archiving and compressing historical data, while ensuring that data is available for applicable analytical purposes. 8 5 Evaluating data

Wrangling big data: Fundamentals of data management Benefits across Effective data management benefits both IT and business stakeholders. Because data management solutions provide a consistent view into data sources from a variety of angles, enterprises gain usability and training advantages. Test data management The ability to create realistic, rightsized test data helps ensure reliable, trustworthy test that reflect the way an application will actually perform with live data. All teams have access to accurate, appropriately protected data for their work. Developers can test multiple application functionalities and setups, confident that they can quickly create custom data sets for any target scenario. QA staff can validate performance and integrations based on specific test cases. Collaboration is streamlined by the sharing of data sets and reports based on repeatable test data, increasing the opportunity for predictable. Data masking Masking data in both development and production environments gives compliance officers and auditors higher levels of certainty regarding security and protection of personally identifiable information and other sensitive data. Customer service staff can see detailed purchase records without being given access to sensitive credit card details. Medical billing teams can access insurance and payment details but be prevented from seeing treatment plans or doctor s notes. Developers can pull subsets of address information to test location-based apps without seeing the actual names or other identifying characteristics associated with those records. Archiving Disciplined data archiving benefits any department that stores data which means almost all of them, from marketing to IT to executive management. HR teams can securely move old employee records into long-term cold storage, and save more time by enabling automatic removal and disposal of certain items as they pass retention deadlines. Well-maintained archives help the legal department quickly locate and retrieve items for auditing purposes. Marketing can include archived data in their analytic efforts, supporting strategic decisions with long-term historical evidence and trend evaluation. 9 5 Evaluating data

Wrangling big data: Fundamentals of data management Evaluating data The right data management solution for your company will depend greatly on the mix of data types and access requirements in your organization. When evaluating solutions and contemplating deployment strategies, seriously consider these three capabilities: 1. Compatible and scalable: The system should support heterogeneous environments across multiple databases, data warehouse environments, systems and platforms, as well as have the capacity to handle big data across production and testing environments. If it works with only some of your infrastructure and some of your data, you ll get only some of the possible benefits. 2. Understands the complete data relationship: Data loses value when it loses context. The most effective data management platforms ensure referential integrity when masking data, accurately subset data for test data management and support independent access to archived data. 3. Maintains accessibility: Managing and preserving data access based on retention requirements as well as system and employee needs is a vital component of extending data management benefits across an enterprise. Leading solutions have the flexibility to create intelligent, targeted policies for data access and archiving that help organizations improve application performance, reduce costs and minimize risks. 10 5 Evaluating data

Wrangling big data: Fundamentals of data management Delivering comprehensive data management capabilities With IBM InfoSphere Optim solutions, enterprises can easily govern data throughout its by creating and managing realistic test data, masking data in both development and production, and automating data archiving policy enforcement. InfoSphere Optim software is compatible with multiple databases, data warehouse environments and enterprise applications. Its ability to support multiple access methods to archived data makes it a valuable solution in supporting data retention requirements, as well as supporting the business-critical analytical needs of Hadoop-based systems. By managing the complete data from a single platform, enterprises can make sure their systems and staff members have the highest-quality information for all tasks and decisions. At the same time, developers and IT can secure and protect sensitive data to minimize risk and exposure, both in testing and other data delivery scenarios. And because InfoSphere Optim works with a variety of data sources, its value to scales as organizational data needs grow. InfoSphere Optim in action Toshiba TEC Europe, a leading subsidiary of the Toshiba Group, needed to proactively manage application data growth to support business expansion and deployment of Oracle E-Business Suite across business units. It implemented InfoSphere Optim to archive historical transactions and reduced batch processing time for 19,000 jobs from 250 hours to only 65 hours a 75 percent increase in application availability. Plus, it reduced database size by 30 percent, enabling it to manage continued data growth, yet retain access to current and historical transactions. As a result, Toshiba TEC was able to satisfy business unit requirements and support continued business growth. 11 5 Evaluating data

Wrangling big data: Fundamentals of data management Resources To learn more about data management for big data and the capabilities available in InfoSphere Optim solutions, please contact your IBM representative or visit: Manage the data of big data environments Solution brief: IBM InfoSphere Optim solutions for data warehouses IBM InfoSphere Optim Data Growth Solution demo IBM InfoSphere Optim Test Data Management Solution demo 12 5 Evaluating data

Copyright IBM Corporation 2013 IBM Corporation Software Group Route 100 Somers, NY 10589 Produced in the United States of America March 2013 IBM, the IBM logo, ibm.com, InfoSphere and Optim are registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at Copyright and trademark information at ibm.com/legal/copytrade.shtml This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates. The performance data and client examples cited are presented for illustrative purposes only. Actual performance may vary depending on specific configurations and operating conditions. THE INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided. The client is responsible for ensuring compliance with laws and regulations applicable to it. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the client is in compliance with any law or regulation. 1 IBM Institute for Business Value in collaboration with MIT Sloan Management Review. Analytics: The widening divide. October 2011. Please Recycle IMM14103-USEN-00