IBM Software Delivering trusted information for the modern data warehouse

Delivering trusted information for the modern data warehouse
Make information integration and governance a best practice in the big data era

Introduction

In ever-changing business environments, deriving business value from information remains a constant objective for IT and business professionals. To meet this goal, organizations must use information to make faster and more accurate decisions than ever before so they can innovate effectively and maintain a competitive edge.

Enter big data, a compelling yet challenging phenomenon that holds the potential for great benefits. Organizations can leverage big data to generate new insights that can lead to new markets, new offerings and new approaches. But to realize the opportunities posed by big data, organizations must begin to move to a new, stronger foundation of trusted information: the next-generation data warehouse environment.

Organizations are expanding deployments of data integration tools to a growing number of projects. In 2013, 53 percent planned to deploy data integration tools in six or more projects or departments, up from 49 percent in 2012 [1]. (Gartner, The State of Data Integration: Current Practices and Evolving Trends)

Trusted information: A requirement for modern data warehouses

There is no doubt that today's content-based innovations are fueling an unprecedented increase in the demand for access to information. At the same time, they are causing massive growth in data volumes and IT complexity. Traditional data warehouse environments simply cannot keep pace with these changes, as the deluge of big data affects all systems and creates a whole new set of network requirements.

To address the common challenges hindering big data and analytics initiatives, organizations need to adopt next-generation data warehouse environments with the scalability, speed and reliability to handle the demands of big data and fast-moving knowledge workers. Requirements for delivering trusted information to a modern data warehouse include:

- Comprehensive management of growing data: Modern data sources offer valuable new insights because they include a greater variety of information, from video and audio content to streaming data from sensors. Organizations must take a comprehensive approach to managing these massive volumes of data, ensuring that data is of high quality and system performance remains strong.

- Real-time decision making and continuous availability: Timely, accurate decisions are essential to support data-driven initiatives. Business leaders need on-demand access to data they can trust, so data warehouse environments must be able to respond in near-real time. With 24x7 business operations now the norm, data systems have to be continuously available, regardless of location. Management and customers both expect downtime, even planned downtime, to be minimal, so IT must migrate and upgrade data and applications with as little system disruption as possible.

- Reduced costs: As many CIOs can attest, IT budgets are not growing nearly enough to keep up with the demand for new applications and services such as business intelligence (BI), analytics and enhanced customer relationship management (CRM) systems. Only by significantly improving operational efficiencies can organizations fund innovation and new projects.

- Business agility: To compete effectively, organizations need to respond quickly to business opportunities and challenges. This agility depends on rapid decision making based on up-to-date information. Today, managers and employees throughout the organization require business intelligence to do their jobs, and information must be made accessible to an expanding community of users. However, this information has to be kept secure and compliant with industry regulations.

- Confidence in quality data: The exponential growth in information volume, velocity and variety increasingly threatens an organization's ability to govern data. In some cases, poor-quality, out-of-date, unreliable data is finding its way into data warehouses, lowering users' confidence in the insights gleaned from it. Maintaining information veracity presents a major challenge to information infrastructures.

- Massive data scalability (MDS): A critical requirement for processing large, enterprise-class data volumes for big data integration, MDS can dramatically reduce the time it takes to handle various workloads. By optimizing the use of all available hardware resources, organizations can process a maximum amount of data in high-demand situations.

Next-generation data warehouse environments

- Manage data growth with a scalable and high-performance platform
- Achieve greater flexibility with real-time data delivery
- Integrate disparate data sources into a single virtual view
- Enhance business agility with quick and streamlined access to information
- Accelerate time to value with data warehouse blueprints

Next-generation data warehouse environments combine big data and data warehouse capabilities with new technologies such as in-memory databases and appliances (see Figure 1). Real-time processing and analytics are essential to leverage the variety of data sources and types available today, including streaming, time-sensitive or unstructured data. Fast, accurate decision making is supported by multiple zones within the environment, including zones for operational data; data landing, exploration and archive; analytics; and data marts and warehouses.

Underlying this architecture is information integration and governance (IIG). In next-generation data warehouses, IIG capabilities help deliver information that is worthy of knowledge workers' trust, and enable organizations to manage data growth with a scalable, high-performance platform.

[Figure 1: Data movement in the next-generation warehouse environment. All data (transaction and application data; machine and sensor data; enterprise content; image, geospatial and video data; social data; third-party data) flows through real-time data processing and analytics into four zones: the operational data zone; the landing, exploration and archive data zone; the deep analytics data zone; and the enterprise data warehouse and data mart zone. Information integration and governance underlies all zones. The zones feed decision management ("What action should I take?"), discovery and exploration ("What is happening?"), cognitive ("What did I learn? What's best?"), predictive analytics and modeling ("What could happen?"), and reporting and analytics ("Why did it happen?"), which support new and enhanced applications. The environment runs on systems, security and storage, delivered on premises, in the cloud or as a service.]

Figure 1. The IBM architecture for a next-generation data warehouse provides a framework for the capture and analysis of all types of data, including real-time information, and for proactive data privacy, security and governance.

An agile, next-generation warehouse environment helps organizations capitalize on new analytic capabilities, optimize warehouses for better performance, and deliver innovative self-service capabilities and low-cost access to important insights. The following sections describe the capabilities and enabling technologies required to provide trusted information for a next-generation data warehouse environment.

Manage data growth with a scalable and high-performance platform

Organizations must manage the increasing variety, volume and velocity of new data pouring into systems from an ever-expanding number of sources. For maximum value, all corporate data must be combined and delivered to end users as reliably and quickly as possible. This challenge requires unprecedented degrees of scalability and performance from an information integration platform. Key requirements include:

- A data flow architecture that ensures high data throughput
- A "design once, deploy flexibly" paradigm and high scalability across a variety of hardware environments, such as uniprocessor, symmetric multiprocessor (SMP), massively parallel processor (MPP) and grid computing, at minimal cost
- Access to databases in parallel and partitioned configurations
- An extensible framework to incorporate in-house and third-party software

It is important to implement a scalable architecture where data integration performance increases linearly with the addition of processing resources (see Figure 2).

[Figure 2. Data scalability across hardware architectures. The chart spans one core (1x) through 2-way, 8-way, 16-way and 64-way systems up to hundreds of cores, with data volumes growing from SMP (GBs) to SMP/MPP (hundreds of GB) to MPP/Grid (hundreds of TB).]

For any number of cores, a scalable solution provides:

- Same functionality
- Support for all architectures
- Linear speedup
- Decision (at runtime) to scale N way
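The "design once, deploy flexibly" idea and the runtime decision to scale N way can be sketched in a few lines of Python. This is an illustration only, not the InfoSphere implementation: the `transform`, `partition` and `run_job` functions and the record layout are hypothetical names chosen for the example. The job logic is written once, and the degree of parallelism is a parameter supplied when the job runs. Threads are used here to keep the sketch short; they show the partitioned structure rather than a real linear speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def transform(record):
    """Hypothetical per-record logic, written once and reused at any scale."""
    return {
        "id": record["id"],
        "name": record["name"].strip().upper(),
        "amount": record["cents"] / 100,
    }


def partition(records, ways):
    """Split records into `ways` partitions using a stable hash of the key."""
    parts = [[] for _ in range(ways)]
    for record in records:
        parts[crc32(str(record["id"]).encode()) % ways].append(record)
    return parts


def run_job(records, ways=1):
    """Run the same transform on every partition; `ways` is chosen at run time."""
    def run_partition(part):
        return [transform(r) for r in part]

    with ThreadPoolExecutor(max_workers=ways) as pool:
        results = list(pool.map(run_partition, partition(records, ways)))
    merged = [row for part in results for row in part]
    return sorted(merged, key=lambda row: row["id"])


records = [{"id": i, "name": f" customer {i} ", "cents": i * 250} for i in range(1, 9)]
one_way = run_job(records, ways=1)
eight_way = run_job(records, ways=8)
```

Both runs produce the same rows, which is the "same functionality" property: only the `ways` argument changes between a single-core and a multi-way run.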

Need high scalability and performance? Don't forget these requirements

In addition to supporting MDS, data integration efforts must include several other requirements that provide important scalability and performance capabilities. Data integration for the next-generation warehouse must be:

- Dynamic to meet current and future performance requirements
- Extendable and partitioned for fast and easy scalability
- Integrated with Apache Hadoop so you can leverage Hadoop as part of an integration architecture to land and determine the value of data
- Easy to use, with a rich set of transformation and data quality components available out of the box to help accelerate your team's development cycle

To learn more about these and other information integration requirements, download the IBM white paper "Seven principles for achieving high performance and scalability for information integration."

Achieve greater flexibility with real-time data delivery

To remain flexible and agile in the new world of big data while improving decision-making speed and accuracy, enterprises must capture information at low latency and analyze massive amounts of data in motion. Data replication capabilities such as change data capture (CDC) provide an efficient way to deliver large volumes of data with very low latency. Data replication that incorporates real-time operational and analytical data synchronization enriches mobile applications and big data projects with up-to-the-second information. It also can be used to help ensure continuous availability across the data center or around the world.

In heterogeneous environments, data replication enables data distribution and synchronization for transactional systems to support confident, instantaneous decisions at the point of impact (see Figure 3). Alternatively, when deployed in homogeneous environments, data replication supports business continuity and disaster recovery.

[Figure 3: Application transactions pass through data replication to target systems: RDBMS, message queue, ETL, Hadoop system, stream computing and data warehouse.]

Figure 3. Data replication distributes and synchronizes data to systems that support informed decision making.

With dependable data synchronization and availability, organizations can manage their big data growth efficiently and flexibly. Furthermore, up-to-the-minute information that enhances data quality across all warehouses, data marts and point-of-impact solutions also drives innovative initiatives that help grow the business.
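As a rough illustration of how change data capture keeps a target in step with a source, the Python sketch below replays a log of change events against a replica table. It is a simplified model under stated assumptions, not the InfoSphere replication API: the event layout (`seq`, `op`, `key`, `row`), the `apply_changes` function and the in-memory `warehouse` dictionary are all hypothetical, and the log is assumed to arrive ordered by sequence number. Production CDC tools typically read changes from database logs instead of an in-memory list.

```python
def apply_changes(target, change_log, last_applied=0):
    """Apply events newer than `last_applied` to `target` (a dict keyed by primary key).

    Returns the sequence number of the last event applied, which serves as a
    checkpoint so that only new changes are shipped on the next run.
    """
    for event in change_log:
        if event["seq"] <= last_applied:
            continue  # already replicated in an earlier run
        if event["op"] in ("insert", "update"):
            target[event["key"]] = dict(event["row"])
        elif event["op"] == "delete":
            target.pop(event["key"], None)
        last_applied = event["seq"]
    return last_applied


# Hypothetical change log captured from a source system, ordered by `seq`.
change_log = [
    {"seq": 1, "op": "insert", "key": 101, "row": {"customer": "Ada", "balance": 50}},
    {"seq": 2, "op": "insert", "key": 102, "row": {"customer": "Lin", "balance": 75}},
    {"seq": 3, "op": "update", "key": 101, "row": {"customer": "Ada", "balance": 20}},
    {"seq": 4, "op": "delete", "key": 102, "row": None},
]

warehouse = {}
checkpoint = apply_changes(warehouse, change_log[:2])                       # first sync
checkpoint = apply_changes(warehouse, change_log, last_applied=checkpoint)  # incremental sync
```

Because only events after the checkpoint are applied, each synchronization moves just the changed rows, which is what keeps latency low compared with reloading the full table.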

Integrate disparate data sources into a single virtual view

Many information integration technologies simply were not designed to accommodate rapidly expanding and fluctuating data volumes or information-hungry consumers. Continued reliance on inflexible solutions means organizations have to purchase more hardware, more databases and more complex software, all of which require more of that shrinking budget.

But managing data complexity goes beyond managing data volumes. Data sources and targets are growing along with ways to integrate and access that data, creating complexity for the user. Business and technical users must be able to easily explore, combine and analyze large volumes of diverse information.

Data virtualization is an excellent way to simplify data access by isolating the details of storage and retrieval and making the process transparent to data consumers. By simplifying access to data spread across the organization, data virtualization reduces the time required to take advantage of disparate data, making it easier for users and processes to access the information they need. These capabilities can not only improve business agility but also greatly reduce the costs associated with lengthy information integration initiatives.
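A minimal Python sketch can make the idea of a single virtual view concrete. Everything named here is an assumption made for illustration: the `CustomerSpendView` class, the in-memory SQLite `orders` table and the CSV extract stand in for real enterprise sources, and the sketch does not represent any IBM product interface. The consumer asks the view for rows and never needs to know that one source is relational and the other is a flat file.

```python
import csv
import io
import sqlite3

# Source 1 (hypothetical): a relational order system.
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
orders_db.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 120.0), (2, 35.5), (1, 10.0)]
)

# Source 2 (hypothetical): a CSV extract from a CRM system.
CRM_CSV = "customer_id,name\n1,Ada\n2,Lin\n"


class CustomerSpendView:
    """Virtual view that combines both sources on demand, without copying data."""

    def __init__(self, db, crm_csv):
        self._db = db
        self._crm_csv = crm_csv

    def rows(self):
        # Read customer names from the flat-file source.
        reader = csv.DictReader(io.StringIO(self._crm_csv))
        names = {int(r["customer_id"]): r["name"] for r in reader}
        # Aggregate spend in the relational source, then join by customer ID.
        query = (
            "SELECT customer_id, SUM(total) FROM orders "
            "GROUP BY customer_id ORDER BY customer_id"
        )
        for customer_id, spend in self._db.execute(query):
            yield {"customer_id": customer_id, "name": names.get(customer_id), "spend": spend}


view = CustomerSpendView(orders_db, CRM_CSV)
result = list(view.rows())
```

If either source later moves to a different store, only the view changes; code that consumes `rows()` stays the same, which is the transparency the text describes.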

Enhance business agility with quick and streamlined access to information

A productive and efficient data warehouse environment empowers an expanding community of users by making information accessible, no matter where it resides (see Figure 4). Activity policies and end-user privileges are needed to help ensure governed, pervasive information that inspires confidence. Easy information provisioning (the process of moving data into new data stores or seeding a new analytics platform) helps simplify data retrieval for business use, increase business agility and reduce costs by shrinking the time required to complete tasks.

Information provisioning empowers nontechnical users with self-service integration capabilities to access the information they need with a few simple clicks that automate a series of best-practice steps to access data from disparate sources. Analysts, data scientists and even line-of-business users can easily retrieve data and populate new systems, freeing technically skilled extract, transform and load (ETL) specialists to focus on more complex activities. This simplified data access also helps organizations accelerate warehouse systems and optimize data usage.

[Figure 4: Two paths from data sources (start) to integrated data (end).

Working harder:
- Work in multiple interfaces
- Multiple data handoffs between systems
- Slow analysis speeds require longer processing windows and more downtime
- Working with different code languages
- Staffing bottlenecks due to manual processes

Working smarter:
- Single interface for integration tasks
- Support for multiple data sources and streaming data
- Predetermined, confirmed set of concepts and logic constructs
- Automated processes and elements limit hand-coding and replication
- Data gathered and passed directly to real-time analytics processes]

Figure 4. An agile business works smart, using intelligent processes, policies and strategies to streamline access to data.

Accelerate time to value with data warehouse blueprints

In a next-generation information infrastructure, one best practice involves leveraging comprehensive data models or data warehouse blueprints containing data warehouse design models, business terminology models and business analysis templates. These blueprints help organizations to:

- Make better, faster decisions and respond quickly to business opportunities
- Build and rationalize data warehouses quickly for improved time to value
- Provide a consistent data architecture for modeling new or changed requirements
- Reduce the time required to update existing data warehouses

Data models offer a streamlined approach to business transformation that helps reduce risk and enables the successful delivery of high-quality data for applications across the next-generation data warehouse environment. Pre-configured templates enhance reporting and support regulatory compliance mandates.

IBM InfoSphere: An optimized foundation for big data and analytics

Although the concept of big data and its inherent challenges are relatively recent, IBM has been developing systems to handle massive amounts of data for decades. Today, as part of the IBM Watson Foundations big data and analytics platform, IBM InfoSphere Information Integration and Governance (IIG) solutions offer a unified set of capabilities designed to help organizations:

- Understand information: Analyze data and its relationships. Share definitions and policies across projects. Despite complexity, govern big data based on business needs.
- Improve information: Deliver accurate data with consistency across master data entities. Manage information throughout its lifecycle, document its lineage, and secure and protect it.
- Act on information: Accelerate projects by enabling confidence, adapting quickly to change and making high-value information continuously available.

Why IBM InfoSphere?

As a critical element of IBM Watson Foundations, the IBM big data and analytics platform, InfoSphere Information Integration and Governance (IIG) provides market-leading functionality to handle the challenges of big data. InfoSphere IIG offers optimal scalability and performance for massive data volumes, agile and right-sized integration and governance for the increasing velocity of data, and support and protection for a wide variety of data types and big data systems. InfoSphere IIG helps make big data and analytics projects successful by giving business users the confidence to act on insight.

InfoSphere IIG capabilities provide the foundational building blocks for a next-generation data warehouse environment focused on maximizing the business value of data (see Figure 5). By delivering trusted and protected information, these capabilities support multiple essential functions, including:

- Real-time processing
- Business intelligence
- Analytics applications
- Data warehouse appliances
- Big data exploratory platforms such as Hadoop

[Figure 5: IBM InfoSphere Information Integration and Governance capabilities.

- Information integration: Extract, transform, load (ETL); extract, load, transform (ELT); replicate; federate
- Data quality: Standardize; validate; verify; enrich; match
- Master data management: Master multiple domains; manage registration or transaction hub; collaboratively author; govern master data
- Data lifecycle management: Archive database; manage test data
- Privacy and security: Monitor activity; mask sensitive data; encrypt sensitive data; redact sensitive data
- Metadata, business glossary and policy management: Enterprise metadata catalog (automated data discovery; enterprise metadata repository; business terminology defined in business glossary); information governance policy definition, sharing and execution; information governance project blueprints]

Figure 5. IBM InfoSphere solutions deliver a broad set of information integration and governance capabilities that help you unlock the value of information assets to deliver value to the business.

Best practices for deriving maximum business value from data

As a result of its long history of proven warehouse deployments and customer engagements, IBM has applied industry-leading best practices to its governance capabilities. These capabilities help organizations increase the value of data for information-intensive projects such as big data and analytics, application consolidation and retirement, security and compliance, 360-degree views of customer and product data, and many others.

InfoSphere IIG helps integrate big data into the enterprise, ensuring that it is both trusted and protected. Pre-integrated and optimized data helps lower the total cost of ownership, accelerate deployment and time to value, and increase business stakeholders' trust and confidence in the information. Once the organization has confidence in its big data, decision makers can formulate well-founded decisions and take action to accelerate information-intensive projects, drive innovation and stay ahead of the competition.

Resources

To learn more about the IBM approach to information integration and governance for data warehousing, big data and analytics initiatives, please contact your IBM representative or IBM Business Partner, or check out these resources:

- White paper: Who's afraid of the big (data) bad wolf?
- White paper: Creating a trusted data warehouse using IBM InfoSphere Information Server
- Video: IBM InfoSphere Information Server for Data Warehousing
- White paper: Integrating and governing big data
- E-book: Understanding big data so you can act with confidence

Copyright IBM Corporation 2014

IBM Corporation
Software Group
Route 100
Somers, NY 10589

Produced in the United States of America
June 2014

IBM, the IBM logo, ibm.com, IBM Watson, and InfoSphere are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at "Copyright and trademark information" at ibm.com/legal/copytrade.shtml

This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates.

THE INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS" WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.

The client is responsible for ensuring compliance with laws and regulations applicable to it. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the client is in compliance with any law or regulation.

1 Thoo, Eric; Friedman, Ted; Beyer, Mark A. Gartner. The State of Data Integration: Current Practices and Evolving Trends. April 3, 2014. G00260986.

IMM14156-USEN-00