IBM Analytics Just the facts: Four critical concepts for planning the logical data warehouse
1 Introduction
2 Complexity
3 Speed is business-friendly
4 Cost reduction is crucial
5 Analytics: The key to current and future success
6 Summary: Delivering the value of big data
Introduction: Is your warehouse holding you back?

Perhaps you've discovered that your first-generation warehouse is constraining your business. Or you've already modernized your infrastructure but found the new warehouse inefficient. For example, some solutions deliver speedy analytics only with constant tuning and administration. Other solutions perpetuate the costs and complexity that you want to leave behind. Either way, it's time to rethink your data warehouse approach.

The term "logical data warehouse" has been used to describe an approach that addresses some of these problems. The logical data warehouse breaks the function of the traditional data warehouse into logical blocks, with each block chosen for the particular characteristics of the data it is working with or the analytics required.

While any number of reasons can prompt a change in data warehouse solutions, there are four key facts you need to know to help you make the right choice:
1. Complexity
2. Speed is business-friendly
3. Cost reduction is crucial
4. Analytics: The key to current and future success

This e-book explores those facts and explains why they are essential when evaluating your data warehouse options.
Complexity

Successful technologies make the difficult simple. Less successful technologies are unable to contain complexity, making the working lives of administrators and users harder. When warehouse infrastructure becomes more complex, the technical team spends most of its time managing rather than innovating. Instead of working with their business peers to create value from data, they are consumed with database and storage administration. Although the warehouse contains valuable information, the technology team is left with too little time to champion data-driven decision making within the organization's business functions.

What's the answer? Warehouse designers and engineers must make simplicity and performance their primary goals, testing systems at each stage of development with the question: Can we make this simpler for end users? Look for systems designed to contain complexity, such as integrated appliances that help shorten setup time, eliminate unnecessary tuning and automate routine tasks.

Integrated systems can help streamline analytics by consolidating all analytic activity in one place, where the data resides. Data scientists can build their models using all the enterprise data, and then iterate through different models much faster to arrive at the best solution. Once the model is developed, it can be executed against the relevant data in the appliance. Users can get their predictive scores in near-real time, while the integrated infrastructure makes analytics available throughout the enterprise.
Accelerate and simplify analytics with appliances

Big data generates complexity within traditional data warehouse architectures. Organizations need flexible, efficient technology strategies for manipulating data and developing applications, products and services faster. New technology innovations address these needs by consolidating functionality such as in-memory analytics, Apache Hadoop and cloud into a single, purpose-built, easy-to-manage system. This design evolution is guided by three core tenets:
1. Consolidate infrastructure to simplify analytics: Appliances and specialized systems reduce complexity by consolidating sprawling data marts into a small number of workload-optimized systems.
2. Process workloads on fit-for-purpose platforms: Computation is mapped to appliances and systems specifically designed for well-understood workloads. These specialized systems offer optimal performance at affordable prices, while their simplicity accelerates time to value.
3. Coordinate system management and data governance across the enterprise: Centralize data management, not data and compute resources, to make data warehouse administration easy and affordable.

By consolidating a sprawl of ungovernable data marts into far fewer purpose-built analytic appliances, IT teams can deliver the best price-performance for analytical queries, while streamlining administrative effort. This frees valuable technical staff to develop and deploy new business intelligence and analytics applications.
Speed is business-friendly

Bringing data under management is just the first step in realizing business value. Across all sectors, industry leaders expect to analyze developments as they occur and then respond in near-real time. For example, healthcare professionals understand that patients benefit when analyses of big data sets are completed at high speed, enabling clinicians to take samples, run diagnostic tests, report results and provide advice, all within a single clinic appointment.

The traditional data warehouse was designed to store and analyze historical information on the assumption that data would be captured now and analyzed later. It simply was not architected to support near-real-time transactions or event processing. Yet the velocity of data being captured, processed and used is increasing.

In fast-moving analytics markets, time to value has significant cost implications. Delays in bringing analytical applications into production can cost you significant revenue and profit opportunities. Within the telecommunications sector, for example, speed is the connection between effective data management and excelling at customer service. Manufacturers need to rapidly uncover defects in processes and products before they reverberate in the marketplace. Without the ability to manage and use data at the speed of business, organizations in all industries cannot respond to market opportunities in a timely way.
Providing real-time analysis of massive amounts of data requires modern data warehouse platforms. This does not necessarily mean existing enterprise data warehouses must be replaced, but they do need to be enhanced and extended. Look for next-generation data warehouse platforms with:
- Hardware and software specifically designed, integrated and tuned for high-performance analytics
- Real-time analysis that operates on data in motion, allowing you to understand data and events as they unfold
- Ability to perform analytics in-place without needing to move data, which slows down the process
- Integrated functionality such as Hadoop, in-memory or columnar technology to help accelerate analytic queries and boost performance

"We're getting deeper into the data in multiple ways... When we see new commonalities in treatments for children, we can design new protocols to provide the best possible care."
Wendy Soethe, Enterprise Data Warehouse Manager, Seattle Children's Hospital
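To make "analysis that operates on data in motion" concrete, here is a minimal Python sketch that flags unusual readings as events arrive, instead of storing them first and analyzing them later. It is an illustration only, not how any IBM product works: the class name, the function name, the window size and the threshold are all hypothetical.

```python
from collections import deque


class SlidingWindowAverage:
    """Running average over the most recent `size` events (hypothetical helper)."""

    def __init__(self, size):
        if size <= 0:
            raise ValueError("size must be positive")
        self._window = deque(maxlen=size)
        self._total = 0.0

    def add(self, value):
        # Subtract the oldest value before the deque evicts it.
        if len(self._window) == self._window.maxlen:
            self._total -= self._window[0]
        self._window.append(value)
        self._total += value
        return self._total / len(self._window)


def detect_spikes(events, size=3, threshold=1.5):
    """Yield (index, value) for each event that exceeds `threshold` times
    the average of the events seen just before it.

    `size` and `threshold` are illustrative assumptions, not tuned values.
    """
    window = SlidingWindowAverage(size)
    average = None
    for index, value in enumerate(events):
        if average is not None and value > threshold * average:
            yield index, value
        average = window.add(value)
```

Because `detect_spikes` is a generator, each event is evaluated as soon as it is read, so the caller can react before the rest of the stream has arrived.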
Cost reduction is crucial

While making older technologies satisfy some new demands may be possible, the results are often inefficient and burdened with unnecessary costs. First-generation warehouses and appliances built on general-purpose database systems need constant care and feeding from teams of administrators. Many solutions require that multiple secondary data structures such as indexes and aggregates be designed, coded, implemented and tested on individual tables. These outdated information management burdens saddle organizations with long, costly implementation cycles.

Older database systems can also create a feeling of being locked in, raising concerns that the costs of moving to a new technology may outweigh the benefits. Look for solutions that make migrating to a new warehouse quick, easy and inexpensive. Weeks or months of tuning and load-testing a new processing node diminish the value of a distributed system, where agility and adaptability should be primary benefits.

Linear scalability is essential to make adding a logical warehouse node simple and cost-effective. This means organizations can pick the appropriately sized appliance to meet both their data volume and performance requirements, all with predictable, scalable performance and no need to add significant additional resources to manage the appliance as data volumes grow.

Staff training is expensive, so choose a platform that shields administrators from complicated data management and enables business users to enjoy immediate access to their data as soon as the new system is installed. Reducing data management costs opens up additional resources to invest in the value-creating activities of advanced analytics.
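The sizing argument behind linear scalability can be sketched as simple arithmetic. The Python example below assumes, purely for illustration, that both storage capacity and scan throughput grow in direct proportion to node count; the function name and every per-node figure are hypothetical and do not describe any specific appliance.

```python
import math


def nodes_needed(data_tb, scan_tb_per_hour, tb_per_node, scan_tb_per_hour_per_node):
    """Return the smallest node count that meets both requirements.

    Assumes capacity and throughput scale linearly with node count,
    which is the property the text calls linear scalability.
    """
    if tb_per_node <= 0 or scan_tb_per_hour_per_node <= 0:
        raise ValueError("per-node figures must be positive")
    for_capacity = math.ceil(data_tb / tb_per_node)
    for_throughput = math.ceil(scan_tb_per_hour / scan_tb_per_hour_per_node)
    # The stricter of the two requirements decides the size.
    return max(1, for_capacity, for_throughput)


# Hypothetical figures: 32 TB and 8 TB/hour of scan throughput per node.
today = nodes_needed(100, 40, tb_per_node=32, scan_tb_per_hour_per_node=8)
after_growth = nodes_needed(200, 40, tb_per_node=32, scan_tb_per_hour_per_node=8)
```

Under these assumed figures, throughput drives the size today and capacity drives it once data volume doubles. When scaling is not linear, this kind of calculation stops being reliable, which is why the property matters for predictable costs.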
Analytics: The key to current and future success

The purpose behind data warehousing has always been to enable business analysis, bringing deeper understanding and new opportunities. Today, customers' use of the web and smartphones is creating massive new data sets, many of them unstructured or semi-structured, that must be managed and analyzed along with existing enterprise data.

As customers interact through their smartphones, new opportunities arise for engagement through marketing and customer support. These opportunities include ingesting newly created customer data in near-real time and analyzing it immediately in the context of historic data to push a personalized response to a customer's smartphone. Seizing the opportunity in big data and analytics requires envisioning the future, moving analysis to the center of the business and proactively planning rather than passively reacting.

"I need some way to understand what they're thinking, what they're feeling, without having to have contact with them. PureData for Analytics is what's going to help us understand what the customers want when they walk into my stores."
Paula Post, Vice President, Merchandising Optimization, The Bon-Ton Stores
First-generation operational warehouses were not designed to manage data at today's volume and variety; their query performance is never fast enough. They typically analyze only subsets of available data and provide a historical perspective that can be applied to future decisions. Data warehouse workloads are different, typically reading extremely large data sets and then analyzing them to uncover threats and opportunities: finding the needle in the haystack.

Ideally, you need a solution that is optimized for analytics and is capable of:
- Keeping data up to date
- Making data instantly available for analysis
- Managing big data volumes while yielding valuable insights

Look for a setup that delivers fast query performance on analytic workloads, supports a flexible range of data warehouse users and provides sophisticated analytics to satisfy business intelligence requirements. You want a scalable, hardware-accelerated, massively parallel system that lets you gain insight from enormous data volumes without copying the data into a separate analytics server.
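The difference between analyzing data where it resides and copying it into a separate analytics server can be illustrated with a small Python sketch. It uses the standard library's sqlite3 module purely as a stand-in for a warehouse; the table, the column names and the sample rows are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database standing in for a warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 100.0), ("east", 50.0), ("west", 75.0)],
    )
    return conn


def totals_in_database(conn):
    """Push the aggregation to the database; only summary rows come back."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return dict(rows.fetchall())


def totals_after_copy(conn):
    """Copy every detail row out, then aggregate on the client side."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals
```

Both functions return the same totals, but the first transfers one row per region while the second transfers every detail row. At warehouse scale, that data movement is the cost that in-place analytics is meant to avoid.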
Summary: Delivering the value of big data

As technology allows IT to contribute more effectively to an organization, businesses need to quickly generate insight from information to accelerate informed decision making and address user demands for mobility and self-service. Meeting these challenges requires solutions capable of delivering a unique combination of speed, simplicity and efficiency, along with the ability to integrate with other information sources to realize the value inherent in big data analytics.

Move to a modern approach

Companies should align their data warehouse platform choices with their plans for business growth and expansion. This requires an approach that looks beyond traditional warehouses and appliances built on general-purpose database systems. IBM PureData System for Analytics N3001, the next generation of the IBM PureData System for Analytics family of appliances, is designed with these facts in mind. The high-performance, massively parallel system enables organizations to gain insight from their data and perform analytics on enormous data volumes (see Figure 1).

Figure 1. Unlocking data's potential: IBM PureData System for Analytics N3001
Figure labels ("Exceptional value"):
- Data warehouse appliance, plus built-in in-database analytic capability, advanced security and integration with third-party tools
- Business intelligence: IBM Cognos Business Intelligence
- Data integration and transformation: IBM InfoSphere DataStage and InfoSphere Data Click
- Hadoop data services: IBM BigInsights for Apache Hadoop
- Real-time analytics: IBM InfoSphere Streams Developer Edition
- Powering the logical data warehouse: IBM Fluid Query
To facilitate this insight, the IBM Fluid Query capability unifies data access across the logical data warehouse by providing access to data in Hadoop from PureData System for Analytics appliances. Fluid Query 1.0 enables the fast movement of data between common Hadoop systems and PureData System for Analytics appliances.

Powered by IBM Netezza technology, the PureData System for Analytics N3001 delivers:
- A purpose-built design that accommodates standards-based data and architecturally integrates database, server, storage and advanced analytics capabilities into a single, easy-to-manage system
- Hardware with an accelerated massively parallel-processing design that is specifically optimized for running complex analytics on large data volumes at high speeds
- The proven performance, scalability, intelligence and simplicity to help organizations dive deep into their data

The PureData System for Analytics family includes advanced security and models ranging from Mini-appliances to 8-rack systems. The appliances are designed to deliver the proven performance, value and simplicity organizations need to extract insights hidden in their massive amounts of data. The N3001 model comes ready to deliver extra value with software entitlements to business intelligence and Hadoop starter kits. It requires minimal ongoing administration or tuning, and offers immediate data loading and query execution following installation.

"The performance of PureData is very good; most reports we have are running in less than 5 seconds whereas with other databases we had reports running for 10 to 20 minutes."
Philippe Chartier, BI Team Lead, Information Delivery, Canadian National Railway Company
For more information

To learn more about IBM PureData System for Analytics, check out the following resources:
- White paper: Simple is Still Better: Embrace Speed & Simplicity for a Competitive Edge
- Watch live and recorded Virtual Enzee webinars on demand
- Twitter: @IBMNetezza, @IBMDataWH, #Enzee
- PureData-Enzee Community: www.enzeecommunity.com
Copyright IBM Corporation 2015

IBM Analytics
Route 100
Somers, NY 10589

Produced in the United States of America
June 2015

IBM, the IBM logo, ibm.com, BigInsights, Cognos, DataStage, InfoSphere, and PureData are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at "Copyright and trademark information" at ibm.com/legal/copytrade.shtml

Netezza is a trademark or registered trademark of IBM International Group B.V., an IBM Company.

This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates.

THE INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS" WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.

Please Recycle

WAM12354-USEN-00