Smart Consolidation for Smarter Warehousing. A Key IBM Strategy for Data Warehousing and Analytics



Table of Contents

Overview
Introduction
Executive Summary: Smart Consolidation in a Nutshell
The Centralized Enterprise Data Warehouse
Rethinking Data Warehousing and Analytics: The Logical Data Warehouse
    Smart Consolidation's Guiding Principles
    Foundational Technology Requirements
    Next Steps
Tenet One: Consolidate Infrastructure to Simplify Analytics
    Technology Requirement
    Technology Proof Points
    Use Cases
Tenet Two: Process Workloads on Fit-for-Purpose Nodes
    Technology Requirements
    Technology Proof Points
    Use Cases
Tenet Three: Coordinate Data Management and Governance Across the Logical Warehouse
    Technology Requirements
    Technology Proof Points
    Use Case
Technology Requirements
    Workload-Optimized Systems and Appliances
    Seamless Data Flow
    Data Virtualization and Query Redirection
    Sophisticated Enterprise Data Management Tools
    Performance
The Components of a Logical Data Warehouse
Smart Consolidation Entry Points
Conclusion

Written by: George Davies Jr., IBM Netezza Strategic Marketing, IBM Software Group, Information Management

Overview

High-Performance Analytics and the Logical Data Warehouse

Business intelligence is built on data warehouses. These systems commonly integrate data from internal data sources, which are often transactional systems that record an organization's interactions with customers, prospective customers, suppliers, business partners, competitors, and regulators. An enterprise data warehouse is commonly implemented as a centralized system: a single data-management technology running on a single computer system. This approach leads to very large databases; the warehouse is commonly an organization's largest database. Data warehouses are usually built on database management systems that were originally designed for processing online transactions, not for online or offline analysis of data generated by web applications and social media. Online transaction processing systems typically manage much smaller data sets than those used in analytic processing, which suggests that a database management system optimized for processing online transactions is not always the best choice for analyzing data.

The quest for information and competitive advantage extends business intelligence beyond reports and dashboards to analytic applications that deliver predictions and insight to guide decision making. Simple reporting and advanced analytic applications create very different workloads: the former is fulfilled by relatively straightforward read operations expressed in SQL, while analytic applications demand heavy computation from languages such as C, C++, Java, and others. New data sources (such as log files, sensor data, and user-generated content on social media websites) and new data types (such as images and audio), unconsidered at the inception of data warehouses in the 1990s, are now driving growth opportunities for business intelligence and analytics.

Introduction

At its BI Summit on May 2, 2011, Gartner observed that the traditional enterprise data warehouse vision has, in general, not been achieved. These industry analysts refer instead to a logical data warehouse. In June, IBM announced a strategy, Smart Consolidation for Smarter Warehousing, that adopts Gartner's terminology and recommends an evolutionary change in direction: from a single, centralized physical system to a distributed architecture, where computation is provided by individual systems, with each node optimized for specific workloads. In this paper, we elaborate on the Smart Consolidation strategy, emphasizing fresh use cases and proof points along the way.

Executive Summary: Smart Consolidation in a Nutshell

As organizations grow their business intelligence portfolios, deploying analytic applications to derive greater value from their expanding data stores, their data warehouse systems assume ever more importance. These systems, originally conceived to support offline reporting and rudimentary analytic queries, are now expected to support enormous and growing data volumes, new unstructured and semi-structured data sources, and expanding communities of knowledge workers running a wide range of analytic workloads. For many organizations, the traditional model of a single, centralized enterprise data warehouse (EDW) system has become too rigid, unable to keep pace with modern demands for mixed workloads, extreme performance, and advanced analytic applications. This paper describes an evolutionary strategy, a smarter vision for harvesting business advantage from large and varied data stores with advanced analytics: Smart Consolidation and the logical data warehouse.

Smart Consolidation's Guiding Principles
1. Consolidate infrastructure to simplify analytics.
2. Process workloads on fit-for-purpose platforms.
3. Coordinate system management and data governance across the enterprise.

Smart Consolidation Technology Requirements
- Workload-optimized systems and appliances
- Seamless data flow
- Data virtualization and query redirection
- Sophisticated enterprise data management tools
- Performance

A Sample Road Map to Smart Consolidation
- Consolidate sprawling data marts by offloading analytics workloads from the EDW to workload-optimized systems.
- Introduce queryable archiving to provide cost-effective analytics on massive data sets.
- Accommodate new data sources, particularly Big Data, into the analytic infrastructure.
- Consolidate enterprise data management and logical warehouse management.

Before expanding on these topics, let's review the case for change.

The Centralized Enterprise Data Warehouse

Many large enterprises have adopted centralized enterprise data warehouses as their analytic infrastructure. While this model served reasonably well into the mid-2000s, the growth in data volume, variety, and complexity has combined with the exploding demand for analytics to severely limit the EDW's utility as an enterprise solution.

Figure 1: The EDW as Originally Envisioned
Vision: All enterprise data storage, analytic, and operational processing takes place in one central data warehouse.
Reality: Many single EDWs cannot handle today's volume, velocity, and variety of data and workloads. Lack of agility, increasing latency. Business needs are not being met.
(Diagram: CRM, ERP, and external data sources flow through data integration into a traditional centralized enterprise data warehouse.)

While this depiction of a comparatively bleak reality does not describe all installations (many organizations have performed quite nicely with a monolithic warehouse in place), the accelerating future of business analytics requirements, and their effects on the single-store vision, is not in doubt. In 2010, Gartner estimated that over 70% of EDWs had performance issues (Data Warehouse Magic Quadrant, 2010). In a November 2010 global database survey, Forrester reported that 65% of enterprises found it difficult to deliver performance with their existing architectures. With these levels of performance dissatisfaction as a starting point, adding complex, high-volume unstructured big data handling, while satisfying the growing demand for still more sophisticated analytics, would appear to pose an insurmountable challenge to the centralized EDW model.

Note: It is worth emphasizing that if your workloads and data stores can indeed be satisfied by a single, high-performance warehouse system, you have already achieved Smart Consolidation. That is, some organizations will find that they can satisfy all of Smart Consolidation's guiding principles on a single system.

As data volumes, variety, and complexity continue to grow, and analytic workloads multiply, a single computer system, running a single database management system, may begin failing to meet expected service levels. Workload performance declines. When the centralized data warehouse does not deliver, line-of-business users take predictable action. The typical first response to long-running workloads and underperforming queries is to tune and partition the system. These actions may work for a time, for selected workloads, but inevitably they draw valuable technical staff into an endless cycle of warehouse care and feeding. It is hard to overstate the negative business impact of diverting highly skilled technical resources away from business-driving innovation and applying them instead to ineffective system maintenance.

When tuning fails to bring EDW performance in line, frustrated user communities react quite logically: they begin to extract data subsets and move them to secondary systems, or data marts. But attempting to solve one problem creates multiple new ones: data silos limit enterprise-wide analytics, creating blind spots; governance becomes impossible; data extract-and-offload operations create additional load on the already teetering EDW; and costs and complexity escalate.

Figure 2: The Return of Data Mart Sprawl
A single, centralized EDW is simply unable to handle today's volume and variety of data. Lines of business resort to ad hoc solutions, creating data mart sprawl, resulting in:
- Limitations to enterprise-wide analytics and visibility;
- A lack of true governance;
- Increased strain on the EDW, shortening its lifespan;
- An inability to scale; and
- Escalating cost and complexity.
Ultimately, the complexity and cost of a single EDW outweigh the business benefits.
(Diagram: the traditional centralized enterprise data warehouse, fed by CRM, ERP, and external sources, surrounded by proliferating ad hoc data marts.)

The spreadmart warehouse topology depicted above is too complex to administer, too reliant on tuning, too inefficient at analytics, and too costly to maintain. Data governance is impossible, analytic performance is unreliable, and analytic innovation is effectively derailed. In short, the costs of this installation have come to outweigh its value to the business. Unfortunately, at many sites, with large IT investments in play, this realization does not come quickly or easily.

The sprawl-makers' ad hoc efforts at distributed processing are not entirely mistaken, but while a distributed approach does have merit, enterprise-level success requires some important architectural adjustments and investments in new software tools. We call the required tools and adjustments Smart Consolidation, and the resulting infrastructure a logical data warehouse (LDW).

Figure 3: Evolving to a Logical Data Warehouse
Key Tenets
1. Consolidate infrastructure to simplify analytics.
2. Process workloads on fit-for-purpose platforms.
3. Coordinate system management and data governance across the enterprise.
(Diagram: the traditional centralized enterprise data warehouse evolving into a logical enterprise data warehouse.)

A very nice picture, but how do we get there?

Rethinking Data Warehousing and Analytics: The Logical Data Warehouse

Smart Consolidation's Guiding Principles

This paper suggests a way forward from a single physical system to a logical data warehouse, implemented as a distributed-computing infrastructure integrated by software utilities. This evolution is guided by three architectural tenets, which illuminate the evolutionary pathway and suggest a roadmap for stepwise action:

1. Consolidate infrastructure to simplify analytics. Appliances and specialized systems reduce complexity by consolidating sprawling data marts into a small number of workload-optimized systems.

2. Process workloads on fit-for-purpose platforms. Computation is mapped to appliances and systems specifically designed for well-understood workloads. These specialized systems offer optimal performance at affordable prices, their simplicity accelerates time-to-value, and their deployment frees the EDW to assume a more focused role as orchestration engine and data management hub.

3. Coordinate system management and data governance across the enterprise. Centralize data management, not data and compute resources. IBM's industry-leading software portfolio of data management, governance, replication, and integration tools makes logical data warehouse management easy and affordable.

Foundational Technology Requirements

Each of these guiding principles implies several additional design principles, or technology requirements:

Workload-Optimized Systems and Appliances: Consolidated infrastructure and distributed data/compute nodes demand high-performance, cost-effective processing platforms to handle the assigned workloads.

Seamless Data Flow: Data flows smoothly between warehouse nodes. Synchronization operations (replication, change data capture (CDC) updates, etc.) are integrated, automated, and reliable.

Data Virtualization and Query Redirection: Transparent distribution of systems and data. Warehouse topology is invisible to business users and applications. Data virtualization and automated query redirection hide system complexity, and data is accessed through a discrete set of well-defined access points: browsers, application clients, and APIs.

Sophisticated Enterprise Data Management Tools: Data integration, data governance, and their related disciplines and sub-disciplines (master data management, changed data capture, data quality, data cleansing, and so on) require (a) an orchestration platform, and (b) enterprise-level applications with cross-system visibility.

Performance: Everything has to be fast.

Next Steps

Later, we will describe a variety of entry points, or adoption strategies, for Smart Consolidation. Here is just one practical pathway; the step order is not required.

Step: Consolidate sprawling data marts, and offload analytic workloads from the EDW to workload-optimized systems. The EDW is now the central locus for data governance and metadata management. Use simple, effective tools for data flow planning, data movement, and governance. This step delivers high-performance analytics and simplifies infrastructure and its management.

Step: Introduce queryable archiving to provide cost-effective analytics on massive data sets at an economical price point. IBM Netezza has announced the C1000 family of High Capacity Appliances, which minimize the per-terabyte cost to store large historical data sets, while keeping the data accessible for on-demand user queries.

Step: Accommodate new data sources, particularly Big Data, into the analytic infrastructure, using systems with real-time streaming analytic engines and Hadoop platforms to undertake pre-processing and initial analysis, and to ingest data into the logical warehouse. Data can then flow from those platforms to other analytics appliances and systems for further downstream processing.

Step: Consolidate enterprise data management and logical warehouse management, incorporating data integration, cleansing, governance, metadata management, and distribution of data flows to the appropriate analytical platforms. (This longer-term step is actually a series of steps that, while highly desirable, can be viewed as an ideal end state rather than a strict requirement.)

The remainder of this paper expands on Smart Consolidation's guiding principles, technology requirements, processing nodes, and adoption strategies.

Tenet One: Consolidate Infrastructure to Simplify Analytics

Because unsatisfied analytic demands drive the rise and proliferation of data marts, consolidating this portion of your architecture may provide the quickest, highest value for business users. Appliances and specialized systems reduce complexity by consolidating sprawling marts to a small number of workload-optimized systems. That is, offload analytics from the EDW to an environment optimized for analytics: the data warehouse appliance.

Figure 4: Evolving to a Logical Data Warehouse: Consolidate Infrastructure
Smart Consolidation is an evolutionary strategy, not a disruptive one, particularly for clients who have already built EDW-centric architectures.
- Consolidate infrastructure with purpose-built appliances and systems.
- Reduce data mart sprawl.
- Offload analytics from the EDW to appliances optimized for performance.
- Achieve true data governance.
- Reduce stress on the EDW.
- Lower total cost of ownership.
- Simplify queries and analytics against historical data.

(Diagram: CRM, ERP, and external sources feed the central EDW through data integration; the EDW now shares work with nodes for BI and ad hoc analytics, operational analytics, and a queryable archive, under common governance, security, and lifecycle management.)

By consolidating a sprawl of ungovernable data marts into far fewer purpose-built analytic appliances, IT teams can deliver the best price-performance for analytical queries, while streamlining administrative effort. This frees valuable technical staff to develop and deploy new BI and analytics applications.

Technology Requirement
- Cost-effective, high-performance, workload-optimized systems and appliances

Technology Proof Points

IBM Netezza Data Warehouse Appliances: IBM Netezza data warehouse appliances are well suited to the Smart Consolidation model. Each appliance is purpose-built for advanced analytics: the ultimate workload-optimized system.

IBM Smart Analytics System for Operational Intelligence: IBM Smart Analytics Systems are designed for warehouses that support mixed workloads, including analytics and operational decision support. Starting at under $50K, including integrated Cognos reporting, Smart Analytics System models are available on Power Systems, System z, or System x.

IBM DB2 Analytics Accelerator (IDAA): The System z mainframe accelerator comprises one or more IBM Netezza data warehouse appliances, to which (a) selected data are synchronized, and (b) deep analytic queries are redirected automatically, and invisibly to end users. Users have no direct interaction with the IBM Netezza node(s). Data synchronization and query redirection are fully automated. Installation and deployment are quick and easy, requiring no professional services.

No other vendor can bring the same level of directed hardware innovation or the same software connectivity portfolio to its logical warehouse vision.

Use Cases

Banking: A major U.S. bank deployed the IBM Smart Analytics System for credit risk analysis, operational BI, custom reporting, and data cleansing and management: cost-effective, high-performance support for multiple workloads.

Healthcare: A large healthcare alliance must capture, integrate, manage, and analyze diverse high-volume data sources (clinical, financial, and operational) and share the resulting data stores and analysis with clients, partners, and practitioners. The customer uses a variety of IBM software and systems. Key components include IBM Smart Analytics and DB2 Warehouse Edition software for data integration, consolidation, and operational BI, and IBM Netezza data warehouse appliances for complex analytic workloads: an excellent example of consolidating workloads on fit-for-purpose processing nodes.

Financial Services: A financial institution must calculate value-at-risk for an equity options desk. The IBM Netezza platform was able to run a Monte Carlo simulation on 200,000 positions with 1,000 underlying stocks (2.5 billion simulations) in under three minutes. Leveraging an in-database analytics approach allowed the financial institution to analyze the data where it resides, rather than build a parallel data-processing platform to run the simulation. Faster query response time, and eliminating the time required to move data between two platforms, allowed the company to add variables to investment strategy simulations and to run the risk analysis more frequently.

Yale/FINRA Limit Rules Backtesting: Yale researchers evaluated an IBM Netezza data warehouse appliance against cloud-based (Amazon EC2) data storage while analyzing 24 billion historical stock transactions. Bringing computation to the data with IBM Netezza In-Database Analytics yielded a 43% performance gain over the cloud-based solution with no system tuning.
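To make the Financial Services use case concrete, here is a minimal Python/NumPy sketch of the kind of Monte Carlo value-at-risk calculation described above. The portfolio sizes, volatilities, and the simple one-day normal-returns model are all illustrative assumptions, not the bank's actual methodology; the point is the shape of the computation that in-database analytics pushes to the nodes where the position data already lives.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative (invented) portfolio: holdings across 1,000 underlying stocks.
n_stocks, n_scenarios = 1_000, 10_000
shares = rng.integers(100, 10_000, size=n_stocks)   # shares held per stock
prices = rng.uniform(10.0, 500.0, size=n_stocks)    # current price per share
vols = rng.uniform(0.10, 0.60, size=n_stocks)       # annualized volatility

# Draw one-day returns for every stock in every scenario. The per-desk,
# per-position fan-out is what pushes the real workload into the billions.
daily_vol = vols / np.sqrt(252)
returns = rng.normal(0.0, daily_vol, size=(n_scenarios, n_stocks))

# Portfolio profit/loss in each scenario, then the 99% one-day value-at-risk.
pnl = returns @ (shares * prices)
var_99 = -np.percentile(pnl, 1)
print(f"99% one-day VaR: ${var_99:,.0f}")
```

At the 2.5-billion-simulation scale cited above, extracting positions to a separate compute platform would likely dominate the runtime, which is why running the loop in-database pays off.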

Tenet Two: Process Workloads on Fit-for-Purpose Nodes

Data and compute resources are assigned to appliances and other systems specifically designed for well-understood workloads. These specialized systems offer optimal performance at affordable prices, their simplicity accelerates time-to-value, and their deployment frees the EDW to assume a more focused role as orchestration engine and data management hub. Unburdened of the analytic processing for which it was not designed, the central data warehouse regains computational resources that can be focused on operational and orchestration activities, including data integration and data quality oversight. Eventually, this shift sees the enterprise data warehouse evolving into a new role as the enterprise data hub, mediating data flow, coordinating data integration, and distributing data to the appropriate analytics engines with a simple, appliance-based approach. Centralizing data management reduces complexity and costs and simplifies the pursuit of rigorous data governance.

Further opportunities to offload data management and computation include an appliance for operational analytics; an appliance as a queryable archive; a stream-processing system for real-time analysis of data on the wire (for example, from digital sensors or a network feed); and a grid running the Hadoop Distributed File System for analyses of big data such as web-click streams and call-detail records. The logical data warehouse is a dynamic system: nodes may come and go, and inter-node connections permit the results of big data analyses to be moved downstream to analytic appliances for deeper, more advanced analytic processing.

Obvious first steps include offloading existing analytic workloads, absorbing rogue data marts back into the logical warehouse, and deploying a queryable archive. At many sites, however, there is a new strategic priority: move to accommodate new (predominantly Big Data) sources into the analytic infrastructure, using systems with real-time streaming analytic engines, time series data processing, and Hadoop platforms to undertake preprocessing and initial analysis, and to ingest data into the logical warehouse. Data can then flow from those platforms to other analytics appliances and systems for further downstream processing. Data from new sources may, by design, pass initially through very different data integration and governance filters. Once data flows beyond the ingestion platform, the EDW applies governance, data flow, and life cycle rules. As new data paradigms and analytic platforms emerge, they, too, can be similarly integrated.

Figure 5: Evolving to a Logical Data Warehouse: Distribute Data and Computation
Distribute data and compute to the LDW node that best meets the requirements of the application or workload: price-performance, availability, data sensitivity, etc. An agile architecture for introducing new data types and analytic models:
- Offload data and analytics from the EDW to workload-optimized nodes
- Extend the data warehouse by adding big data and real-time analytic processing
- Add analytics for new data types
(Diagram: new sources (sensor and meter, event, Internet/social media) feed big data processing, real-time analytics, and time series processing nodes; traditional sources (CRM, ERP, external) feed the central EDW through data integration; downstream nodes provide BI and ad hoc analytics, operational analytics, and a queryable archive under common governance, security, and lifecycle management.)

First, identify workloads that can be isolated and offloaded cleanly. Analytic applications against discrete data sets make good candidates, as do the Big Data workloads mentioned above. Whether offloaded from the central warehouse (structured data analytics) or deployed on purpose-built nodes before ever having been deployed on the central warehouse (Big Data and queryable archive workloads), the key benefit is that these workloads can now run on more efficient, less expensive platforms than the subset of EDW resources that would be required to serve them.

Technology Requirements
- Cost-effective workload-optimized systems and appliances
- Seamless data flow
- Data virtualization and query redirection
- Enterprise-level orchestration engine

Technology Proof Points

IBM DB2 Analytics Accelerator: The IBM Netezza accelerator for System z demonstrates automated inter-system data synchronization and query redirection. Advanced analytic workloads are offloaded from the mainframe to a fast, efficient appliance purpose-built for analytics.

IBM InfoSphere BigInsights: IBM developed InfoSphere BigInsights to analyze unstructured data such as text, video, audio, and social media. The software, developed in part by IBM Research, is based on Hadoop and more than 50 IBM patents. IBM is also rolling out 20 new analytic services, including:
- Server and storage optimization tools for faster implementation time.
- Data center life cycle cost analysis to cut expenses and bolster sustainability efforts.
- Security analytics to automate handling of critical events.
IBM has also developed the Jaql query language and released it to the open source community. Jaql is a high-level declarative language for processing both structured and nontraditional data. Its SQL-like interface means quick ramp-up for developers familiar with SQL, and also makes it easier to operate on relational databases. Jaql is highly extensible, and IBM has used this capability to include pre-built Jaql modules in BigInsights that enable integration with IBM Netezza data warehouse appliances and text analytics systems such as IBM Content Analytics.

IBM InfoSphere Streams: InfoSphere Streams is designed for real-time capture and analysis of unstructured data streams such as Tweets, video, sensor data, stock market data, blog posts, video frames, EKGs, and GPS data. Software advancements have improved previous-generation performance by roughly 350%. Note that BigInsights complements Streams by applying analytics to massive historical data at some time after the first-level, real-time analysis performed by Streams.

IBM InfoSphere Warehouse: The DB2-powered data warehouse software at the heart of the IBM Smart Analytics System is designed for warehouses that support mixed workloads, including analytics and operational decision support. It is available across a broad set of operating systems and hardware platforms to support a wide range of custom solutions.

IBM Netezza High Capacity Appliance: Introduce queryable archiving to provide cost-effective analytics on massive data sets at an economical price point.

Use Cases

Insurance: A global insurance company uses DB2 Analytics Accelerator to offload claims analysis queries from DB2 for z/OS to a tightly integrated IBM Netezza-based accelerator system. In addition to accelerating complex analytic query speeds, this transparent solution also reduces load and improves performance on the System z mainframe.

Banking: A large international bank uses InfoSphere Streams to capture Internet and ATM data streams, while a high-performance central warehouse stores consolidated, integrated data for deep analytics to drive targeted marketing efforts and derive new insights from customer data, behavior, and credit card transactions.

BlueKai and Intuit (Hadoop and IBM Netezza): Stream tens of billions of clickstream data points to a Hadoop grid to perform first-level analysis, then forward selected data subsets to an IBM Netezza data warehouse appliance and run advanced analytics to decipher consumer behavior for ad targeting and marketing campaign optimization.

Department of Energy, Pacific Northwest National Laboratory: An InfoSphere Streams grid captures high-speed, high-volume log streams and then discards uninteresting status messages before forwarding exceptions and other outliers to IBM Netezza for advanced analytics.

BigInsights and IBM Netezza: A large retailer performs web log analysis with BigInsights for site optimization, followed by segmentation analysis on per-customer clickstreams.

As expected, using Streams and/or BigInsights to process extreme data, and then uploading filtered data or data subsets to IBM Netezza data warehouse appliances and IBM Smart Analytics Systems for advanced analytics, is rapidly becoming a common use case. Note that stream-captured and other big data can also be supplemented or enriched before it is forwarded to an advanced analytics appliance. For example, by adding information to stock market ticks, such as industry class or interested trader lists, even more sophisticated analysis and fine-tuned alerting can be achieved. Furthermore, in real time, Streams can help determine which bits of the incoming data stream should be routed to longer-term database storage and subsequent analysis. As a result, gains in analytic insight are compounded by reduced storage and administration costs.
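The Pacific Northwest National Laboratory pattern above (discard routine status traffic at the ingestion edge, forward exceptions downstream for deep analytics) reduces to a filter-and-batch pipeline. Here is a minimal Python sketch under invented assumptions: the message format, the set of routine message types, and the batch size are hypothetical stand-ins for what Streams and the warehouse loader would actually handle.

```python
import json

ROUTINE = {"HEARTBEAT", "STATUS_OK", "KEEPALIVE"}   # hypothetical routine message types

def first_level_filter(log_stream):
    """Yield only interesting records; drop routine status traffic at the edge."""
    for raw in log_stream:
        record = json.loads(raw)
        if record.get("type") not in ROUTINE:   # keep exceptions and other outliers
            yield record

def forward_batches(records, batch_size=500):
    """Group filtered records into batches sized for bulk load downstream."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

# Tiny demo stream: two routine heartbeats and one exception worth keeping.
stream = [
    '{"type": "HEARTBEAT", "host": "n1"}',
    '{"type": "EXCEPTION", "host": "n2", "msg": "disk failure"}',
    '{"type": "STATUS_OK", "host": "n3"}',
]
for batch in forward_batches(first_level_filter(stream), batch_size=100):
    print(f"forwarding {len(batch)} record(s) to the analytics node: {batch}")
```

The filter runs where the data arrives; only the small, interesting fraction ever crosses the wire to the warehouse, which is the storage and administration saving the paragraph above describes.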

Tenet Three: Coordinate Data Management and Governance Across the Logical Warehouse

Crucially, this data warehouse requirement (to serve as a data management control point) does not fall away as we evolve from traditional EDW to logical data warehouse. Fortunately, IBM's exceptional portfolio of software systems for enterprise data management and data governance makes logical data warehouse management simple and cost-effective. These tools and policies manage metadata, provide data governance, distribute and replicate data, and manage the data life cycle, and they do so across heterogeneous systems, including those from multiple vendors.

Figure 6: Evolving to a Logical Data Warehouse: Coordinated Data Management and Governance
Smart Consolidation allows clients to evolve from a complex monolithic architecture to a more agile, logical, and decentralized architecture. Computation is managed centrally, but it is executed on LDW nodes optimized for specific workloads, which improves performance, governance, scalability, and agility.
(Diagram: new sources (sensor and meter, event, Internet/social media) and traditional sources (CRM, ERP, external) feed workload-optimized nodes for big data processing, real-time analytics, time series processing, BI and ad hoc analytics, operational analytics, and queryable archiving, coordinated by the enterprise data warehouse under common governance, security, and lifecycle management.)

Rather than shackling data in a central repository in the name of governance, we want to centralize the management, not the distribution and processing, of enterprise data. The goal is to simplify the administration, provisioning, and scalability of the extended analytics infrastructure by centralizing those functions. Move data management into the hub, and move analytics out to workload-optimized nodes.

Technology Requirements
- Seamless data flow
- Enterprise-level orchestration engine
- Sophisticated enterprise data management tools

Technology Proof Points

IBM InfoSphere Change Data Capture (CDC): InfoSphere CDC offers cross-system change data replication for focused data synchronization between disparate systems, and the supported platform list continues to grow.

IBM Netezza Replication Module, IBM InfoSphere Replication Server: More key pieces of the enterprise data puzzle. Reliable, full-scale replication options are required to meet high availability and disaster recovery standards, and also to enable queryable archiving, for which demand is rising dramatically.

DataStage-BigInsights Integration: This combination streamlines bulk data integration between HDFS and a data warehouse system. Users can launch a BigInsights analytics job from the DataStage user console.

IBM InfoSphere Blueprint Director: Blueprint Director is positioned to take a larger role in logical data warehouse configuration.

Figure 7: A Conceptual View of Proposed Logical Warehouse Configuration Tools
(Diagram: Blueprint Director defines warehouse replication rules and analytic data flows across IBM InfoSphere Warehouse, IBM Change Data Capture, IBM Netezza 1000, IBM Cognos Business Intelligence, Optim Data Management, and the IBM Netezza High Capacity Appliance.)

Optim Data Management and Integration: Another of IBM's enterprise-level software systems, optimized to manage the new IBM Netezza C1000 series of High Capacity Appliances.

Use Case

InfoSphere CDC: A large financial institution synchronizes several million transactions per day between internal financial data systems and customer-facing query and transactional systems, in near real time. InfoSphere CDC performs real-time data integration automatically, with high availability and single-console setup and management.
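As a concrete picture of what change data capture automates, here is a toy Python sketch of the core apply loop: read ordered change records from a source log and apply inserts, updates, and deletes to a target copy, tracking a high-water mark so the process can restart safely. The record layout and log-sequence-number scheme are invented for illustration; InfoSphere CDC handles capture, transport, transformation, and conflict handling on top of this basic idea.

```python
# Toy change-data-capture apply loop. The record layout below is invented;
# a real CDC product captures these changes from the source database log.
target = {}          # target-side copy of the table, keyed by primary key
applied_lsn = 0      # high-water mark of applied changes (log sequence number)

change_log = [
    {"lsn": 1, "op": "INSERT", "key": 101, "row": {"balance": 250}},
    {"lsn": 2, "op": "UPDATE", "key": 101, "row": {"balance": 175}},
    {"lsn": 3, "op": "DELETE", "key": 101, "row": None},
]

for change in change_log:
    if change["lsn"] <= applied_lsn:
        continue                        # already applied: restart-safe
    if change["op"] in ("INSERT", "UPDATE"):
        target[change["key"]] = change["row"]
    elif change["op"] == "DELETE":
        target.pop(change["key"], None)
    applied_lsn = change["lsn"]         # advance the high-water mark

print(target, "-- applied through lsn", applied_lsn)
```

Because only changed rows cross the wire, this pattern keeps nodes synchronized without the bulk extract-and-reload traffic that burdens a central EDW.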

Technology Requirements

Previous sections referred to some core technology requirements associated with the Smart Consolidation model's three main tenets. Here, we'll describe these requirements in a little more detail. They provide a good basis for tracking and evaluating new and existing technologies, from IBM and other vendors, and IBM's product portfolio is evolving rapidly in these critical areas. Note that these requirements overlap, as do the domains of the hardware, software, and networking components designed to address them.

Workload-Optimized Systems and Appliances

Consolidated infrastructure and distributed data/compute nodes demand high-performance, cost-effective processing platforms to handle the assigned workloads. This premise is self-evident, and IBM's Netezza acquisition, workload-optimized Smart Analytics System designs, Informix TimeSeries, and the InfoSphere Streams and BigInsights Big Data platforms all testify to full commitment in this domain.

Simplicity: To make adding a logical warehouse node simple and cost-effective, ease of use and roughly linear scaling become non-negotiable requirements. Weeks or months of tuning and load testing a new processing node diminish the value of a distributed system, where agility and adaptability should be primary benefits.

Seamless Data Flow

The requirement is fast, reliable data flow. This requirement applies to the various data load, synchronization, and distribution tasks, and to the orchestration hub itself: the various forms of glue that bind the logical data warehouse nodes into a coherent system. Data movement enablers and optimizations span the InfoSphere software portfolio and include the BigInsights-IBM Netezza connector, DataStage ETL/ELT connectors, CDC, and the IBM Netezza Replication Module. These critical components exist today, all under the same roof at IBM, and they are being enhanced, refined, and recombined to serve the Smart Consolidation model. The latest DataStage-IBM Netezza and DataStage-BigInsights optimizations are prime examples, and others will follow.

Keep in mind that in a multi-node logical warehouse, data flow traffic may not be fully predictable, nor is it one-way. Some nodes may serve primarily as pre-analysis data sources or repositories, while others function primarily as offload targets for data to be analyzed. However, data synchronization or multi-level analysis may involve data flow into, out of, or around the hub. In some cases, this may include back-and-forth traffic, or variable node-to-node flow patterns driven by dynamic rules and policies, or by data virtualization and run-time query redirection. The newly announced IBM DB2 Analytics Accelerator exemplifies the seamless data movement principle, binding IBM System z and IBM Netezza data warehouse appliances transparently with a 10GbE private service network.

Minimizing Data Movement

Although frictionless data flow is an important objective, truly frictionless movement is not possible, and data movement incurs a cost. In general, it is preferable to move the computation to the data, not the other way around, and each of our remaining technology requirements includes minimizing data movement as an unspoken benefit.

Data Virtualization and Query Redirection

The logical data warehouse presents a single, cohesive, logical face to business users and applications, which do not need to know where a particular datum physically resides. In particular, queries can be fired at a logical warehouse with no need to know which nodes are collaborating to provide the analytic service. The IBM DB2 Analytics Accelerator illustrates this concept: the DB2 for z/OS accelerator comprises one or more IBM Netezza data warehouse appliances, to which (a) selected data are synchronized, and (b) deep analytic queries are redirected automatically.

Figure 8: IBM DB2 Analytics Accelerator
A two-node logical warehouse with a single management interface.
- Use virtualization to optimize resource usage automatically, reducing costs and gaining new agility.
- Consolidate the ever-growing proliferation of data marts onto a single, easily managed platform.
(Diagram: two analytic systems presented as a single, simplified platform to manage and administer.)

Here, the result is a simple two-node logical warehouse with a single management interface, exemplifying the virtualization design principle: the logical warehouse structure (node topology, connectivity, data flow tools and services) is transparent to both applications and end users. Infrastructure complexity is hidden from business users and applications, which access data through well-defined access points: browsers, application clients, and APIs.
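To illustrate the redirection idea in miniature, the Python sketch below routes each incoming query either to an accelerator node or to the hub. The keyword heuristic is a deliberately crude, purely illustrative stand-in for the cost-based decision a real optimizer makes, and the node names are invented.

```python
# Minimal sketch of transparent query redirection. Real products make this
# decision inside the optimizer; the heuristic below is purely illustrative.
ANALYTIC_HINTS = ("GROUP BY", "CUBE", "OVER (")   # crude signal of heavy analytic SQL

def route(query: str) -> str:
    """Pick a target node for a query; callers never see the topology."""
    q = query.upper()
    if any(hint in q for hint in ANALYTIC_HINTS):
        return "accelerator"    # e.g., the appliance node holding synchronized data
    return "hub"                # short-request work stays on the central warehouse

queries = [
    "SELECT balance FROM accounts WHERE id = 101",
    "SELECT region, SUM(sales) FROM orders GROUP BY region",
]
for q in queries:
    print(f"{route(q):<11} <- {q}")
```

The application submits the same SQL either way; only the routing layer knows which node answers, which is exactly what lets new nodes join the topology without application changes.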

Virtualizing access to a distributed infrastructure frees data consumers from concerns about where data is managed and processed, with the advantage that queries can be redirected transparently when new computational nodes or appliances are added to the infrastructure.

Sophisticated Enterprise Data Management Tools

Data integration, data governance, and their related disciplines and sub-disciplines (master data management, changed data capture, data quality, data cleansing, and so on) require (a) a common control platform, and (b) enterprise-level applications with cross-system visibility. A product portfolio that can supply this glue is critical. Business professionals queried about business analytics by Computerworld in 2009 identified data integration with multiple source systems and data quality as the top two challenges they have faced, or expect to face, in achieving successful business analytics.

Replication: In addition to its fundamental role in HA/DR scenarios, replication, in its several forms, addresses a special class of data management requirements that might be called synchronization. In a multi-node logical warehouse, moving and synchronizing data between nodes for multi-level analytics, master data management, and change data capture takes on special importance. Several new and existing IBM products address these challenges: IBM Netezza Replication Module and InfoSphere Replication Server for large-scale synchronization, InfoSphere CDC for smaller-scale synchronization, Blueprint Director for replication set-up, and so on.

Data Integration: ETL/ELT data integration tools and platforms can present significant challenges, especially at large sites. ETL/ELT subsystems often include home-built scripts and protocols, which complicate tool and process migration efforts. As a result, migrating data integration workloads from pre-existing platforms to one or more LDW nodes is typically planned as a multi-phase effort with a conservative time window.

Performance

Performance, as always, underlies any requirements set in this space. We define performance to include BI and analytic query speed, which is paramount, but also simplicity and ease-of-use efficiencies, including fast time-to-value; minimized administration, tuning, and maintenance; and innovative agility. Performance without these ease-of-use factors confers little advantage.

The Components of a Logical Data Warehouse

Think of a logical data warehouse as a set of data storage and/or processing nodes. The connective tissue, or glue, that binds these nodes takes the form of software, services, and networking hardware and software. Unlike a typical LAN or distributed computing grid, a logical warehouse is a profoundly heterogeneous entity that may include a very diverse set of nodes, likely from multiple vendors, running an extremely diverse set of tasks. Note that node types correspond quite closely with workload types. A sampling of the workloads expected of a modern data warehouse:

- ETL/ELT/Data Integration: Data staging, bulk and trickle-feed data loading, ETL, ELT.
- Data Governance: Master data management (MDM), changed data capture (CDC), data quality (DQ), etc.
- Operational Intelligence: Low-latency, real-time query and operational BI support; BI reporting and dashboard updating.
- Complex Event Processing: Real-time event processing for data compliance, data security, fraud detection, etc.
- Analytics/Advanced Analytics: Light-to-moderate or heavy decision support, data mining, complex in-database analytics.
- Line-of-Business Data Marts/Data Warehouses: Data warehouse appliances for specific LoB applications: retail analytics, ERP, etc.
- Big Data Processing: InfoSphere BigInsights (Hadoop) grid to analyze massive unstructured data sets.
- Real-Time Analytics (Big Data): InfoSphere Streams system for high-volume stream capture and analysis.
- Time Series Processing: Informix TimeSeries for optimized storage and processing of time series and time interval data.
- Queryable Archiving: High-capacity federated storage for data to which future or intermittent access is required.
- Backup/Recovery: High-capacity, write-only systems for non-queryable archiving and/or disaster recovery.
- Exploration Sandbox: Replicated data for use in data exploration and nonproduction analytics.
- Test/Dev/Prototyping: Nonproduction systems for application development, prototyping, and testing.
- Short-Request/Transactional: OLTP or other short-request query activity.
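Because node types track workload types so closely, the list above is effectively a routing table. A hedged Python sketch of how an orchestration hub might represent that mapping follows; every node name is invented, and a real deployment's assignments would depend on site topology.

```python
# Hypothetical logical-warehouse routing table: workload type -> node.
# All node names are invented; real assignments are site-specific.
LDW_NODES = {
    "etl_elt":           "integration-hub",
    "operational_bi":    "smart-analytics-node",
    "deep_analytics":    "netezza-appliance",
    "big_data_batch":    "biginsights-grid",
    "stream_analytics":  "streams-cluster",
    "time_series":       "informix-timeseries-node",
    "queryable_archive": "high-capacity-appliance",
    "sandbox":           "exploration-node",
}

def assign(workload: str) -> str:
    """Resolve a workload to its fit-for-purpose node, defaulting to the hub."""
    return LDW_NODES.get(workload, "integration-hub")

print(assign("deep_analytics"))     # netezza-appliance
print(assign("unknown_workload"))   # integration-hub (safe default)
```

Keeping the mapping declarative is what lets nodes be added, repurposed, or retired without touching the applications that submit the work.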

Figure 9: Logical Data Warehouse Nodes
(Diagram: node types and example platforms include data integration; IBM InfoSphere BigInsights for big data processing; IBM InfoSphere Streams for real-time analytics; IBM Informix TimeSeries for time series processing; enterprise reporting/BI; test/dev, prototype, and sandbox systems; InfoSphere Warehouse as the enterprise data warehouse; IBM DB2 Analytics Accelerator for transactional plus deep analytics; IBM Netezza High Capacity Appliances as the queryable archive; IBM Netezza 1000 for BI and ad hoc advanced analytics; IBM Smart Analytics System for operational analytics; and Oracle/Teradata for ODS and operational BI.)

Many of these common node types have already been discussed, but Figure 9 inspires some additional observations:

Existing EDW: If your current enterprise data warehouse is performing as desired, some capacity planning may be in order, but leave the system in place. If the required set of virtual warehouse nodes can be encapsulated in your current system, you have already deployed the superior solution. To repeat, Smart Consolidation is a flexible, evolutionary strategy, and site-specific data stores and workloads will determine how to proceed.

Big Data (BigInsights, Streams) Nodes: At present, text dominates big data generation and consumption, making natural language processing the pre-eminent form of big data processing, which puts IBM's Watson at the apex of unstructured data analysis. Also, note some of the node-to-node connections in Figure 9. A logical warehouse resembles a hub-and-spoke system, but inter-node relationships are more flexible, and the multi-layered analytics enabled by these node-to-node connections will drive analytic sophistication to heights not previously attained. Finally, although the trend in structured data processing is toward permanent storage and access for all data, some extraordinary big data volumes preclude this simple solution for unstructured data. Filtering interesting data from high-speed streams is already commonplace, and the concept of data value decay remains relevant. Data management policies (data security, life cycle management, etc.) will have to reflect the node-by-node variations in data value and longevity.

Figure 10: Data Value Decay
Not all data has equal value, or longevity. The business value curves for data take a variety of shapes, with most trending downward over time.
(Chart: relative data value, on a 1-to-10 scale, plotted over time in months for sandbox, quarterly finance, clickstream, mission-critical, and online survey data.)
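One way to operationalize the value-decay idea is to model each data set's value as a curve and derive node placement from it. The Python sketch below uses a simple exponential decay, with decay rates and placement thresholds invented for illustration; as Figure 10 notes, real curves take a variety of shapes and would be calibrated per data set.

```python
import math

# Invented decay rates (per month) and placement thresholds. The curve
# shapes in Figure 10 differ per data set; treat these as illustrative only.
DECAY = {"clickstream": 0.50, "quarterly_finance": 0.05, "mission_critical": 0.0}

def value(dataset: str, age_months: float, initial: float = 10.0) -> float:
    """Exponential value-decay model: v(t) = v0 * exp(-k * t)."""
    return initial * math.exp(-DECAY[dataset] * age_months)

def placement(dataset: str, age_months: float) -> str:
    """Map current business value to a logical-warehouse tier."""
    v = value(dataset, age_months)
    if v >= 7.0:
        return "analytics appliance"   # hot: fast ad hoc analytics
    if v >= 2.0:
        return "queryable archive"     # warm: on-demand queries, low cost per TB
    return "offline archive"           # cold: write-only retention

for ds in DECAY:
    print(ds, [placement(ds, m) for m in (1, 6, 24)])
```

Under these invented numbers, clickstream data ages off the appliance within months while mission-critical data never does, which is the node-by-node variation in value and longevity the policies above must capture.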

Advanced Analytics Node: Like data marts or enterprise reporting nodes, advanced analytics nodes can take many forms. Common examples include InfoSphere Warehouse software or an IBM Netezza data warehouse appliance running in-database analytics software: SPSS, IBM Netezza Analytics, Fuzzy Logix, or SAS. Other workload-optimized analytics platforms include the IBM Content Analytics system, IBM Smart Analytics System, Solutions for Retail with IBM Netezza Customer Intelligence Appliance, Solutions for Communications with IBM Netezza Network Analytics Accelerator, and so on.

Dependent or Independent Data Marts: Smart Consolidation will never eliminate data marts completely. The goals are to consolidate the vast majority onto analytics appliances, and to pull the remainder into enterprise data management regimes.

Queryable Archive: IBM Netezza has announced the C1000 family of High Capacity Appliances. Deploying a high-capacity queryable archive yields these benefits:
- Lowers TCO by minimizing the per-terabyte cost to store large historical data sets: data to which access may be required, but with something less than real-time immediacy.
- Frees the EDW, or other sub-optimal warehouse nodes, from having to house historical or other less frequently accessed data, thereby increasing the performance and capacity of those nodes.
- Increases data accessibility for historical data and other less frequently queried data sets.
- Removes the write-only data tomb problem.

Enterprise Data Management / Orchestration Hub: In its idealized end state, a logical data warehouse includes one or more master nodes to coordinate data integration, node synchronization, system-wide monitoring and error reporting, and LDW topology. Note that this hub can incorporate a full-scale data warehouse, which may include operational analytics and mixed workload support, for example, or other virtual nodes from the list above. Ultimately, orchestration software running on such a system will be able to combine with replication and MDM tools to configure and manage key aspects of a logical data warehouse.

In principle, what we have labeled enterprise data management, or the enterprise data hub, might take a variety of forms, from a large Smart Analytics System or custom InfoSphere Warehouse system to an Oracle Database or Teradata database system. As elsewhere in the Smart Consolidation model, flexibility prevails. However, there are some general guidelines:
- When evolving from a large existing EDW, consolidate data marts and offload analytics processing to workload-optimized appliances first, preserving the newly unburdened EDW as the focus of enterprise data management.
- When building a new logical data warehouse, IBM currently recommends an IBM Smart Analytics System or custom InfoSphere Warehouse system as the best price-performance host for IBM's diverse data governance, data management, and data integration software portfolio.

Smart Consolidation Entry Points

The Smart Consolidation model offers a flexible, stepwise adoption path. Each forward step confers substantial benefits, as does partial adoption. The logical warehouse can include one or more nodes from a diverse (and growing) set of node types, and new nodes can be added, or old ones removed, with minimal impact on applications. Together, these facts suggest numerous entry points on the path to Smart Consolidation, many of which will already look familiar:

Stay with a Traditional EDW: A single, high-powered EDW remains an optimal solution for smaller clients. A node set of one remains the cleanest possible warehouse configuration, delivering optimal ease of use, minimal admin costs, fastest time-to-value, and lowest total cost. Nodes are added only as required. Like FEMA's Incident Command System (ICS), whose full structure is required only by the largest incidents, Smart Consolidation is a flexible, extensible framework that can be scaled up or down to match installation size. Few emergencies are large enough to require every component of the ICS, but the framework is there, ready and waiting.

Build a New Logical Warehouse: The guidance in this paper should help you avoid the pitfalls of a large monolithic system. If your priorities include analytics, then no modern system, warehouse or otherwise, can credibly absorb the diverse set of workloads directed at any sizable analytics infrastructure.

Offload an Overburdened InfoSphere Warehouse, IBM Netezza, or Other EDW: For fast time-to-value, offload all advanced analytics processing to one or more modern analytic appliances. To synchronize an analytics node with the central EDW, consider IBM's InfoSphere Change Data Capture (CDC) software, which is designed to simplify this task.

Upgrade an Existing System z Warehouse with High-Performance Analytics: Add the DB2 Analytics Accelerator (a fully integrated IBM Netezza analytics appliance) to an IBM System z, creating what is, in effect, a two-node LDW.

Consolidate Data Mart Sprawl: Again, wherever possible, consolidate data marts onto one or more modern analytic appliances.

Add a Purpose-Built Analytics Appliance, or Add Analytic Processing to an Existing Warehouse: As reported in TDWI's Big Data Analytics Best Practices Report in Q4 2011, 40% of survey respondents practice advanced analytics without big data. From the TDWI report: "Much of the action in big data analytics is at the department level," and "Analytic applications are departmental by nature."

Add Queryable Archiving: Frequently, this step is taken to satisfy high availability and/or disaster recovery requirements, and/or data regulatory requirements. The advantages of massive, cost-effective storage that can be queried on demand are obvious to anyone familiar with traditional tape, cartridge, disk, or off-site archiving.

Add Big Data Processing: Add InfoSphere BigInsights/Hadoop, InfoSphere Streams, Informix TimeSeries, or IBM Content Analytics nodes.

The Smart Consolidation model confers many benefits, but perhaps the most significant is that it helps clients quickly deploy or reposition analytic resources in response to new data sources, new processing technologies, or new business requirements.

Add an ODS or Operational BI Node (InfoSphere Warehouse/IBM Smart Analytics System): The IBM Smart Analytics System integrates InfoSphere Warehouse, which is designed for mixed-workload, high-throughput environments such as those associated with operational BI deployments.

Support Geographic Expansion: Add one or more regional data warehousing and analytics centers, and coordinate data governance policies from the existing EDW.

Conclusion

In IBM's Smart Consolidation for Smarter Warehousing strategy, a logical data warehouse replaces a monolithic system with a distributed computing architecture. Data governance (a global consideration) and analytic applications (often LoB-oriented) are isolated from each other as separate nodes in a grid-like ecosystem. This modular approach is smarter computing. Each task is provisioned with exactly the hardware, software, and application services it requires.

Many organizations have already found a way to succeed by moving analytics processing from an already overburdened central warehouse to data warehouse appliances. IBM's Smart Consolidation strategy, designed around the logical data warehouse concept, builds on these successes, simplifying data management, automating replication, and adding governance controls. Separating data management from its exploitation accelerates time-to-value, reduces cost, and creates a flexible architecture to add new processing nodes that will run future analytic technologies as these become available. Virtualizing access to this distributed infrastructure frees users from concerns about where data is managed and processed, with the advantage that queries can be redirected transparently as new computational nodes or appliances are added to the infrastructure.

Over the coming months, IBM will continue to share its evolutionary road map of products and features underlying Smart Consolidation. This strategy will allow our clients to maximize the value of their investments in data warehousing and analytics, while scaling to support new data types, higher data volumes, and more complex applications, all flexibly and with appliance simplicity. Keep in mind that the flexible three-point call to action (consolidated infrastructure, distributed data/compute, and coordinated enterprise data management) need not be pursued in a particular order. Furthermore, logical warehouse nodes should be viewed as modular units that can be added, repurposed, repositioned, or removed as your analytic requirements evolve, or as the competitive landscape dictates.

Figure 11: IBM's Logical Data Warehouse
Extensible, modular logical warehouse construction offers a road map to continued expansion, including future support for new data and workload types. Features:
- Application- and workload-optimized appliances and systems
- Seamless data movement
- Data governance and lifecycle management
- Continuously available
- Framework for integrated management
- Transparent, virtualized access for end users, via well-defined access points (APIs and browser clients, for example)
(Diagram: new sources (sensor and meter, event, Internet/social media) feed IBM InfoSphere BigInsights for big data processing, IBM InfoSphere Streams for real-time analytics, and IBM Informix TimeSeries for time series processing; traditional sources (CRM, ERP, external) feed InfoSphere Warehouse as the enterprise data warehouse; IBM Netezza 1000 provides BI and ad hoc analytics, IBM Smart Analytics System provides operational analytics, and IBM Netezza High Capacity Appliances provide the queryable archive, all under common governance, security, and lifecycle management.)

About Netezza Corporation: Netezza, an IBM Company, is the global leader in data warehouse and analytic appliances that dramatically simplify high-performance analytics across an extended enterprise. Netezza's technology enables organizations to process enormous amounts of captured data at exceptional speed, providing a significant competitive and operational advantage in today's data-intensive industries, including digital media, energy, financial services, government, health and life sciences, retail, and telecommunications. Netezza is headquartered at 26 Forest Street, Marlborough, Massachusetts, and has offices in North America, Europe, and the Asia Pacific region.

Netezza Corporation, an IBM Company. All rights reserved. All other company, brand, and product names contained herein may be trademarks or registered trademarks of their respective holders.


More information

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,

More information

A TECHNICAL WHITE PAPER ATTUNITY VISIBILITY

A TECHNICAL WHITE PAPER ATTUNITY VISIBILITY A TECHNICAL WHITE PAPER ATTUNITY VISIBILITY Analytics for Enterprise Data Warehouse Management and Optimization Executive Summary Successful enterprise data management is an important initiative for growing

More information

ORACLE DATA INTEGRATOR ENTERPRISE EDITION

ORACLE DATA INTEGRATOR ENTERPRISE EDITION ORACLE DATA INTEGRATOR ENTERPRISE EDITION ORACLE DATA INTEGRATOR ENTERPRISE EDITION KEY FEATURES Out-of-box integration with databases, ERPs, CRMs, B2B systems, flat files, XML data, LDAP, JDBC, ODBC Knowledge

More information

ORACLE DATA INTEGRATOR ENTERPRISE EDITION

ORACLE DATA INTEGRATOR ENTERPRISE EDITION ORACLE DATA INTEGRATOR ENTERPRISE EDITION Oracle Data Integrator Enterprise Edition 12c delivers high-performance data movement and transformation among enterprise platforms with its open and integrated

More information

BUSINESS INTELLIGENCE. Keywords: business intelligence, architecture, concepts, dashboards, ETL, data mining

BUSINESS INTELLIGENCE. Keywords: business intelligence, architecture, concepts, dashboards, ETL, data mining BUSINESS INTELLIGENCE Bogdan Mohor Dumitrita 1 Abstract A Business Intelligence (BI)-driven approach can be very effective in implementing business transformation programs within an enterprise framework.

More information

IBM PureFlex System. The infrastructure system with integrated expertise

IBM PureFlex System. The infrastructure system with integrated expertise IBM PureFlex System The infrastructure system with integrated expertise 2 IBM PureFlex System IT is moving to the strategic center of business Over the last 100 years information technology has moved from

More information

IBM BigInsights for Apache Hadoop

IBM BigInsights for Apache Hadoop IBM BigInsights for Apache Hadoop Efficiently manage and mine big data for valuable insights Highlights: Enterprise-ready Apache Hadoop based platform for data processing, warehousing and analytics Advanced

More information

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum Big Data Analytics with EMC Greenplum and Hadoop Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All

More information

Beyond the Single View with IBM InfoSphere

Beyond the Single View with IBM InfoSphere Ian Bowring MDM & Information Integration Sales Leader, NE Europe Beyond the Single View with IBM InfoSphere We are at a pivotal point with our information intensive projects 10-40% of each initiative

More information

Apache Hadoop Patterns of Use

Apache Hadoop Patterns of Use Community Driven Apache Hadoop Apache Hadoop Patterns of Use April 2013 2013 Hortonworks Inc. http://www.hortonworks.com Big Data: Apache Hadoop Use Distilled There certainly is no shortage of hype when

More information

A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel

A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel A Next-Generation Analytics Ecosystem for Big Data Colin White, BI Research September 2012 Sponsored by ParAccel BIG DATA IS BIG NEWS The value of big data lies in the business analytics that can be generated

More information

IBM Software Integrating and governing big data

IBM Software Integrating and governing big data IBM Software big data Does big data spell big trouble for integration? Not if you follow these best practices 1 2 3 4 5 Introduction Integration and governance requirements Best practices: Integrating

More information

Solutions for Communications with IBM Netezza Network Analytics Accelerator

Solutions for Communications with IBM Netezza Network Analytics Accelerator Solutions for Communications with IBM Netezza Analytics Accelerator The all-in-one network intelligence appliance for the telecommunications industry Highlights The Analytics Accelerator combines speed,

More information

IBM Big Data Platform

IBM Big Data Platform IBM Big Data Platform Turning big data into smarter decisions Stefan Söderlund. IBM kundarkitekt, Försvarsmakten Sesam vår-seminarie Big Data, Bigga byte kräver Pigga Hertz! May 16, 2013 By 2015, 80% of

More information

Big Data Integration: A Buyer's Guide

Big Data Integration: A Buyer's Guide SEPTEMBER 2013 Buyer s Guide to Big Data Integration Sponsored by Contents Introduction 1 Challenges of Big Data Integration: New and Old 1 What You Need for Big Data Integration 3 Preferred Technology

More information

Cisco UCS and Quantum StorNext: Harnessing the Full Potential of Content

Cisco UCS and Quantum StorNext: Harnessing the Full Potential of Content Solution Brief Cisco UCS and Quantum StorNext: Harnessing the Full Potential of Content What You Will Learn StorNext data management with Cisco Unified Computing System (Cisco UCS ) helps enable media

More information

Datalogix. Using IBM Netezza data warehouse appliances to drive online sales with offline data. Overview. IBM Software Information Management

Datalogix. Using IBM Netezza data warehouse appliances to drive online sales with offline data. Overview. IBM Software Information Management Datalogix Using IBM Netezza data warehouse appliances to drive online sales with offline data Overview The need Infrastructure could not support the growing online data volumes and analysis required The

More information

Virtual Data Warehouse Appliances

Virtual Data Warehouse Appliances infrastructure (WX 2 and blade server Kognitio provides solutions to business problems that require acquisition, rationalization and analysis of large and/or complex data The Kognitio Technology and Data

More information

Oracle Data Integrator 12c (ODI12c) - Powering Big Data and Real-Time Business Analytics. An Oracle White Paper October 2013

Oracle Data Integrator 12c (ODI12c) - Powering Big Data and Real-Time Business Analytics. An Oracle White Paper October 2013 An Oracle White Paper October 2013 Oracle Data Integrator 12c (ODI12c) - Powering Big Data and Real-Time Business Analytics Introduction: The value of analytics is so widely recognized today that all mid

More information

Using Tableau Software with Hortonworks Data Platform

Using Tableau Software with Hortonworks Data Platform Using Tableau Software with Hortonworks Data Platform September 2013 2013 Hortonworks Inc. http:// Modern businesses need to manage vast amounts of data, and in many cases they have accumulated this data

More information

ENTERPRISE EDITION ORACLE DATA SHEET KEY FEATURES AND BENEFITS ORACLE DATA INTEGRATOR

ENTERPRISE EDITION ORACLE DATA SHEET KEY FEATURES AND BENEFITS ORACLE DATA INTEGRATOR ORACLE DATA INTEGRATOR ENTERPRISE EDITION KEY FEATURES AND BENEFITS ORACLE DATA INTEGRATOR ENTERPRISE EDITION OFFERS LEADING PERFORMANCE, IMPROVED PRODUCTIVITY, FLEXIBILITY AND LOWEST TOTAL COST OF OWNERSHIP

More information

IBM Software IBM Business Process Management Suite. Increase business agility with the IBM Business Process Management Suite

IBM Software IBM Business Process Management Suite. Increase business agility with the IBM Business Process Management Suite IBM Software IBM Business Process Management Suite Increase business agility with the IBM Business Process Management Suite 2 Increase business agility with the IBM Business Process Management Suite We

More information

EMC IT S JOURNEY TO THE PRIVATE CLOUD: APPLICATIONS AND CLOUD EXPERIENCE

EMC IT S JOURNEY TO THE PRIVATE CLOUD: APPLICATIONS AND CLOUD EXPERIENCE White Paper EMC IT S JOURNEY TO THE PRIVATE CLOUD: APPLICATIONS AND CLOUD EXPERIENCE A series exploring how EMC IT is architecting for the future and our progress toward offering IT as a Service to the

More information

SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON

SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON 2 The V of Big Data Velocity means both how fast data is being produced and how fast the data must be processed to meet demand. Gartner The emergence

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

INVESTOR PRESENTATION. First Quarter 2014

INVESTOR PRESENTATION. First Quarter 2014 INVESTOR PRESENTATION First Quarter 2014 Note to Investors Certain non-gaap financial information regarding operating results may be discussed during this presentation. Reconciliations of the differences

More information

Deploying Big Data to the Cloud: Roadmap for Success

Deploying Big Data to the Cloud: Roadmap for Success Deploying Big Data to the Cloud: Roadmap for Success James Kobielus Chair, CSCC Big Data in the Cloud Working Group IBM Big Data Evangelist. IBM Data Magazine, Editor-in- Chief. IBM Senior Program Director,

More information

Data Integration Checklist

Data Integration Checklist The need for data integration tools exists in every company, small to large. Whether it is extracting data that exists in spreadsheets, packaged applications, databases, sensor networks or social media

More information

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform... Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data

More information

IBM Information Management

IBM Information Management IBM Information Management January 2008 IBM Information Management software Enterprise Information Management, Enterprise Content Management, Master Data Management How Do They Fit Together An IBM Whitepaper

More information

Five Technology Trends for Improved Business Intelligence Performance

Five Technology Trends for Improved Business Intelligence Performance TechTarget Enterprise Applications Media E-Book Five Technology Trends for Improved Business Intelligence Performance The demand for business intelligence data only continues to increase, putting BI vendors

More information

Mike Maxey. Senior Director Product Marketing Greenplum A Division of EMC. Copyright 2011 EMC Corporation. All rights reserved.

Mike Maxey. Senior Director Product Marketing Greenplum A Division of EMC. Copyright 2011 EMC Corporation. All rights reserved. Mike Maxey Senior Director Product Marketing Greenplum A Division of EMC 1 Greenplum Becomes the Foundation of EMC s Big Data Analytics (July 2010) E M C A C Q U I R E S G R E E N P L U M For three years,

More information

Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload

Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload Drive operational efficiency and lower data transformation costs with a Reference Architecture for an end-to-end optimization and offload

More information

Bringing Big Data into the Enterprise

Bringing Big Data into the Enterprise Bringing Big Data into the Enterprise Overview When evaluating Big Data applications in enterprise computing, one often-asked question is how does Big Data compare to the Enterprise Data Warehouse (EDW)?

More information

III JORNADAS DE DATA MINING

III JORNADAS DE DATA MINING III JORNADAS DE DATA MINING EN EL MARCO DE LA MAESTRÍA EN DATA MINING DE LA UNIVERSIDAD AUSTRAL PRESENTACIÓN TECNOLÓGICA IBM Alan Schcolnik, Cognos Technical Sales Team Leader, IBM Software Group. IAE

More information

Apache Hadoop: The Big Data Refinery

Apache Hadoop: The Big Data Refinery Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data

More information

Beyond Watson: The Business Implications of Big Data

Beyond Watson: The Business Implications of Big Data Beyond Watson: The Business Implications of Big Data Shankar Venkataraman IBM Program Director, STSM, Big Data August 10, 2011 The World is Changing and Becoming More INSTRUMENTED INTERCONNECTED INTELLIGENT

More information

Integrating data in the Information System An Open Source approach

Integrating data in the Information System An Open Source approach WHITE PAPER Integrating data in the Information System An Open Source approach Table of Contents Most IT Deployments Require Integration... 3 Scenario 1: Data Migration... 4 Scenario 2: e-business Application

More information

Redefining Infrastructure Management for Today s Application Economy

Redefining Infrastructure Management for Today s Application Economy WHITE PAPER APRIL 2015 Redefining Infrastructure Management for Today s Application Economy Boost Operational Agility by Gaining a Holistic View of the Data Center, Cloud, Systems, Networks and Capacity

More information

Apache Hadoop in the Enterprise. Dr. Amr Awadallah, CTO/Founder @awadallah, [email protected]

Apache Hadoop in the Enterprise. Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, [email protected] Cloudera The Leader in Big Data Management Powered by Apache Hadoop The Leading Open Source Distribution of Apache

More information

The Future of Data Management

The Future of Data Management The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

More information

Accelerating the path to SAP BW powered by SAP HANA

Accelerating the path to SAP BW powered by SAP HANA Ag BW on SAP HANA Unleash the power of imagination Dramatically improve your decision-making ability, reduce risk and lower your costs, Accelerating the path to SAP BW powered by SAP HANA Hardware Software

More information

How To Use Hp Vertica Ondemand

How To Use Hp Vertica Ondemand Data sheet HP Vertica OnDemand Enterprise-class Big Data analytics in the cloud Enterprise-class Big Data analytics for any size organization Vertica OnDemand Organizations today are experiencing a greater

More information

SAP HANA - an inflection point

SAP HANA - an inflection point SAP HANA forms the future technology foundation for new, innovative applications based on in-memory technology. It enables better performing business strategies, including planning, forecasting, operational

More information

SQL Maestro and the ELT Paradigm Shift

SQL Maestro and the ELT Paradigm Shift SQL Maestro and the ELT Paradigm Shift Abstract ELT extract, load, and transform is replacing ETL (extract, transform, load) as the usual method of populating data warehouses. Modern data warehouse appliances

More information

Evolving Data Warehouse Architectures

Evolving Data Warehouse Architectures Evolving Data Warehouse Architectures In the Age of Big Data Philip Russom April 15, 2014 TDWI would like to thank the following companies for sponsoring the 2014 TDWI Best Practices research report: Evolving

More information

ORACLE DATA INTEGRATOR ENTEPRISE EDITION FOR BUSINESS INTELLIGENCE

ORACLE DATA INTEGRATOR ENTEPRISE EDITION FOR BUSINESS INTELLIGENCE ORACLE DATA INTEGRATOR ENTEPRISE EDITION FOR BUSINESS INTELLIGENCE KEY FEATURES AND BENEFITS (E-LT architecture delivers highest performance. Integrated metadata for alignment between Business Intelligence

More information

Role of Analytics in Infrastructure Management

Role of Analytics in Infrastructure Management Role of Analytics in Infrastructure Management Contents Overview...3 Consolidation versus Rationalization...5 Charting a Course for Gaining an Understanding...6 Visibility into Your Storage Infrastructure...7

More information

Learn How to Leverage System z in Your Cloud

Learn How to Leverage System z in Your Cloud Learn How to Leverage System z in Your Cloud Mike Baskey IBM Thursday, February 7 th, 2013 Session 12790 Cloud implementations that include System z maximize Enterprise flexibility and increase cost savings

More information

THE QUEST FOR A CLOUD INTEGRATION STRATEGY

THE QUEST FOR A CLOUD INTEGRATION STRATEGY THE QUEST FOR A CLOUD INTEGRATION STRATEGY ENTERPRISE INTEGRATION Historically, enterprise-wide integration and its countless business benefits have only been available to large companies due to the high

More information

IBM Solution Framework for Lifecycle Management of Research Data. 2008 IBM Corporation

IBM Solution Framework for Lifecycle Management of Research Data. 2008 IBM Corporation IBM Solution Framework for Lifecycle Management of Research Data Aspects of Lifecycle Management Research Utilization of research paper Usage history Metadata enrichment Usage Pattern / Citation Collaboration

More information

BEYOND BI: Big Data Analytic Use Cases

BEYOND BI: Big Data Analytic Use Cases BEYOND BI: Big Data Analytic Use Cases Big Data Analytics Use Cases This white paper discusses the types and characteristics of big data analytics use cases, how they differ from traditional business intelligence

More information

The Liaison ALLOY Platform

The Liaison ALLOY Platform PRODUCT OVERVIEW The Liaison ALLOY Platform WELCOME TO YOUR DATA-INSPIRED FUTURE Data is a core enterprise asset. Extracting insights from data is a fundamental business need. As the volume, velocity,

More information

TRANSFORM YOUR BUSINESS: BIG DATA AND ANALYTICS WITH VCE AND EMC

TRANSFORM YOUR BUSINESS: BIG DATA AND ANALYTICS WITH VCE AND EMC TRANSFORM YOUR BUSINESS: BIG DATA AND ANALYTICS WITH VCE AND EMC Vision Big data and analytic initiatives within enterprises have been rapidly maturing from experimental efforts to production-ready deployments.

More information

Information Architecture

Information Architecture The Bloor Group Actian and The Big Data Information Architecture WHITE PAPER The Actian Big Data Information Architecture Actian and The Big Data Information Architecture Originally founded in 2005 to

More information

Why Big Data in the Cloud?

Why Big Data in the Cloud? Have 40 Why Big Data in the Cloud? Colin White, BI Research January 2014 Sponsored by Treasure Data TABLE OF CONTENTS Introduction The Importance of Big Data The Role of Cloud Computing Using Big Data

More information

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Siva Ravada Senior Director of Development Oracle Spatial and MapViewer 2 Evolving Technology Platforms

More information

Getting Started Practical Input For Your Roadmap

Getting Started Practical Input For Your Roadmap Getting Started Practical Input For Your Roadmap Mike Ferguson Managing Director, Intelligent Business Strategies BA4ALL Big Data & Analytics Insight Conference Stockholm, May 2015 About Mike Ferguson

More information

QlikView Business Discovery Platform. Algol Consulting Srl

QlikView Business Discovery Platform. Algol Consulting Srl QlikView Business Discovery Platform Algol Consulting Srl Business Discovery Applications Application vs. Platform Application Designed to help people perform an activity Platform Provides infrastructure

More information

IBM InfoSphere BigInsights Enterprise Edition

IBM InfoSphere BigInsights Enterprise Edition IBM InfoSphere BigInsights Enterprise Edition Efficiently manage and mine big data for valuable insights Highlights Advanced analytics for structured, semi-structured and unstructured data Professional-grade

More information

Why DBMSs Matter More than Ever in the Big Data Era

Why DBMSs Matter More than Ever in the Big Data Era E-PAPER FEBRUARY 2014 Why DBMSs Matter More than Ever in the Big Data Era Having the right database infrastructure can make or break big data analytics projects. TW_1401138 Big data has become big news

More information

Big Data Comes of Age: Shifting to a Real-time Data Platform

Big Data Comes of Age: Shifting to a Real-time Data Platform An ENTERPRISE MANAGEMENT ASSOCIATES (EMA ) White Paper Prepared for SAP April 2013 IT & DATA MANAGEMENT RESEARCH, INDUSTRY ANALYSIS & CONSULTING Table of Contents Introduction... 1 Drivers of Change...

More information

Storage Switzerland White Paper Storage Infrastructures for Big Data Workflows

Storage Switzerland White Paper Storage Infrastructures for Big Data Workflows Storage Switzerland White Paper Storage Infrastructures for Big Data Workflows Sponsored by: Prepared by: Eric Slack, Sr. Analyst May 2012 Storage Infrastructures for Big Data Workflows Introduction Big

More information

An Oracle White Paper October 2011. Oracle: Big Data for the Enterprise

An Oracle White Paper October 2011. Oracle: Big Data for the Enterprise An Oracle White Paper October 2011 Oracle: Big Data for the Enterprise Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5

More information

Using Big Data for Smarter Decision Making. Colin White, BI Research July 2011 Sponsored by IBM

Using Big Data for Smarter Decision Making. Colin White, BI Research July 2011 Sponsored by IBM Using Big Data for Smarter Decision Making Colin White, BI Research July 2011 Sponsored by IBM USING BIG DATA FOR SMARTER DECISION MAKING To increase competitiveness, 83% of CIOs have visionary plans that

More information

Architecting for the Internet of Things & Big Data

Architecting for the Internet of Things & Big Data Architecting for the Internet of Things & Big Data Robert Stackowiak, Oracle North America, VP Information Architecture & Big Data September 29, 2014 Safe Harbor Statement The following is intended to

More information

Service Oriented Data Management

Service Oriented Data Management Service Oriented Management Nabin Bilas Integration Architect Integration & SOA: Agenda Integration Overview 5 Reasons Why Is Critical to SOA Oracle Integration Solution Integration

More information

Splunk Company Overview

Splunk Company Overview Copyright 2015 Splunk Inc. Splunk Company Overview Name Title Safe Harbor Statement During the course of this presentation, we may make forward looking statements regarding future events or the expected

More information

EMC/Greenplum Driving the Future of Data Warehousing and Analytics

EMC/Greenplum Driving the Future of Data Warehousing and Analytics EMC/Greenplum Driving the Future of Data Warehousing and Analytics EMC 2010 Forum Series 1 Greenplum Becomes the Foundation of EMC s Data Computing Division E M C A CQ U I R E S G R E E N P L U M Greenplum,

More information

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap 3 key strategic advantages, and a realistic roadmap for what you really need, and when 2012, Cognizant Topics to be discussed

More information

Advanced In-Database Analytics

Advanced In-Database Analytics Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??

More information

Implementing Oracle BI Applications during an ERP Upgrade

Implementing Oracle BI Applications during an ERP Upgrade 1 Implementing Oracle BI Applications during an ERP Upgrade Jamal Syed Table of Contents TABLE OF CONTENTS... 2 Executive Summary... 3 Planning an ERP Upgrade?... 4 A Need for Speed... 6 Impact of data

More information

SAP Sybase Replication Server What s New in 15.7.1 SP100. Bill Zhang, Product Management, SAP HANA Lisa Spagnolie, Director of Product Marketing

SAP Sybase Replication Server What s New in 15.7.1 SP100. Bill Zhang, Product Management, SAP HANA Lisa Spagnolie, Director of Product Marketing SAP Sybase Replication Server What s New in 15.7.1 SP100 Bill Zhang, Product Management, SAP HANA Lisa Spagnolie, Director of Product Marketing Agenda SAP Sybase Replication Server Overview Replication

More information