A CHECKLIST FOR HIGH-PERFORMANCE ETL


Table of Contents

INTRODUCTION
THE EVOLUTION OF ETL TOOLS
A CHECKLIST FOR HIGH-PERFORMANCE ETL:
    DEVELOPMENT PRODUCTIVITY
    DYNAMIC ETL OPTIMIZATION
    PERVASIVE CONNECTIVITY
    HIGH-SPEED COMPRESSION
    SCALABLE ARCHITECTURE
HIGH-PERFORMANCE ETL IN ACTION AT COMSCORE
CONCLUSION

INTRODUCTION

Do a Google search on Big Data and you'll get nearly 2 billion results. Clearly the term is top of mind, as well it should be. Nearly any organization of any size stands to gain from the enhanced services, better products, or operational efficiencies that greater data insights enable. But only a small fraction of organizations is maximizing these benefits today.

According to IDC, the digital universe measures in the trillions of gigabytes and will continue to double every two years. While not all of that data is valuable, less than 0.5% of it is currently being analyzed*. This is not because organizations don't recognize the potential value to be gained, but because they either lack the tools to do so or their conventional approaches to data integration can't keep pace with the three V's (Volume, Velocity, and Variety).

Whether you're dealing with petabytes of data or just a few gigabytes, having the right tools and integration architecture in place will help you quickly and effectively transform data, Big or otherwise, into competitive insights that will enable you to identify new revenue opportunities, save costs, increase operational efficiencies, improve products and services, and remain competitive.

The Impact of Data Integration Done Right

OPERATIONAL
- Process more data in less time with less effort
- Less hardware & storage to maintain, manage, & replace
- Install & deploy with no worries about data volumes
- Quickly & easily respond to business-user requests

FINANCIAL
- Reduce data integration TCO by up to 65%
- Defer or eliminate additional infrastructure purchases
- Support future initiatives without increasing budgets
- Increase ROI of existing IT investments

BUSINESS
- Maximize agility with quicker access to more data
- Uncover new revenue opportunities
- Reduce business risk & ensure compliance
- Align IT with strategic business objectives

* The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East. IDC, December 2012.

THE EVOLUTION OF ETL TOOLS

Organizations have struggled to make sense of data for decades, using one-off point solutions and custom coding to try to extract meaningful information. But in the late 1990s a new way of thinking emerged. Instead of relying on skilled developers to write complex, manual structured query language (SQL) scripts for preparing and transforming data, ETL (Extract, Transform, Load) and Data Integration (DI) tools were introduced to simplify the process. In a time when data transformation was relatively straightforward, their engines and metadata-driven design enabled more users to build and deploy data integration flows.

However, as data volumes and sources quickly grew, these solutions were unable to keep up. Even organizations that didn't have huge volumes of data, but needed more complex data transformations, faced growing costs and performance issues. Reluctant to abandon their sizable investments in these tools, many IT departments tried to overcome performance and scalability challenges by returning to hand-coding SQL and by pushing transformations down to the data warehouse. But this kludged approach created unnecessary complexity, consumed significant resources, and piled on more costs, creating an unsustainable model moving forward. In fact, data integration now consumes up to 80% of database capacity.

[Diagram: conventional DI architecture. Sources such as Oracle, files/XML, ERP, mainframe, real-time feeds, and Hadoop/Big Data flow through ETL into a data warehouse, with further ETL feeding multiple data marts.]

That's why today, while organizations strive to harness the power of data for competitive advantage, the reality is that the high total cost of ownership, ongoing tuning and maintenance efforts, and performance limitations of current approaches stand in the way. This situation has prompted many organizations to step back and ask: the world has changed dramatically in the last 20 years, so what does that mean for my approach to data integration, and how can I adapt quickly enough to ensure a clear path forward?

Whether you already have a set of ETL and DI tools or not, what follows is a checklist designed to help you evaluate high-performance ETL to ensure your next move will reduce the costs and complexity of your data integration initiatives as well as complement and optimize existing DI platforms for faster performance and lower resource utilization.

A CHECKLIST FOR HIGH-PERFORMANCE ETL

Tapping into previously unused sources of information is changing the way business is done. For organizations it can uncover new revenue streams and operational efficiencies. For consumers it can literally change the way we live and work, with products and services not only tailored to our individual needs but even anticipating them.

As you can imagine, achieving this level of sophistication and speed requires performance at scale. And performance, for any software system, rests on the performance triangle: efficiency and speed require the balancing of CPU, memory, and I/O.

The Performance Triangle: The performance triangle reflects the delicate balance between these three resources; overuse of one has an immediate impact on the others. For example, executing a join that exceeds physical memory will require additional disk space and CPU time. Most conventional ETL tools are CPU- and memory-bound but, ultimately, all are I/O dependent. As a result, to increase performance you need an approach that minimizes the impact on every aspect of the triangle, which is no easy task.
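To make the join example concrete, here is a minimal Python sketch of how an engine might plan a join around the performance triangle. The 50% hash-table overhead, the 8x spill threshold, and the function name are illustrative assumptions, not any vendor's defaults:

```python
def plan_join(build_rows: int, row_bytes: int, mem_budget_bytes: int) -> str:
    """Pick a join strategy from the estimated build-side footprint.

    When the build side exceeds the memory budget, the shortfall must be
    paid for elsewhere in the triangle: extra disk I/O (partitioned hash
    join) or extra CPU plus I/O (external sort-merge join).
    """
    footprint = build_rows * row_bytes * 1.5  # assume ~50% hash-table overhead
    if footprint <= mem_budget_bytes:
        return "in-memory hash join"          # cheapest: no spill I/O at all
    if footprint <= mem_budget_bytes * 8:     # assumed spill threshold
        return "partitioned (grace) hash join, spilling partitions to disk"
    return "external sort-merge join, sorting both inputs on disk"


# Example: a 200M-row build side of 100-byte rows against a 4 GB budget
print(plan_join(200_000_000, 100, 4 * 1024**3))
```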

Checklist: KEY CAPABILITIES HIGH-PERFORMANCE ETL MUST DELIVER TO ADDRESS THE PERFORMANCE TRIANGLE

- Development productivity: Shifting the burden of handling common and repetitive tasks, as well as performance tuning, from the individual to the technology
- Dynamic optimization: Leveraging algorithms, optimizations, and smart technology to intelligently accelerate performance on the fly
- Pervasive connectivity: Enabling connectivity with a wide variety of sources and targets and incorporating innovations like Direct I/O to enable a more efficient transfer of larger blocks of data
- High-speed compression: Taking compression to a new level by incorporating algorithms and technologies to address the entire transformation process
- Scalable architecture: Designed for today's dynamic business requirements and environments, with efficient processing methods dynamically executed as needed

By checking all the boxes, you can be sure you've identified a way to cost-effectively solve your enterprise-class data integration challenges regardless of data volume, complexity, or velocity. Let's take an up-close look at each.

Development Productivity

The initial promise of ETL and DI tools was user productivity: existing IT teams with a broader set of skills and no specialized knowledge would be able to quickly build, deploy, and re-use highly scalable data integration flows. But when the demands of robust data integration set in and IT departments reverted to SQL in an attempt to meet them, productivity slowed to a crawl. As a result, developers are bogged down writing, maintaining, and extending thousands of lines of complex code to cope with changing business requirements.

Conventional ETL tools also put the burden of tuning for performance and scalability on the developer. Not only must the developer's code meet functional requirements, it must also be designed for performance, a rare combination of skills that is only gained after years of experience and finely honed expertise with a specific tool. A lack of metadata puts even greater challenges on organizations with hybrid development environments. Dispersed on- and off-shore teams face significant complications sharing, testing, and propagating jobs across dispersed production environments.

High-performance ETL shifts the burden of handling common and repetitive tasks, as well as performance tuning, from individuals to software.

CHECKLIST QUESTIONS

When determining if a solution will support development productivity, ask these questions:

WHAT PERCENTAGE OF MY DEVELOPERS' TIME WILL BE SPENT WRITING CODE?
Reusable tasks are self-contained and unit-testable, and can be assembled to create jobs, accelerating updates and minimizing the risk of errors or delays that typically occur when manually writing and re-writing hundreds of lines of code.

DO MY DEVELOPERS NEED ANY SPECIFIC SKILLS TO ENSURE PERFORMANCE OPTIMIZATION?
Built-in optimization capabilities seamlessly handle the performance issues of any job or task, enabling users to design for functionality and inherit performance.
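The idea of reusable, unit-testable tasks assembled into jobs can be illustrated with a short sketch. This is a hypothetical design in Python, not the API of any particular tool; Task, job, and the sample steps are all invented for illustration:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

Row = dict

@dataclass
class Task:
    """One reusable pipeline step: self-contained and unit-testable."""
    name: str
    fn: Callable[[Iterable[Row]], Iterator[Row]]

def job(tasks: list[Task], rows: Iterable[Row]) -> Iterator[Row]:
    """Assemble tasks into a job by chaining their row streams."""
    for task in tasks:
        rows = task.fn(rows)
    yield from rows

# Each step can be tested in isolation, then reused across jobs:
drop_nulls = Task("drop_nulls",
                  lambda rows: (r for r in rows if None not in r.values()))
upper_name = Task("upper_name",
                  lambda rows: ({**r, "name": r["name"].upper()} for r in rows))

result = list(job([drop_nulls, upper_name],
                  [{"name": "ada"}, {"name": None}]))
assert result == [{"name": "ADA"}]
```

The point of the pattern is the one the section makes: developers write small functional steps once, and composition, not hand-written glue code, produces each new job.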

Dynamic ETL Optimization

Achieving the highest levels of throughput with minimum resource utilization becomes increasingly difficult as the demand for critical information and data volumes rise. As much as 80 percent of all ETL processing is spent sorting records. Joins, aggregations, rankings, database loads, and so on all depend on sorting to complete their processing. Even the final step of loading data into a target database can be more efficient, using less CPU and elapsed time, if the data is sorted first. But sorting records with conventional tools is typically the most inefficient step in the ETL process, and as business requirements increase, most organizations need to invest in more hardware.

Adding to the complexity, balancing the performance triangle between memory, CPU, and disk space is a moving target. As business requirements change, so do the number and type of data sources, the type of transformations, and the volumes of data; all of this happens in an environment where a variety of applications (ETL, relational databases, etc.) continuously compete for priority. Therefore, the level of tuning that needs to be achieved and maintained to ensure maximum performance at runtime simply isn't possible using manual methods or a static, one-size-fits-all approach.

High-performance ETL leverages algorithms, optimizations, and smart technology to intelligently accelerate performance on the fly.
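As an illustration of why sorting dominates ETL cost and how an engine can adapt when memory runs out, here is a minimal external merge sort in Python. It is a sketch of the general technique under an assumed row budget, not Syncsort's (or any other vendor's) implementation; a real engine would size runs and switch algorithms based on runtime conditions:

```python
import heapq
import pickle
import tempfile

MEM_ROWS = 1_000_000  # assumed in-memory run size, not a tuned default

def external_sort(rows, key):
    """Sort a stream of any size: stay in memory while the data fits,
    switch midstream to sorted runs on disk plus a streaming k-way
    merge when it doesn't."""
    runs, buf = [], []
    for row in rows:
        buf.append(row)
        if len(buf) >= MEM_ROWS:      # memory budget hit: spill a sorted run
            buf.sort(key=key)
            f = tempfile.TemporaryFile()
            for r in buf:
                pickle.dump(r, f)
            f.seek(0)
            runs.append(f)
            buf = []
    buf.sort(key=key)
    if not runs:                      # everything fit: pure in-memory sort
        yield from buf
        return

    def replay(f):
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return

    yield from heapq.merge(*(replay(f) for f in runs), buf, key=key)
```

The midstream switch from the in-memory path to spilled runs is exactly the kind of dynamic algorithm selection the checklist below asks about.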

CHECKLIST QUESTIONS

When determining if a solution can dynamically self-optimize, ask these questions:

HOW CAN THE SOLUTION HELP ENSURE I'M ACHIEVING MAXIMUM RUNTIME PERFORMANCE AND MINIMUM RESOURCE UTILIZATION?
Look for solutions that include a full library of algorithms and optimizations (covering sorting, joins, merges, aggregations, transformations, copies, memory management, and compression), as well as technology to handle the complexities of optimization by dynamically selecting and even switching algorithms midstream. Removing users from the process via automation and using highly targeted algorithms will ensure you aren't leaving optimization up to chance.

HOW HAS THE SOLUTION PERFORMED IN ENVIRONMENTS SIMILAR TO MINE?
Third-party validation, including patents, customer examples, and benchmarks, as well as the opportunity to conduct proofs of concept with no manual tuning allowed, will quickly verify whether you can achieve faster performance with fewer resources on existing hardware.

Pervasive Connectivity

Data often comes from a long list of sources and targets, including relational databases, files, mainframes, CRM systems, web logs, HDFS, social media, and more. Unlocking the insights from this data quickly and easily is at the heart of the value to be gained from big data. Without it, business agility suffers as organizations can't react quickly to market dynamics, changes in customer behavior, and new competitive forces. Any cost-effective approach to data integration must be capable of seamlessly plugging into a range of file and storage systems as well as other DI solutions.

But connectivity alone isn't enough. Every ETL process is ultimately I/O bound, especially at the end points of a job: extracting the data from the source and loading it into the target. The transformation phase can also quickly become disk bound when carrying out an operation that exceeds physical memory. Since disk is generally the slowest resource in most computing environments, its misuse can have the most dramatic impact on performance.

High-performance ETL enables connectivity with a wide variety of sources and targets and incorporates innovations like Direct I/O to enable a more efficient transfer of larger blocks of data.

CHECKLIST QUESTIONS

When determining if a solution has the pervasive connectivity your organization requires, ask these questions:

HOW CAN I CONNECT WITH ALL THE DATA SOURCES AND TARGETS IN MY ENVIRONMENT?
It's fair to expect native connectivity for a range of sources and targets including files, relational database management systems (RDBMSs), real-time ERP, appliances, cloud, JSON, XML, mainframe, and legacy systems.

WHAT TECHNIQUES ARE USED TO ELIMINATE I/O BOTTLENECKS?
Solutions that fully leverage Direct I/O bypass the OS buffer cache, enabling a more efficient transfer of larger blocks of data; by avoiding an extra memory copy, less CPU is utilized. Automatic sort optimizations for larger sources, as well as built-in direct read and direct load optimizations (for example, reading directly from Oracle data files and bypassing Oracle's OCI client interface), will deliver further performance improvements of as much as 30%.
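As a concrete illustration of the Direct I/O point, the Linux-only sketch below opens a file with O_DIRECT so reads bypass the OS page cache and skip the extra kernel-to-user copy. The 4 KB block size is an assumption; real engines transfer much larger aligned blocks:

```python
import mmap
import os

BLOCK = 4096  # assumed logical block size; O_DIRECT requires aligned I/O

def read_direct(path: str, nbytes: int) -> bytes:
    """Read with O_DIRECT (Linux only), bypassing the OS buffer cache."""
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        # Round the request up to a block multiple; O_DIRECT needs the
        # buffer address and length aligned to the block size.
        size = ((nbytes + BLOCK - 1) // BLOCK) * BLOCK
        buf = mmap.mmap(-1, size)     # anonymous mmap is page-aligned
        n = os.readv(fd, [buf])       # data lands in buf, no page-cache copy
        return bytes(buf[:n])
    finally:
        os.close(fd)
```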

High-Speed Compression

Given the increasing diversity of data sources and targets, including those residing in the cloud, data management costs can quickly become unsustainable. Large data volumes increase not only storage costs but also disk read/write access and network I/O, resulting in a negative impact on performance. Compression technology can help solve these storage and performance challenges, which is why the leading database and appliance vendors have made considerable investments in it. For data integration, compression can be applied to minimize storage requirements and accelerate overall elapsed time by decreasing the amount of I/O, saving terabytes of storage and doubling performance when compared to conventional DI approaches.

High-performance ETL takes compression to a new level, incorporating algorithms and technologies to address the entire transformation process.

[Diagram: Syncsort DMX applying compression across data sources, targets, and the temporary workspace.]

CHECKLIST QUESTIONS

When determining if a solution fully enables high-speed compression, ask these questions:

HOW IS COMPRESSION APPLIED TO DELIVER I/O SAVINGS?
Solutions that optimize compression for reading and writing data files incorporate high-speed compression algorithms, allowing the tool to support compression at all critical stages, including sources, targets, temporary workspace storage, and on-the-fly compression of intermediate data.

HOW IS COMPRESSION FOR DISK SPACE HANDLED?
Applying compression to temporary workspaces enables significant storage savings for large data volumes. Depending on data compression ratios and system specifications, such as the number and speed of CPUs and the I/O rate, high-performance compression can deliver over 2x faster elapsed time and storage savings of up to 90%, even for simple tasks.
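Below is a minimal sketch of compressing the temporary workspace, assuming Python's built-in gzip codec at its fastest level; production tools use faster codecs tuned to their own formats. The point is the trade the section describes: a little CPU buys a large reduction in disk I/O and temp storage:

```python
import gzip
import pickle
import tempfile

def spill_compressed(rows) -> str:
    """Spill intermediate rows to a gzip-compressed temporary file."""
    f = tempfile.NamedTemporaryFile(suffix=".gz", delete=False)
    with gzip.open(f, "wb", compresslevel=1) as z:  # level 1: favor speed
        for row in rows:
            pickle.dump(row, z)
    return f.name

def replay_spill(path: str):
    """Stream the rows back, decompressing on the fly."""
    with gzip.open(path, "rb") as z:
        while True:
            try:
                yield pickle.load(z)
            except EOFError:
                return
```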

Scalable Architecture

Conventional ETL tools were designed in a different time, for a different time. Although functionality has been added, most of these tools need to push transformations down to the database to overcome performance and scalability challenges, an architectural decision that has proven costly and complex. High-performance ETL incorporates dynamic ETL optimization, Direct I/O, and compression to perform heavy transformations on the fly without the need for intermediate staging areas, processing transformations in memory on commodity hardware and falling back to temporary staging on commodity disks only when memory is not enough, dramatically accelerating performance and reducing costs.

Another design challenge with conventional ETL tools is their use of heavy architectures that make inefficient use of resources. Moreover, conventional ETL tools and hand coding typically require a deployment or compile step, creating rigid ETL flows that limit runtime flexibility to adapt to changing conditions. These approaches also tend to have very poor thread and process management, often spawning so many threads and processes that they swamp the operating system. Their performance is hampered by their very design, making them unsuitable for high performance at scale.

High-performance ETL is designed for today's dynamic business environments, with efficient processing methods dynamically executed as needed.

CHECKLIST QUESTIONS

When determining if a solution is based on a scalable architecture, ask these questions:

HOW DOES THE SOLUTION HANDLE PROCESSES AND THREADS?
Solutions with hybrid multi-process and multi-thread architectures offer the full benefits of a master orchestration process, with threads dynamically spawned and killed based on demand and processing.

HOW DOES THE ARCHITECTURE OPTIMIZE PERFORMANCE WHILE SCALING?
A truly scalable architecture automatically controls the processing method and conserves resources by allocating them to steps only as needed, maximizing performance as data flows through the integration job and supporting more jobs. The ability to dynamically process scripts at runtime delivers faster start-up and runtime performance while allowing greater flexibility, especially when passing dynamic variables and parameters.
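The hybrid process/thread model can be sketched in a few lines of Python. This is an illustration of the pattern, not any vendor's architecture: a master process sends CPU-heavy transforms to worker processes while I/O-bound loads run on threads, with both pools growing only as work arrives:

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def run_job(partitions, transform, load):
    """Master orchestration: processes for CPU-bound transforms,
    threads for I/O-bound loads, workers allocated on demand."""
    # transform must be picklable (e.g., a module-level function)
    with ProcessPoolExecutor() as cpu_pool, ThreadPoolExecutor() as io_pool:
        transformed = [cpu_pool.submit(transform, p) for p in partitions]
        # Each load thread waits only for its own batch, then writes it out
        loads = [io_pool.submit(lambda f=fut: load(f.result()))
                 for fut in transformed]
        for fut in loads:
            fut.result()  # propagate any worker errors to the master
```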

A Real-World Example: HIGH-PERFORMANCE ETL IN ACTION AT COMSCORE

A leading internet technology company, comScore measures what people do as they navigate the digital world and turns that information into insights and actions that help 1,800 organizations around the globe maximize the value of their digital investments. Data integration is a critical business process for comScore; its success depends on its ability to monitor, collect, transform, and analyze data from a panel of 2 million internet users and an extensive network of sites participating in its Unified Digital Measurement (UDM) program. comScore collects information 24x7, from browsing to what people read, buy, and subscribe to, and then sorts and aggregates that data.

Within their first year, data volumes grew dramatically. They deployed Syncsort in 2000 and gained a 5-10x improvement in data processing speed. In 2009, comScore unveiled UDM, and data volumes and complexity skyrocketed. To support this innovation, the company decided to also leverage Hadoop, but to rely on Syncsort to sort, partition, and compress the data before loading it into Hadoop and to optimize their Hadoop environment.

"The performance and ease of use of Syncsort DMX positively impacts our bottom line; DMX technology is able to convert raw click-stream data into valuable granular information at lightning speed."
MIKE BROWN, CTO
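The pattern comScore is described as using, sorting, partitioning, and compressing data before loading it into Hadoop, looks roughly like the sketch below. This is a generic Python illustration; the hash partitioning scheme, file naming, and gzip codec are assumptions, not details of comScore's or Syncsort's actual pipeline:

```python
import gzip
import os
import zlib

def prepare_for_hdfs(records, key, n_parts, out_dir):
    """Partition text records by key, sort each partition, and write
    gzip-compressed part files ready to load into HDFS."""
    parts = [[] for _ in range(n_parts)]
    for rec in records:
        parts[zlib.crc32(key(rec).encode()) % n_parts].append(rec)
    os.makedirs(out_dir, exist_ok=True)
    for i, part in enumerate(parts):
        part.sort(key=key)                    # pre-sorted data loads faster
        with gzip.open(f"{out_dir}/part-{i:05d}.gz", "wt") as f:
            f.writelines(line + "\n" for line in part)

# Example: partition click-stream lines by user id (first field)
prepare_for_hdfs(["u2,pageB", "u1,pageA", "u1,pageC"],
                 key=lambda line: line.split(",")[0],
                 n_parts=2, out_dir="staged")
```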


CONCLUSION

Tapping into previously unused sources of information is changing the way business is done and is key to remaining relevant and competitive. You can't afford to stand on the sidelines, missing out on valuable insights that will help you identify new revenue opportunities, save costs, increase operational efficiencies, improve products and services, and remain competitive. Whether you already have a set of ETL and DI tools or not, this checklist was designed to help you assess high-performance ETL solutions that stand up to today's data integration challenges:

- Development productivity: Shifting the burden of handling common and repetitive tasks, as well as performance tuning, from the individual to the technology
- Dynamic optimization: Leveraging algorithms, optimizations, and smart technology to intelligently accelerate performance on the fly
- Pervasive connectivity: Enabling connectivity with a wide variety of sources and targets and incorporating innovations like Direct I/O to enable a more efficient transfer of larger blocks of data
- High-speed compression: Taking compression to a new level by incorporating algorithms and technologies to address the entire transformation process
- Scalable architecture: Designed for today's dynamic business requirements and environments, with efficient processing methods dynamically executed as needed

By checking all the boxes, you can ensure that the high-performance ETL solution you select will reduce the costs and complexity of your data integration initiatives, as well as complement and optimize existing DI platforms for faster performance and lower resource utilization.

ABOUT US

Syncsort provides data-intensive organizations across the big data continuum with a smarter way to collect and process the ever-expanding data avalanche. With thousands of deployments across all major platforms, including mainframe, Syncsort helps customers around the world overcome the architectural limits of today's ETL and Hadoop environments, empowering their organizations to drive better business outcomes in less time, with fewer resources and lower TCO. For more information, visit Syncsort's website.

© 2014 Syncsort Incorporated. All rights reserved. DMExpress is a trademark of Syncsort Incorporated. All other company and product names used herein may be the trademarks of their respective companies. DMX-EB US
