Example Use Cases
Solving the Need for Speed in Data Ops
Doc Version 2.1

Table of Contents

1 Introduction to VelociData
  1.1 Solution Templates for Accelerating Data Ops
  1.2 Extending the Life of Corporate Infrastructure
2 Identifying the Right Use Cases to Prove the Value
3 Proven Use Case Examples
  3.1 Example 1: Complementary Data Integration Offload
    3.1.1 Data Transformations
    3.1.2 Typical Process for Adopting VelociData's Solutions
  3.2 Example 2: Improving Data Quality
  3.3 Example 3: USPS Address Standardization
  3.4 Example 4: Data Cleansing and Validation
  3.5 Example 5: Data Platform Offload
  3.6 Example 6: Mainframe Offload & Acceleration
  3.7 Example 7: Combining Mainframe Offload with Data Quality
  3.8 Example 8: Streaming Sort
    3.8.1 Comparisons with Existing Solutions
  3.9 Example 9: Encryption and Data Masking
    3.9.1 Encryption
    3.9.2 Format-Preserving Encryption & Format-Preserving Masking
4 Summary

1 Introduction to VelociData

VelociData is leading today's hardware-accelerated Big Data revolution, answering challenging business questions in a fraction of the time required by standard compute systems and meeting time-bound service level requirements. The need for speed in business decision-making requires real-time, actionable intelligence from all relevant data sources. The need for speed in meeting time-bound operational processes, in the face of increasing volumes, varieties, and velocities of data, requires transformative improvements in transformation performance.

VelociData provides data transformation, data quality, data platform offload, and data sort solutions on an ultra-high-performance appliance platform that enables real-time operational decisions and dramatic improvements in ETL processing performance. These unique cost/performance solutions are compatible with leading data integration tools and respond to global organizations' growing demands while reducing total cost of ownership.

1.1 Solution Templates for Accelerating Data Ops

This document offers a quick introduction to VelociData's technology by reviewing the best practices and solution templates that have proven successful in customer deployments. The use cases below have been selected to illustrate the kinds of processing that VelociData can greatly accelerate. Although these selections are by no means complete, they illustrate how dramatically applications can be accelerated by offloading the toughest performance challenges to VelociData. Customers often see speed-ups of 10, 100, or even 1,000 times over traditional software-only approaches. It is quick and easy to demonstrate how these innovations can achieve substantial cost savings and improve data operations.

1.2 Extending the Life of Corporate Infrastructure

VelociData's massively parallel but small-footprint hardware appliances snap into existing data flows to perform line-rate processing. This processing can offload the most critical pain points in an infrastructure to ease the burden on existing servers. When the appliance shares the workload for an environment, the extra processing headroom returned to the offloaded platform helps reduce total cost of ownership against ever-increasing data loads.

Today, every Fortune 1000 organization contends with the challenge of taming a flood of data to extract valuable business information. The VelociData ETL / Data Integration Appliances dissolve the most critical performance pain points to bring actionable business intelligence to real time. VelociData is working with leading ETL / data integration vendors to integrate these solutions into existing engines, allowing users to deploy accelerated solutions without changing their existing data integration architecture (e.g., Press Release: VelociData and Informatica Partner to Fulfill Customers' Requirements to Affordable Hyper-Scale/Hyper-Speed Big Data Analytics).

2 Identifying the Right Use Cases to Prove the Value

Taking advantage of VelociData's technology is easy and fast. VelociData was designed to integrate seamlessly within existing reference architectures. The time required for initial installation, implementation, and configuration is measured in hours, not weeks. The technical staff does not need to be re-trained for new skillsets, and no new programming languages need to be learned.

VelociData contributes its expertise in the form of collaborations, webinars, and other events to help educate the customer team on the technology and on best practices for taking advantage of the appliance. This process can be very lean, lightweight, and quick. As one customer, reflecting on the decision process, put it: "What's unique about VelociData is that you can prove the business and technical claims very quickly."

VelociData usually recommends an educational session, such as a Lunch & Learn or Coffee & Donuts, to introduce the solutions to a wider audience. This session is not a sales pitch; rather, it covers the nature of heterogeneous computing and why and how it has become a relevant innovation in enterprise computing. It has proven to spawn questions that help people connect the dots to possible internal use cases, as well as demystify the effort involved. Once the broader technical team is energized, identifying and testing a proof of concept becomes a natural and easy process for all involved. VelociData helps guide teams through a structured process:

1. Identifying the bottlenecks that are causing the worst problems in the data flow. Our team's depth of experience in performance analysis across well-known data transformation platforms allows us to quickly isolate areas in your workflow that can benefit from acceleration. In addition, VelociData has tools that can quickly identify existing bottlenecks by analyzing performance metrics in the metadata repository of existing ETL solutions.

2. Adapting workflows to use VelociData's acceleration at the most critical performance points. Typically this sits in-line with existing systems, requiring only minimal changes to the process.

3. Planning a proof of concept that will demonstrate how well the new solution can accelerate data operations. Our team can help select key success criteria that demonstrate value to the business and quickly obtain apples-to-apples benchmarks to showcase performance and ROI.

At each step of the way, VelociData works with the team to ensure plan execution goes smoothly and quickly. VelociData can help with testing, training, performance analysis, and other troubleshooting, and can walk you through the process of using VelociData to efficiently and effectively scale your existing architecture for high-volume data growth. This breathes new life into conventional architectures, protecting and extending your investments as data volumes increase.

3 Proven Use Case Examples

3.1 Example 1: Complementary Data Integration Offload

The most common use case is to offload specific data integration transformations from existing data platforms. VelociData engineered solutions can extend the life of corporate infrastructure by offloading the most performance-hungry aspects of ETL; existing ETL infrastructures can remain intact. Our engineered solutions can process input streams ranging from unstructured text to structured record-oriented data at rates sufficient to completely saturate multiple 10 Gb/s lines. These accelerations can be applied to ETL, ELT, distributed, and MPP platforms.

3.1.1 Data Transformations

The VelociData appliance performs line-rate data transformations and data enrichments. Because these tasks run at 10 Gb/s line rates, there is no measurable degradation while delivering improved data to the target. The following are some examples of transformation tasks that can be chained with each other, and with any other cleansing and validation steps, without slowing the data flow (a simple software sketch of such a chain follows the list):

- Lookup & Replace: Enrich data by populating fields from a master file, or convert values from an old dictionary to a new one (for example, Product ID lookups)
- Type Conversions: Inter-convert data elements between binary and character representations, and convert character encodings between platforms
- Format Conversions: Convert data between XML, delimited (CSV), and fixed-width formats, and rearrange, add, or drop fields to change layouts
- Key Generation: Hash multiple field values into a unique pseudo-key using MD5 or SHA
- Data Masking: Obfuscate data for delivery to non-production environments using persistent or dynamic masking; format-preserving encryption or format-preserving masking that leverages AES or SHA accordingly
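To make the idea of chaining concrete, here is a minimal, software-only sketch in Python of a lookup-and-replace, key-generation, and format-conversion chain built as a streaming pipeline of generators. It only illustrates the transformation types listed above, not VelociData's hardware implementation, and the field names, lookup table, and fixed-width layout are hypothetical.

```python
import csv
import hashlib
import io
import sys

# Hypothetical master lookup: old product IDs -> new product IDs.
PRODUCT_LOOKUP = {"A100": "P-0001", "B200": "P-0002"}

def lookup_and_replace(records, field, table):
    """Enrich or convert one field from a master dictionary (Lookup & Replace)."""
    for rec in records:
        rec[field] = table.get(rec[field], rec[field])
        yield rec

def generate_key(records, fields, key_field="record_key"):
    """Hash several field values into a pseudo-key (Key Generation)."""
    for rec in records:
        digest = hashlib.md5("|".join(rec[f] for f in fields).encode("utf-8"))
        rec[key_field] = digest.hexdigest()
        yield rec

def to_fixed_width(records, layout):
    """Convert delimited records to a fixed-width layout (Format Conversion)."""
    for rec in records:
        yield "".join(rec[name].ljust(width) for name, width in layout)

if __name__ == "__main__":
    sample = io.StringIO("product_id,customer,amount\nA100,ACME,12.50\nB200,Globex,7.25\n")
    pipeline = csv.DictReader(sample)
    pipeline = lookup_and_replace(pipeline, "product_id", PRODUCT_LOOKUP)
    pipeline = generate_key(pipeline, ["product_id", "customer"])
    for line in to_fixed_width(pipeline, [("record_key", 34), ("product_id", 8), ("amount", 8)]):
        sys.stdout.write(line + "\n")
```

Because each step is a generator, records flow through the whole chain one at a time rather than being staged between steps, which is the same chaining idea (at vastly lower speed) that the appliance applies at line rate.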

3.1.2 Typical Process for Adopting VelociData's Solutions

To assess the best candidate opportunities for applying VelociData acceleration to ETL, run times for the individual transformation steps (stages) are reviewed. VelociData can help with this performance analysis by reporting on the operational statistics held in the ETL server's metadata repository. After looking at the performance numbers, it is usually evident which transformations would benefit most from VelociData acceleration. In the example below, Lookup and Replace, Field Validation, Bounds Checking, and USPS Address Standardization appeared to be good candidates for acceleration for this customer. They were the bottlenecks contributing most to the length of the batch job; in other words, if these transformations could be sped up significantly, the overall batch job would run much faster.

Based on these recommendations, VelociData could be installed in front of, behind, or alongside the ETL server in the workflow. In the following scenario, where VelociData sits in front of the ETL server, speedups happen before the ETL server even begins its processing. This configuration is simple and practical because the data passes once through VelociData and then through the ETL server. No existing interfaces need to be changed other than commenting out the bottlenecked transformations in the ETL server.

After these changes were made (taking only two hours in this instance), performance of the ETL transformations improved by more than 1,000 times. For the end user it was very simple and quick to put into place.
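As an illustration of the stage-level analysis described above, the following Python sketch ranks ETL stages by their share of total batch runtime. The CSV export, column names, and figures are hypothetical stand-ins for the operational statistics an ETL server's metadata repository would provide; VelociData's own tooling analyzes that repository directly.

```python
import csv
import io

# Hypothetical export of stage-level run times from an ETL metadata repository.
STAGE_STATS = io.StringIO(
    "stage,avg_runtime_seconds\n"
    "Lookup and replace,5400\n"
    "Field validation,3600\n"
    "Bounds checking,2700\n"
    "USPS address standardization,7200\n"
    "Load to target,900\n"
)

def rank_bottlenecks(stats_file, top_n=4):
    """Return the stages contributing the most elapsed time to the batch job."""
    rows = list(csv.DictReader(stats_file))
    total = sum(float(r["avg_runtime_seconds"]) for r in rows)
    ranked = sorted(rows, key=lambda r: float(r["avg_runtime_seconds"]), reverse=True)
    return [(r["stage"], float(r["avg_runtime_seconds"]) / total) for r in ranked[:top_n]]

if __name__ == "__main__":
    for stage, share in rank_bottlenecks(STAGE_STATS):
        print(f"{stage}: {share:.0%} of batch runtime")
```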

3.2 Example 2: Improving Data Quality

Another area where VelociData acceleration has shown great value is in improving data quality operations. VelociData typically looks for the operations that take the longest time or the most resources and determines whether there is a faster, cleaner approach. Because VelociData uses a streaming data architecture, many quality checks can be done on the fly without slowing the workflow down; no time is spent even storing data on the appliance.

3.3 Example 3: USPS Address Standardization

The VelociData engineered solution validates and standardizes USPS addresses at wire speed while correcting bad or incomplete data. The solution can standardize over 10 billion addresses per hour, which is 200 times faster than a competitive system (as deployed on a 64-bit Red Hat server with 8 processing cores and 16 GB of memory). This high-speed solution can be integrated into structured data flows and can be coupled with other data validations and corrections for greater offload capabilities.
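For a sense of the per-record work involved, here is a deliberately simplified Python sketch of address normalization and ZIP validation. Real USPS standardization is far more involved (matching against the full USPS address database), and the suffix table here is a tiny hypothetical subset; the sketch only illustrates the kind of field-level cleanup that the appliance performs at wire speed.

```python
import re

# Hypothetical subset of USPS street-suffix abbreviations.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD", "ROAD": "RD"}
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")

def standardize_line(street, city, state, zip_code):
    """Normalize one address; return (standardized_address, zip_is_valid)."""
    cleaned = [w.rstrip(".").upper() for w in street.split()]
    words = [SUFFIXES.get(w, w) for w in cleaned]          # abbreviate known suffixes
    standardized = f"{' '.join(words)}, {city.upper()}, {state.upper()} {zip_code}"
    return standardized, bool(ZIP_RE.match(zip_code))      # flag malformed ZIP codes

if __name__ == "__main__":
    print(standardize_line("123 north main Street", "St. Louis", "mo", "63101"))
```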

3.4 Example 4: Data Cleansing and Validation

The VelociData solution offers a suite of data validation and correction engines that can be run on structured and semi-structured data. These validations offer such extreme performance that they can be run on data in flight without slowing it down; the quality checks typically run at line rate even on 10 Gb network links. Some examples:

- Standardization, verification, and cleansing of USPS addresses
- Domain set validation and null checks
- Regular expression field validation (validate the format of addresses, SSNs, dates, etc.)

An ETL workflow handling company-wide information for a Fortune 50 company initially ran overnight. This pushed the limits of the existing solution and affected the SLAs of other workloads sharing the same environment. The use case involved validation and filtering of over 3 billion records. Through targeted offload of the slowest cleansing and validation tasks within this workflow, VelociData reduced the overall runtime by an order of magnitude to just over 1 hour, creating new headroom for other applications in the shared environment and helping to future-proof the workflow against SLA infringement.

3.5 Example 5: Data Platform Offload

Another solution template that has proven very useful is applying VelociData to offload other data platforms that have been tasked with transformations or processing they were never designed for. Mainframes are tremendous at transactional processing but less suited to tasks and algorithms better handled by other platforms; VelociData's engineered system is purpose-built for these processes. Similarly, an MPP platform (e.g., Teradata, Netezza) is quite good at certain analytic workloads, but for more mundane ELT it is not as economical. Using a purpose-built solution with better price/performance for transformations can deliver substantial savings each and every year. In most deployments VelociData complements the existing solution; it does not replace the existing platform but adds great value.

3.6 Example 6: Mainframe Offload & Acceleration

An IBM mainframe offload began with an EBCDIC-to-ASCII conversion, where the data was destined for Hadoop as part of a large-scale analytics project. The customer gave VelociData their most complicated COBOL copybook. In the proof of concept, VelociData took an EBCDIC data set with 1,300 variable-length fields across 30 different record types and more than 10,000 COBOL REDEFINES, and fully unpacked and converted a 16 GB densely populated file (9 million records) into Big Data-ready ASCII in less than a minute. When asked to demonstrate "the cost of change" by altering the input and output data formats, VelociData showed how its product could be reconfigured to handle the changes without any coding, in a matter of minutes. Offloading this effort from the customer's IBM mainframe saved nearly $200,000 per month, but the real strategic value will be gained by providing wire-speed, high-volume, analytics-ready data into Hadoop in support of a new revenue-generating service being introduced. Solving one begets the other.
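A minimal, software-only sketch of the field-level work such a conversion involves (EBCDIC text decoding and COMP-3 packed-decimal unpacking) is shown below in Python. The two-field layout is hypothetical, this is not VelociData's implementation, and a real copybook-driven conversion must also handle REDEFINES, OCCURS clauses, and multiple record types.

```python
# Hypothetical fixed layout derived from a copybook: (field_name, offset, length, type).
LAYOUT = [("CUST-NAME", 0, 10, "text"), ("BALANCE", 10, 4, "comp3")]

def unpack_comp3(raw, scale=2):
    """Decode a COBOL COMP-3 (packed decimal) field into a signed number."""
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign = -1 if nibbles[-1] == 0x0D else 1     # last nibble carries the sign
    value = 0
    for digit in nibbles[:-1]:
        value = value * 10 + digit
    return sign * value / (10 ** scale)

def convert_record(ebcdic_record):
    """Convert one fixed-width EBCDIC record into ASCII-friendly Python values."""
    out = {}
    for name, offset, length, kind in LAYOUT:
        raw = ebcdic_record[offset:offset + length]
        if kind == "text":
            out[name] = raw.decode("cp037").strip()   # EBCDIC (code page 037) -> text
        else:
            out[name] = unpack_comp3(raw)
    return out

if __name__ == "__main__":
    sample = "JANE DOE".ljust(10).encode("cp037") + b"\x00\x12\x34\x5C"
    print(convert_record(sample))   # {'CUST-NAME': 'JANE DOE', 'BALANCE': 123.45}
```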

3.7 Example 7: Combining Mainframe Offload with Data Quality

In response to a customer request, VelociData has also offloaded and accelerated the processing of a custom mainframe record format. Each record in this format consists of a header section followed by thousands of key/value-encoded data elements. For each record, VelociData parsed the key/value-encoded section, converting each element from various mainframe formats into printable ASCII that could be easily parsed by a downstream Hadoop system. The solution not only converts these elements to ASCII, but also filters out any key or value containing invalid or unset characters, and it trims extra whitespace from each element to compact the output file for faster downstream parsing. The solution ran at over 600 MB/sec, processing a 1.2 GB daily file in just 2 seconds.

3.8 Example 8: Streaming Sort

VelociData performs streaming data sorts at a million records per second on large datasets, even billions of records, without slowing down the rest of your processing. Accelerated sorting can in turn accelerate a variety of other applications, such as deduplication, mainframe sorts, joins, merges, indexing, aggregations, and MapReduce (Big Data / Hadoop).

3.8.1 Comparisons with Existing Solutions

A customer dataset of 3 million rows, each populated with 100 fields (800 bytes total) and keyed on three fields, was accelerated by VelociData SORT. This run was ten times faster than a leading industry application on an 8-core, 64 GB system. A large insurance company found that performance was at least 20 times faster than their existing solution on files having over 500 million rows.
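The conventional software approach that a streaming sort offloads is an external (chunked) merge sort: sort runs that fit in memory, spill them to disk, and merge the runs as a stream. The following Python sketch shows that pattern with a composite key drawn from three fields, as in the benchmark above; the record layout and chunk size are hypothetical.

```python
import heapq
import os
import tempfile

def _write_run(sorted_chunk):
    """Spill one sorted run to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.writelines(sorted_chunk)
    return path

def external_sort(lines, key_func, chunk_size=1_000_000):
    """Sort an arbitrarily large stream of records using sorted runs plus a k-way merge."""
    run_files, chunk = [], []
    for line in lines:
        chunk.append(line)
        if len(chunk) >= chunk_size:
            run_files.append(_write_run(sorted(chunk, key=key_func)))
            chunk = []
    if chunk:
        run_files.append(_write_run(sorted(chunk, key=key_func)))
    runs = [open(path, "r", encoding="utf-8") for path in run_files]
    try:
        yield from heapq.merge(*runs, key=key_func)     # streaming k-way merge
    finally:
        for f, path in zip(runs, run_files):
            f.close()
            os.remove(path)

if __name__ == "__main__":
    # Hypothetical pipe-delimited records sorted on a three-field composite key.
    records = ["42|SMITH|IL|...\n", "7|JONES|MO|...\n", "42|ADAMS|TX|...\n"]
    key = lambda line: tuple(line.split("|")[:3])
    print("".join(external_sort(iter(records), key, chunk_size=2)), end="")
```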

3.9 Example 9: Encryption and Data Masking

As one final pattern where an engineered solution can improve data operations, consider today's security and compliance challenges. Regulatory compliance requirements and increased security needs have pushed corporations to find innovative, economical ways to meet their goals. VelociData's approach to encryption and data masking is built on a streaming data architecture: no data is stored on hard drives and no plaintext data resides in regular memory. What if you could mask or encrypt your data anywhere, at wire speed?

3.9.1 Encryption

The VelociData appliance can encrypt and decrypt data faster than it can be transferred through the network. VelociData has demonstrated the ability to secure data with strong encryption (up to 256-bit AES) at over 2 GB per second, which is more than enough to saturate two 10 Gb/s channels. This protection can be applied to a full data stream or to selected fields within structured data. VelociData can also perform key rotations and key changes without slowing the data processing flow.

3.9.2 Format-Preserving Encryption & Format-Preserving Masking

VelociData offers this high-speed capability in an extremely valuable format-preserving mode. This encryption mode, which conforms to the NIST SP 800-38G specification, allows users to encrypt (reversibly) or mask (irreversibly) data without changing its field specification. It is applicable for local targets or for a private or public cloud. A data set containing 10 million records with ten sensitive fields in each record can be masked or secured in seconds, compared with a day using conventional approaches. One of the most desirable features of format preservation is that databases do not require schema changes to store a protected column, and analytics tools function normally.

Masking differs from encryption in that masked values can never be reversed back to the original plaintext. Masking is used to generate non-production datasets, for example for QA/testing or for moving a dataset to the cloud. VelociData was used in a format-preserving data masking use case to improve QA test data samples. In a recent proof of concept, VelociData was given a densely populated 100-field record layout and directed to mask 25 specific fields in every input record. VelociData demonstrated these transformations on 100,000 records in one second, which is equivalent to masking 2.5 million fields per second.
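As a simplified illustration of irreversible, format-preserving masking (not the NIST-specified FF1/FF3 encryption modes, and not VelociData's implementation), the following Python sketch replaces each digit of a value with a keyed-hash-derived digit while keeping punctuation, so an SSN stays in NNN-NN-NNNN form but cannot be recovered. The key material shown is a hypothetical placeholder.

```python
import hashlib
import hmac

SECRET_KEY = b"example-key-material"   # hypothetical; a real deployment uses managed keys

def mask_digits(value, key=SECRET_KEY):
    """Irreversibly mask a numeric field while preserving its length and punctuation.

    Each digit is replaced using a keyed hash of the whole value plus the digit's
    position, so the output still looks like valid data (an SSN keeps its shape)
    and is deterministic for a given input, but cannot be reversed to plaintext.
    """
    out = []
    for i, ch in enumerate(value):
        if ch.isdigit():
            digest = hmac.new(key, f"{value}:{i}".encode("utf-8"), hashlib.sha256).digest()
            out.append(str(digest[0] % 10))    # deterministic substitute digit
        else:
            out.append(ch)                     # keep dashes, spaces, etc.
    return "".join(out)

if __name__ == "__main__":
    print(mask_digits("123-45-6789"))   # digits replaced, format preserved
```

Because the masking is deterministic per input value, the same source value masks to the same output across files, which preserves referential integrity in test datasets.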

4 Summary

VelociData can be used in a variety of ways to save corporations time and money and to reduce complexity. VelociData solutions can snap into a reference architecture in a variety of ways to offload and accelerate existing processes. When hardware acceleration is applied to data processing challenges, the results can be dramatic: processes that run in hours or days can be completed in minutes or seconds, and costly hardware additions or upgrades can be significantly reduced, delayed, or in some cases avoided entirely.

VelociData provides not only the expertise to identify use cases but also the implementation and integration expertise to demonstrate value quickly. VelociData proof-of-concept engagements typically demonstrate multiple use cases within days of entering the data center. Put VelociData to the challenge: we will deliver an appliance, and once it is installed, PoC use cases are typically running and showing results in hours, not days.

VelociData uses a fixed-price model. A monthly subscription fee covers as much use as you can throw at the appliance, and the fee includes all maintenance, support, and enhancements. VelociData's claims are easy to prove out; typically, once the appliance is installed, we can show results in a couple of hours. Please contact us at info@velocidata.com.
