SOLUTION BRIEF. JUST THE FAQs: Moving Big Data with Bulk Load. www.datadirect.com


INTRODUCTION

As the data and information used by businesses grow exponentially, IT organizations face a daunting challenge: moving what the enterprise now calls Big Data. What is the quickest and most efficient way to move, extract, back up, archive, and access the mountains of critical information paramount to the success of the business? If your business has Big Data that needs to be moved into and out of a variety of databases, you may be stuck with an underperforming approach and not even know it. Let us introduce you to the wonders of Bulk Load.

PROGRESS DATADIRECT

Progress DataDirect Connect ODBC, JDBC, and ADO.NET drivers include an advanced Bulk Load feature for inserting very large numbers of records into a database as quickly as possible. Progress DataDirect drivers deliver the most reliable bulk load execution and best performance; require no application code changes or vendor tools; and employ standards-based APIs across multiple databases and platforms.

With Progress DataDirect Bulk Load, enterprise organizations effectively satisfy the bulk data access requirements for a broad array of data access use cases. In doing so, they simplify the data access architecture, save important resources for other tasks, and improve operational performance.

Progress DataDirect Bulk Load delivers the fastest performance for inserting mass amounts of data into a database. Progress DataDirect has conducted a range of performance trials, including comparing our own drivers against themselves with Bulk Load enabled versus disabled. We also compared a bulk load from an external file into a database against the same operation performed with a competitor's bulk load tool.

Enabling Bulk Load in the Progress DataDirect Oracle ODBC Wire Protocol driver results in the driver inserting over 105% more rows (twice as many) over a given time period. And despite tuning the competitor's tool for maximum performance, DataDirect Bulk Load enables the DataDirect ODBC Oracle Wire Protocol driver to insert over 20% more rows than the competitor's tool.

With Bulk Load enabled, DataDirect's Type 5 JDBC driver delivers much more throughput, resulting in over 105% more rows (twice as many) over a given time period. The time required to execute a batch cycle inserting 10 million rows can be cut by more than half, going from 6.3 hours to less than 3 hours.

So now do you know everything there is to know about Bulk Load and how it can help your organization? Probably not, so here are some FAQs to help you get there.
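As a quick check on the batch-cycle figures quoted in the performance results, the arithmetic can be sketched as follows. This assumes throughput scales uniformly, so that a given percentage of extra rows per unit time shortens a fixed-size load proportionally; the function names are illustrative and not part of any DataDirect product.

```python
# Worked arithmetic for the batch-cycle figures quoted in the text.
ROWS = 10_000_000          # rows in the batch cycle (from the text)
BASELINE_HOURS = 6.3       # duration without Bulk Load (from the text)


def hours_with_gain(baseline_hours: float, pct_more_rows: float) -> float:
    """Time for the same row count when throughput rises by pct_more_rows percent."""
    return baseline_hours / (1 + pct_more_rows / 100)


def gain_needed(baseline_hours: float, target_hours: float) -> float:
    """Percent throughput increase needed to finish the same load in target_hours."""
    return (baseline_hours / target_hours - 1) * 100


baseline_rate = ROWS / BASELINE_HOURS   # rows per hour without Bulk Load
print(f"baseline rate: {baseline_rate:,.0f} rows/hour")
print(f"duration at +105% throughput: {hours_with_gain(BASELINE_HOURS, 105):.2f} hours")
print(f"gain needed to reach 3.0 hours: {gain_needed(BASELINE_HOURS, 3.0):.1f}%")
```

Under this assumption, a gain of exactly 105% brings the 6.3-hour cycle down to about 3.07 hours, so finishing in less than 3 hours corresponds to a gain of a little more than 110%, which is consistent with the "over 105%" wording.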

FREQUENTLY ASKED QUESTIONS

The efficiency and performance of Bulk Load data transfers are compelling. Should I switch all of my applications to use this methodology moving forward?

No. Bulk data transfer has very specific use cases, because it causes the database to behave in atypical ways that applications do not expect. For instance, when bulk is turned on, a database removes the integrity constraints on the data, leaving your application open to polluting your data. The advantage is that you can get data into the data source very, very quickly; but because of the potential data integrity issues, it is not a feature that should be blindly switched on.

What are some possible use cases for loading data via Bulk Load?

ENTERPRISE SCENARIO: Data Warehousing (loading bulk data files into a data warehouse)
PROGRESS DATADIRECT BULK LOAD CAPABILITIES: Results prove that Progress DataDirect ODBC, JDBC, and ADO.NET Bulk Load delivers the fastest, best performance for loading bulk data into an Oracle, DB2, Sybase, or SQL Server-based data warehouse while avoiding data latency issues.

ENTERPRISE SCENARIO: Data Migration (moving or copying data in tables from one database to another)
PROGRESS DATADIRECT BULK LOAD CAPABILITIES: Progress DataDirect Bulk Load is ideal for simple extract-and-load data migration operations, moving bulk data from one database directly into the other by streaming, thus avoiding the need to load the data into memory.

ENTERPRISE SCENARIO: Data Replication (taking bulk data files from a server or location and loading them into a database)
PROGRESS DATADIRECT BULK LOAD CAPABILITIES: Instead of using FTP or similar approaches for pushing files around a network, Progress DataDirect Bulk Load quickly loads the data you need into relational tables. This approach is faster and provides the added benefit of storing the data as a relational table that is easily accessed by reporting or BI applications.

ENTERPRISE SCENARIO: Disaster Recovery (moving data into a backup, disaster recovery, or failover database)
PROGRESS DATADIRECT BULK LOAD CAPABILITIES: Disaster recovery is all about making sure that when a failure occurs, the backup you are working with is as close to the original set of data as possible. Progress DataDirect Bulk Load ensures that any bulk data is quickly and easily replicated into disaster recovery databases.

ENTERPRISE SCENARIO: Cloud Data Publication (loading bulk data files or tables into a cloud-based database)
PROGRESS DATADIRECT BULK LOAD CAPABILITIES: In cloud-based computing, efficient network usage is critical. As a result, performance is ever-important when moving bulk data files or tables into a cloud-based database. Progress DataDirect Bulk Load allows developers to quickly and easily build a simple program that publishes bulk data into the cloud.
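The integrity risk described in the first answer above can be illustrated with a small stand-in. The sketch below uses Python's built-in sqlite3 module rather than a DataDirect driver, and switches SQLite's foreign key enforcement off to mimic a load path that skips constraint checks. Which constraints a real bulk load relaxes depends on the database and driver; the table names and the `load_orders` helper are hypothetical.

```python
import sqlite3


def load_orders(rows, enforce_constraints: bool) -> int:
    """Insert (order_id, customer_id) rows; return the number of orphaned orders."""
    # isolation_level=None keeps the connection in autocommit mode, so the
    # PRAGMA takes effect immediately (it is ignored inside a transaction).
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce_constraints else 'OFF'}")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customers(id))"
    )
    conn.execute("INSERT INTO customers (id) VALUES (1)")
    try:
        conn.executemany("INSERT INTO orders (id, customer_id) VALUES (?, ?)", rows)
    except sqlite3.IntegrityError:
        pass  # with enforcement on, the row with the unknown customer is rejected
    orphans = conn.execute(
        "SELECT COUNT(*) FROM orders o"
        " LEFT JOIN customers c ON c.id = o.customer_id"
        " WHERE c.id IS NULL"
    ).fetchone()[0]
    conn.close()
    return orphans


rows = [(100, 1), (101, 999)]   # customer 999 does not exist
print("orphans without checks:", load_orders(rows, enforce_constraints=False))
print("orphans with checks:   ", load_orders(rows, enforce_constraints=True))
```

Without enforcement the orphaned order is stored silently, which is the kind of data pollution the answer warns about. A load that bypasses constraints therefore needs its input validated beforehand or checked afterwards.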

How can I differentiate between normal and bulk data loads?

Bulk Load allows you to move large amounts of data between two software tiers very efficiently. It utilizes a specialized protocol that streams the data directly into the target data source for maximum efficiency.

Is it possible to do same-tier bulk data loads?

In this sense, bulk data loading between two data sources is absolutely possible. Applications can author queries to fetch the data they want. Then, using the appropriate API calls in ODBC, JDBC, or ADO.NET, the applications redirect the result of that query directly into the target bulk load, essentially streaming the data from one database to another, all without materializing the data on the client. In effect, the application acts as a pipe for data movement, with the plumbing defined by the data source query and the target data load.

What are the limitations of typical bulk data transfers? What does Progress DataDirect offer beyond the current tools?

Aside from being unsuitable for broad-based applications, bulk data loads typically fail when used with data types such as CLOBs and BLOBs, or other data types used to store significant amounts of data, such as images or large text files. With database-distributed bulk load tools, the bulk load process will fail if these types are encountered. Progress DataDirect Bulk Load compensates for types such as CLOBs and BLOBs and allows the load to continue using non-bulk protocols.

Why would I choose to use a driver for bulk data transfer?

Drivers offer far greater flexibility and, more importantly, functional consistency than individual bulk load utilities, which offer highly variable functionality and unpredictable throughput. With driver-based bulk loads, application developers can leverage familiar interfaces and bulk load-specific programmatic interfaces to tightly couple their bulk load semantics into their applications or platforms.

Does bulk data transfer mean ETL?

Progress DataDirect plays an important role in the ETL process; however, it should not be confused with an ETL replacement. In the extract phase of ETL, Progress DataDirect is highly effective in retrieving data into a platform or application, as well as in delivering additional data quality, master data management, or transformation. In the load phase of ETL, Progress DataDirect Bulk Load can play a significant role in delivering the processed data into persistent data storage such as a database.

Is it possible to consume and leverage vendor bulk files with DataDirect Bulk Load speed and functionality?

No, not currently. However, the Progress DataDirect team is actively considering implementing support for this in future versions of DataDirect Bulk Load.
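The "application as a pipe" idea from the same-tier answer above can be sketched in a few lines. This is an illustration using Python's built-in sqlite3 module and ordinary batched inserts, not the DataDirect bulk protocol or its API. The `pipe_rows` helper, the table names, and the batch size are all assumptions made for the example.

```python
import sqlite3
from itertools import islice


def pipe_rows(source, target, select_sql, insert_sql, batch_size=1000):
    """Stream the result of select_sql on source into target via insert_sql."""
    cursor = source.execute(select_sql)   # rows are pulled lazily from this cursor
    copied = 0
    while True:
        batch = list(islice(cursor, batch_size))   # at most batch_size rows held at once
        if not batch:
            break
        target.executemany(insert_sql, batch)
        copied += len(batch)
    target.commit()
    return copied


# Hypothetical source and target databases for the demonstration.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE readings (id INTEGER, value REAL)")
source.executemany(
    "INSERT INTO readings VALUES (?, ?)", ((i, i * 0.5) for i in range(2500))
)

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE readings_copy (id INTEGER, value REAL)")

copied = pipe_rows(
    source,
    target,
    "SELECT id, value FROM readings WHERE value >= 100",
    "INSERT INTO readings_copy VALUES (?, ?)",
)
target_count = target.execute("SELECT COUNT(*) FROM readings_copy").fetchone()[0]
print(copied, target_count)
```

The source query defines what flows through the pipe and the insert statement defines where it lands. The client never holds more than one batch of rows, which is the property the answer describes as not materializing the data on the client.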

How about replays? Bulk data movement typically involves moving a significant amount of data. If I encounter a failure, is it possible for me to continue somewhere in the bulk load data transfer once the error is corrected?

Yes. With bulk load logging, Progress DataDirect can record the precise location where the bulk load failed, provided you are using the proprietary CSV bulk file representation supported by DataDirect Bulk Load. When a simple configuration option is set in the bulk load governance file, a log file is generated during the bulk load. The timestamped entry recorded at bulk load failure contains the associated row number, which can be used as the starting point for resuming the bulk load process.

With Bulk Export, how does Progress DataDirect represent the data?

We generate a CSV file with the contents of the bulk export operation. While each supported API (ODBC, JDBC, and ADO.NET) offers its own means of invoking the bulk export mechanism, the format and governing configuration file it produces are consumable by any of the bulk load (import) implementations. With this consistency, applications can effectively move data between disparate applications and platforms with the underlying guarantee of round-trip integrity checking.

You've mentioned a bulk configuration file. Do I need to generate one for myself, or is a default configuration file generated on my behalf?

During a bulk export or bulk load, a configuration file is generated to support the resultant bulk data file or bulk import. The file describes the actual data in the bulk data file so that it is fully transportable across the full breadth of platforms and software tiers supported by DataDirect Bulk Load. Key features include the ability to define character set conversion to ensure data integrity when moving data across platforms, and a common set of data types so that all tiers can correctly compensate for and understand the data in the bulk file.
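The resume-from-row-number idea in the replay answer above can be sketched generically. The example below is not the DataDirect log or configuration file format, which this brief does not show. It assumes a plain CSV file with a header line, a hypothetical `load_csv` helper that reports the 1-based row number of the first failing row, and a SQLite table standing in for the target.

```python
import csv
import io
import sqlite3


def load_csv(conn, csv_text, start_row=1):
    """Load CSV rows into table t, starting at 1-based data row start_row.

    Returns None on success, or the row number of the first row that failed.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)                                   # skip the header line
    for row_number, row in enumerate(reader, start=1):
        if row_number < start_row:
            continue                               # already loaded by an earlier run
        try:
            values = (int(row[0]), int(row[1]))    # malformed data raises ValueError
            conn.execute("INSERT INTO t (id, qty) VALUES (?, ?)", values)
            conn.commit()                          # per-row commit keeps the sketch simple
        except ValueError:
            return row_number                      # a real loader would write this to a log
    return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, qty INTEGER)")

bad = "id,qty\n1,10\n2,20\n3,oops\n4,40\n"
failed_at = load_csv(conn, bad)                      # stops at the malformed row
fixed = bad.replace("oops", "30")                    # correct the error in the file
result = load_csv(conn, fixed, start_row=failed_at)  # resume where the load stopped
loaded = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(failed_at, result, loaded)
```

Because the failing row number is recorded, the second run skips the rows that were already committed instead of reloading the whole file. That saving is what makes replay worthwhile for very large loads.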
Is it necessary to use the bulk file when pipelining bulk export and bulk load operations?

No! Pipelining DataDirect bulk export and bulk load is one of the most efficient approaches for data movement available today. Application developers can code queries to source the data and trigger the bulk export they require using the API of their choice. Using a combination of proprietary or standard bulk load APIs, the result set can be streamed directly into target sources without materializing the data on the client tier.

SUMMARY

Bulk Load is just one of many features that make our connectivity products the industry standard. In this era of Big Data, can you afford to continue with business as usual? If you have additional questions about Bulk Load or about data connectivity, please contact us at (800) 876-3101 or visit. Ready to get started? Progress DataDirect offers a free, fully functional, 15-day trial on all products. /download