SOLUTION BRIEF JUST THE FAQs: Moving Big Data with Bulk Load
INTRODUCTION

As the data and information used by businesses grow exponentially, IT organizations face a daunting challenge: moving what the enterprise now calls Big Data. What is the quickest and most efficient way to move, extract, back up, archive, and access the mountains of critical information that are paramount to the success of the business? If your business has Big Data that needs to be moved into and out of a variety of databases, you may be stuck with an underperforming approach and not even know it. Let us introduce you to the wonders of Bulk Load.

PROGRESS DATADIRECT

Progress DataDirect Connect ODBC, JDBC, and ADO.NET drivers include an advanced Bulk Load feature for inserting very large numbers of records into a database as quickly as possible. Progress DataDirect drivers:

- Deliver the most reliable bulk load execution and best performance
- Require no application code changes or vendor tools
- Employ standards-based APIs across multiple databases and platforms

With Progress DataDirect Bulk Load, enterprise organizations effectively satisfy the bulk data access requirements for a broad array of data access use cases. In doing so, they simplify the data access architecture, save important resources for other tasks, and improve operational performance.

Progress DataDirect Bulk Load delivers the fastest performance for inserting mass amounts of data into a database. Progress DataDirect has conducted a range of performance trials, including comparing our own drivers against themselves with Bulk Load enabled vs. disabled. We also compared performing a bulk load from an external file into a database against the same operation with a competitor's bulk load tool.

Enabling Bulk Load in the Progress DataDirect Oracle ODBC Wire Protocol driver results in the driver inserting over 105% more rows, roughly twice as many, in a given time period. And despite tuning the competitor's tool for maximum performance, DataDirect Bulk Load enables the DataDirect ODBC Oracle Wire Protocol driver to insert over 20% more rows than the competitor's tool.

With Bulk Load enabled, DataDirect's Type 5 JDBC driver delivers much more throughput, resulting in over 105% more rows, roughly twice as many, in a given time period. And the time required to execute a batch cycle inserting 10 million rows can be cut by more than half, going from 6.3 hours to less than 3 hours.

So now do you know everything there is to know about Bulk Load and how it can help your organization? Probably not, so here are some FAQs to help you get there.
FREQUENTLY ASKED QUESTIONS

The efficiency and performance of Bulk Load data transfers are compelling. Should I switch all of my applications to use this methodology moving forward?

No. Bulk data transfer has very specific use cases, because it causes the database to behave in atypical ways that applications using it do not expect. For instance, when bulk is turned on, a database removes the integrity constraints on the data, leaving your application open to polluting your data. The advantage is that you can get data into the data source very, very quickly; but because of potential data integrity issues, it is not a feature that should be blindly switched on.

What are some possible use cases for loading data via Bulk Load?

ENTERPRISE SCENARIO / PROGRESS DATADIRECT BULK LOAD CAPABILITIES

Data Warehousing: loading bulk data files into a data warehouse.
Results prove that Progress DataDirect ODBC, JDBC, and ADO.NET Bulk Load delivers the fastest, best performance for loading bulk data into an Oracle, DB2, Sybase, or SQL Server-based data warehouse while avoiding data latency issues.

Data Migration: moving or copying data in tables from one database to another.
Progress DataDirect Bulk Load is ideal for simple extract and load data migration operations, moving bulk data from one database directly into the other by streaming, thus avoiding the need to load the data into memory.

Data Replication: taking bulk data files from a server or location and loading them into a database.
Instead of using FTP or similar approaches for pushing files around a network, Progress DataDirect Bulk Load quickly loads the data you need into relational tables. This approach is faster and provides the added benefit of storing the data as a relational table that is easily accessed by reporting or BI applications.

Disaster Recovery: moving data into a backup, disaster recovery, or failover database.
Disaster recovery is all about making sure that when a failure occurs, the backup you are working with is as close to the original set of data as possible. Progress DataDirect Bulk Load ensures that any bulk data is quickly and easily replicated into disaster recovery databases.

Cloud Data Publication: loading bulk data files or tables into a cloud-based database.
In cloud-based computing, efficient network usage is critical. As a result, performance is ever-important when moving bulk data files or tables into a cloud-based database. Progress DataDirect Bulk Load allows developers to quickly and easily build a simple program that publishes bulk data into the cloud.
How can I differentiate between normal and bulk data loads?

Bulk Load allows you to move large amounts of data between two software tiers very efficiently. It uses a specialized protocol that streams the data directly into the target data source for maximum efficiency.

Is it possible to perform same-tier bulk data loads?

In this sense, bulk data loading between two data sources is absolutely possible. Applications can author queries to fetch the data they want. Then, using the appropriate API calls in ODBC, JDBC, or ADO.NET, the applications redirect the result of that query directly into the target bulk load, essentially streaming the data from one database to another, all without materializing the data on the client. In effect, the application can act as a pipe for data movement, with the plumbing defined by the data source query and the target data load.

What are the limitations of typical bulk data transfers? What does Progress DataDirect offer beyond the current tools?

Aside from being unsuitable for broad-based application use, bulk data loads typically fail when used with data types such as CLOBs and BLOBs, or other data types used to store significant amounts of data such as images or large text files. With database-distributed bulk load tools, the bulk load process will fail if these types are encountered. Progress DataDirect Bulk Load compensates for types such as CLOBs and BLOBs and allows the load to continue using non-bulk protocols.

Why would I choose to use a driver for bulk data transfer?

Drivers offer far greater flexibility and, more importantly, functional consistency than individual bulk load utilities, which offer highly variable functionality and unpredictable performance throughput. With driver-based bulk loads, application developers can leverage familiar interfaces and bulk load-specific programmatic interfaces to tightly couple their bulk load semantics into their applications or platforms.

Does bulk data transfer mean ETL?

Progress DataDirect plays an important role in the ETL process; however, it should not be confused with an ETL replacement. In the extract phase of ETL, Progress DataDirect is highly effective at retrieving data into a platform or application, as well as delivering additional data quality, master data management, or transformation. In the load phase of ETL, Progress DataDirect Bulk Load can play a significant role in delivering the processed data into persistent data storage such as a database.

Is it possible to consume and leverage vendor bulk files with DataDirect Bulk Load speed and functionality?

No, not currently. However, the Progress DataDirect team is actively considering implementing support for this in future versions of DataDirect Bulk Load.
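The "application as a pipe" pattern described in the same-tier answer above can be illustrated with a generic sketch. It is not the DataDirect Bulk Load API or its wire protocol: it uses Python's standard sqlite3 module, and the function name pipe_rows, the table names, and the chunk size are all hypothetical. It only shows the general shape of redirecting a query result into a target load while holding no more than one chunk of rows on the client at a time.

```python
import sqlite3


def pipe_rows(source, target, select_sql, insert_sql, chunk_size=1000):
    """Copy a query result into a target table, one chunk at a time.

    Hypothetical helper for illustration; a driver-level bulk load would
    use its own protocol instead of ordinary parameterized inserts.
    """
    cursor = source.execute(select_sql)
    copied = 0
    while True:
        chunk = cursor.fetchmany(chunk_size)  # at most chunk_size rows in memory
        if not chunk:
            break
        target.executemany(insert_sql, chunk)
        copied += len(chunk)
    target.commit()
    return copied


# Demo with two in-memory databases standing in for the source and target.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(i, i * 1.5) for i in range(2500)],
)
source.commit()

target.execute("CREATE TABLE orders_copy (id INTEGER, amount REAL)")

copied = pipe_rows(
    source,
    target,
    "SELECT id, amount FROM orders ORDER BY id",
    "INSERT INTO orders_copy VALUES (?, ?)",
)
print(copied)
```

The source query defines what flows through the pipe and the insert statement defines where it lands, which mirrors the "plumbing" described above.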
How about replays? Bulk data movement typically involves moving a significant amount of data. If I encounter a failure, is it possible for me to continue somewhere in the bulk load data transfer once the error is corrected?

Yes. With bulk load logging, Progress DataDirect can record the precise location where the bulk load failed, provided you are using the proprietary CSV bulk file representation supported by DataDirect Bulk Load. By setting a simple configuration option in the bulk load governance file, you have a log file generated during the bulk load. The timestamped entry recorded at bulk load failure contains the associated row number, which can be used as the starting point for resuming the bulk load process.

With Bulk Export, how does Progress DataDirect represent the data?

We generate a CSV file with the contents of the bulk export operation. While each supported API (ODBC, JDBC, and ADO.NET) offers its own means of invoking the bulk export mechanism, the format and governing configuration file it produces are consumable by any of the bulk load (import) implementations. With this consistency, applications can effectively move data between disparate applications and platforms with the underlying guarantee of round-trip integrity checking.

You've mentioned a bulk configuration file. Do I need to generate one for myself, or is a default configuration file generated on my behalf?

During a bulk export or bulk load, a configuration file is generated to support the resultant bulk data file or bulk import. The file describes the actual data in the bulk data file so that it is fully transportable across the full breadth of platforms and software tiers supported by DataDirect Bulk Load. Some key features include the ability to define character set conversion to ensure data integrity when moving data across platforms, and a common set of data types so that all tiers can correctly compensate for and understand the data in the bulk file.
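To make the resume-after-failure idea from the replay answer above concrete, here is a minimal sketch. It does not reproduce the DataDirect log or configuration file formats, which this brief does not show. It assumes only that a failed load reports a 1-based row number and that the load can later be restarted from that row. The function load_csv, the items table, and the sample data are hypothetical, and the standard sqlite3 module stands in for the target database.

```python
import csv
import io
import sqlite3


def load_csv(conn, csv_text, start_row=1):
    """Load CSV data rows starting at start_row (1-based, header excluded).

    Returns (rows_loaded, failed_row). failed_row is None on success,
    otherwise the row number to resume from once the data is corrected.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header line
    loaded = 0
    for row_number, row in enumerate(reader, start=1):
        if row_number < start_row:
            continue  # already loaded in an earlier attempt
        try:
            conn.execute(
                "INSERT INTO items (id, name) VALUES (?, ?)",
                (int(row[0]), row[1]),
            )
        except (ValueError, sqlite3.Error):
            conn.commit()  # keep the rows loaded so far
            return loaded, row_number
        loaded += 1
    conn.commit()
    return loaded, None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

# Row 3 has a non-numeric id, so the first attempt stops there.
bad_csv = "id,name\n1,alpha\n2,beta\nx,gamma\n4,delta\n"
first_attempt = load_csv(conn, bad_csv)

# After correcting the data, resume from the recorded row number.
fixed_csv = bad_csv.replace("x,gamma", "3,gamma")
second_attempt = load_csv(conn, fixed_csv, start_row=first_attempt[1])

total_rows = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(first_attempt, second_attempt, total_rows)
```

Resuming from the recorded row avoids reloading the rows that were already committed, which matters most when a failure occurs late in a very large load.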
Is it necessary to use the bulk file when pipelining bulk export and bulk load operations?

No! Pipelining DataDirect bulk export and bulk load is one of the most efficient approaches for data movement available today. Application developers can code queries to source the data and trigger the bulk export they require using the API of their choice. Using a combination of proprietary or standard bulk load APIs, the result set can be streamed directly into target sources without materializing the data on the client tier.

SUMMARY

Bulk Load is just one of many features that make our connectivity products the industry standard. In this era of Big Data, can you afford to continue with business as usual? If you have additional questions about Bulk Load or about data connectivity, please contact us at (800) 876-3101 or visit.

Ready to get started? Progress DataDirect offers a free, fully functional, 15-day trial on all products. /download