
High-Volume Data Warehousing in Centerprise

Product Datasheet

Table of Contents

Overview
    Data Complexity
    Data Quality
    Speed and Scalability
Centerprise Data Warehouse Features
    ETL in a Unified Environment
    Data Quality
    Data Profiling
Translating Data to Star Schema
Maintaining Foreign-Key Relationships
Dimension Table Maintenance
Performance: Key to Data Warehousing Success
    Parallel Processing
    Caching and Querying Data
Development
    Structured Development/Reusability
    Source Control
    Impact Analysis
Connectivity
Why Choose Centerprise?

Overview

Today's data warehouse is the foundation of a successful business information program, incorporating the data stores and the conceptual, logical, and physical models that support business goals and end-user information needs. Creating a quality, trustworthy data warehouse requires mapping data between sources and targets, then capturing the details of the transformation in a metadata repository. As the amount of data generated by enterprises multiplies, performance is becoming a major factor. High-performance data warehousing is about achieving speed and scale while also effectively managing increasing complexity and concurrency to deliver quality data quickly and efficiently. Centerprise Data Integrator is uniquely positioned to deliver the performance and scale that the most demanding data warehousing projects require, inspire confidence in your business data, and deliver on the promise of your data warehouse.

Data Complexity

Centerprise is specifically built to handle complex hierarchical data structures of any kind, so users can develop very large and complex data migration and synchronization applications and, thanks to the intuitive drag-and-drop interface, do so without writing a single line of code.

Data Quality

Data quality in the high-volume enterprise data warehouse is a real issue, and it escalates as the amount and complexity of business data grows. Data quality errors are a key barrier to successful warehousing and analytics implementations. Centerprise is a single platform combining data integration and data quality: it profiles, cleanses, and validates data to ensure readiness for your data warehouse. Because the Centerprise data quality module is built into the platform, users don't have to switch applications; the quality step is part of the same job.

Speed and Scalability

The Centerprise parallel-processing engine delivers high performance and scalability and will never be the bottleneck in your integration process. As the need for power grows, businesses can simply add more servers, and Centerprise's performance will scale accordingly.

Centerprise Data Warehouse Features

ETL in a Unified Environment

Centerprise Data Integrator brings together high-performance data warehousing extract, transform, and load (ETL) features in a single, intuitive platform. It offers a number of features and optimizations to support data warehouse loading, including a high-performance slowly changing dimensions (SCD) component, lookup caching, a robust parallel-processing engine, and optimized database writes.

Data Quality

A data warehouse is only as good as the quality of the data loaded into it. Centerprise offers several features that ensure the quality of enterprise data, including profiling, quality measurement, and validation.

Data Profiling

The extensive data profiling functionality in Centerprise allows analysis of the completeness and accuracy of legacy data as well as validation of migrated data. It enables users to examine and collect statistics and other information about a data source, which can then be used to validate the structure, content, and relationships of that data before embarking on a data migration, or as part of back-end logging to evaluate the migrated data and measure the error and failure rates generated during the migration. Centerprise provides several important features for data profiling: the Data Quality Rules component, the Field Profile, and the Record-Level Log.

Data Quality Measurement

The Data Quality Rules transformation can be used to generate a report showing how well the data conforms to expectations. An entire battery of tests can be added to the transformation that the data must pass before it reaches its destination. For example, a validation rule can require that the computed total match the subtotal; records where it does not are stamped with a data quality error. Once records are stamped with that error, they can be leveraged in a variety of ways, including through the Field Profile.
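In Centerprise these checks are configured visually in the Data Quality Rules component rather than written in code, but the underlying pattern is simple. The following minimal sketch (plain Python, with hypothetical field names) stamps records that fail a subtotal check:

```python
# Minimal sketch of a data quality rule: stamp records whose computed total
# does not match the stored subtotal. Field names are hypothetical; in
# Centerprise this logic is configured in the visual Data Quality Rules
# component rather than written by hand.

def apply_subtotal_rule(records, tolerance=0.01):
    """Attach an 'errors' list to each record; return (passed, failed) counts."""
    passed = failed = 0
    for rec in records:
        computed = rec["quantity"] * rec["unit_price"]
        rec.setdefault("errors", [])
        if abs(computed - rec["subtotal"]) > tolerance:
            rec["errors"].append("computed total does not match subtotal")
            failed += 1
        else:
            passed += 1
    return passed, failed

orders = [
    {"quantity": 2, "unit_price": 9.99, "subtotal": 19.98},
    {"quantity": 3, "unit_price": 5.00, "subtotal": 14.00},  # fails the rule
]
ok, bad = apply_subtotal_rule(orders)
print(f"{ok} passed, {bad} stamped with errors")
```

The stamped records can then be counted and reported, for example through the Field Profile described in the next section.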

Validation

Field Profile

The Field Profile is a powerful tool, not only for business users generating meaningful reports but also for developers, and it produces virtually any statistic a user could want. The Field Profile of the error records discussed above can be sent to an Excel file for a business analyst to mull over and bring up at the next performance meeting, or it can be sent to a Centerprise custom profile. Finally, this information can be captured in variables; for example, the error count and total count can be set from the Field Profile output. When the dataflow completes, those values are available to other workflows. Before further processing begins, it is important to confirm that the data is valid enough or meets a certain threshold; a Decision transformation can then be used to decide whether or not to continue processing.

Record-Level Log

Along with being useful for data quality validation on the front end, data quality checks can also be used on the back end of a process to log the information gathered from a data quality rule and send it to the Record-Level Log. The Record-Level Log in Centerprise can be attached to any action in the dataflow.

Translating Data to Star Schema

Also important in data warehousing is translating data to a star schema. The star schema gets its name from its resemblance to a star, with a fact table at its center and the dimension tables surrounding it representing the star's points. The star schema separates business process data into facts, which hold the measurable, quantitative data about a business, and dimensions, which are descriptive attributes related to fact data. Examples of fact data include sale price, sale quantity, and time, distance, speed, and weight measurements. Related dimension attribute examples include product models, product colors, product sizes, geographic locations, and salesperson names.
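To make the fact/dimension split concrete, here is a minimal star schema sketch with hypothetical table and column names (SQLite is used so the example is self-contained; a real warehouse would target its own database):

```python
# Minimal star schema sketch (hypothetical names): one sales fact table
# surrounded by product and salesperson dimensions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key   INTEGER PRIMARY KEY,
    model         TEXT,
    color         TEXT,
    size          TEXT
);
CREATE TABLE dim_salesperson (
    salesperson_key INTEGER PRIMARY KEY,
    name            TEXT,
    region          TEXT
);
CREATE TABLE fact_sales (
    sale_id          INTEGER PRIMARY KEY,
    product_key      INTEGER REFERENCES dim_product(product_key),
    salesperson_key  INTEGER REFERENCES dim_salesperson(salesperson_key),
    sale_date        TEXT,
    sale_price       REAL,    -- measurable facts live here...
    sale_quantity    INTEGER  -- ...descriptive attributes live in dimensions
);
""")
print("star schema created")
```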

Maintaining Foreign-Key Relationships

Whether loading from files, services, or another transactional database, it is important to maintain foreign-key relationships, such as lookups and hierarchical relationships between tables, as data moves from an inherently different type of schema, such as a relational model, to a star model. There are three ways to write to multiple related tables:

1. Retrieve the key from the database one record at a time. Pros: users can rely on the database to generate the key, which guarantees referential integrity. Cons: this method is very slow and requires a secondary identifier to join records back together.

2. Use the Centerprise Dynamic Lookup and generate a new key for each new record (recommended; a sketch of this pattern follows the list). Pros: fast and easy. Cons: the user cannot rely on the database for key generation (Centerprise generates the key), and all records must be written in one batch.

3. Generate a key for a temporary relationship, then use the temporary relationship in a subsequent step. Pros: fast, and users can rely on the database to generate the key. Cons: the user must maintain a separate temp file for every table, which is complex and high-maintenance and usually requires temporary storage such as a staging table.
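The following sketch illustrates the idea behind the recommended method: a dynamic lookup that returns the existing surrogate key for a record seen before and generates a new key otherwise. Names are hypothetical, and the real Dynamic Lookup is a Centerprise component, not user code:

```python
# Minimal sketch of the recommended pattern: a dynamic lookup that returns an
# existing surrogate key or generates a new one for first-seen records.
import itertools

class DynamicLookup:
    def __init__(self):
        self._keys = {}                  # natural key -> surrogate key
        self._next = itertools.count(1)  # key generation happens here,
                                         # not in the database

    def get_key(self, natural_key):
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._next)  # new record: new key
        return self._keys[natural_key]

lookup = DynamicLookup()
for order in [{"customer": "ACME"}, {"customer": "Globex"}, {"customer": "ACME"}]:
    order["customer_key"] = lookup.get_key(order["customer"])
    print(order)  # ACME resolves to the same surrogate key both times
```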

Dimension Table Maintenance

Slowly Changing Dimension Transformation

Maintaining dimension tables in a data warehouse is quite a chore. A great deal of time can be spent writing SQL scripts, stored procedures, or other code to perform this function. Often, the code is written or duplicated for each dimension table and must be modified regularly to accommodate changing business requirements. Typically, this custom code performs poorly for all but small tables. Centerprise features a dedicated write strategy for automatically loading and updating slowly changing dimension tables. The SCD Transformation uses Centerprise's data synchronization engine to efficiently handle large dimension tables. It supports Type 1 and Type 2 SCDs and provides multiple row-versioning patterns, including effective and expiration dates, active/inactive values, and version number fields (a rough sketch of Type 2 row versioning follows this section).

Aggregate Transformation

The Centerprise Aggregate Transformation helps create and update aggregate tables. It applies aggregate functions such as sum, count, minimum, maximum, and average to elements. Additionally, users can specify group-by elements and create output grouped by the specified fields. As with all Centerprise components, the Aggregate Transformation uses Centerprise's parallel-processing technology to enable users to process high data volumes.
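The row-versioning patterns above can be pictured with a rough Type 2 sketch: when a tracked attribute changes, the current row is expired and a new active version is inserted. Column names are hypothetical, and Centerprise's SCD Transformation performs this bookkeeping automatically:

```python
# Minimal Type 2 SCD sketch: when a tracked attribute changes, expire the
# current row and insert a new active version. Column names are hypothetical.
from datetime import date

def apply_scd2(dimension_rows, incoming, today=None):
    """dimension_rows: dicts with 'natural_key', 'city',
    'effective_date', 'expiration_date', 'is_active'."""
    today = today or date.today().isoformat()
    current = next((r for r in dimension_rows
                    if r["natural_key"] == incoming["natural_key"]
                    and r["is_active"]), None)
    if current is None:
        dimension_rows.append({**incoming, "effective_date": today,
                               "expiration_date": None, "is_active": True})
    elif current["city"] != incoming["city"]:    # tracked attribute changed
        current["expiration_date"] = today        # expire the old version
        current["is_active"] = False
        dimension_rows.append({**incoming, "effective_date": today,
                               "expiration_date": None, "is_active": True})
    # unchanged rows are left alone (Type 1 would overwrite in place)

rows = []
apply_scd2(rows, {"natural_key": "C100", "city": "Austin"}, "2014-01-01")
apply_scd2(rows, {"natural_key": "C100", "city": "Dallas"}, "2014-06-01")
print(rows)  # two versions: Austin (expired) and Dallas (active)
```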

Performance: Key to Data Warehousing Success

The number one cause of performance issues in Centerprise, as with any data integration program, is data volume combined with too many lookups, especially many lookups stacked in a single path. For example, suppose a fact table where every key needs some sort of lookup: with, say, ten lookups sitting right before the destination, each of those lookups has to complete before a record can be inserted into the fact table. A lot of inefficient lookups will slow down the dataflow considerably.

A second issue that impairs performance is the amount of data pulled by the initial source query. The way to solve this is to parameterize these queries, which can be done in several ways. First, variables controlled from the outside can be used; for example, the workflow that triggers the dataflows can restrict them to records from a limited time span, for instance one week. This significantly cuts down the amount of data moving between the source database and Centerprise. Another option, very similar to using variables, is incremental load based on audit fields. If a field, such as a modified date, is guaranteed to change whenever the record changes, Centerprise can track the highest value seen in that audit field and store it in a file. Each subsequent run of the dataflow consults that file and automatically applies the equivalent of the same WHERE clause filter (a rough sketch of this high-water-mark pattern follows this section).

Parallel Processing

The Centerprise multithreaded, parallel-processing engine ensures minimal blocking and starvation of threads, delivering a high degree of parallelism. Combined with today's multicore and multiprocessor hardware, this approach results in a data transformation engine that can scale to handle high data volumes. The Centerprise engine increases throughput in direct proportion to increases in processing power, which, for data warehouses with exploding data volumes, ensures continued scalability of the warehouse and its ETL processes.
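The audit-field pattern mentioned above amounts to keeping a high-water mark between runs. The sketch below shows the general idea with a hypothetical state file, table, and column; Centerprise maintains this state itself when incremental read based on audit fields is enabled:

```python
# Minimal sketch of incremental load via an audit field: remember the highest
# modified_date seen and filter the next run's source query with it.
import json
import os

STATE_FILE = "last_modified.json"

def load_watermark():
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f)["last_modified"]
    return "1900-01-01T00:00:00"           # first run: read everything

def save_watermark(value):
    with open(STATE_FILE, "w") as f:
        json.dump({"last_modified": value}, f)

def run_query(sql):
    """Stand-in for a real database call; returns rows sorted by modified_date."""
    print("executing:", sql)
    return []

watermark = load_watermark()
rows = run_query("SELECT * FROM orders "
                 f"WHERE modified_date > '{watermark}' ORDER BY modified_date")
if rows:
    save_watermark(rows[-1]["modified_date"])  # advance the high-water mark
```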

Caching and Querying Data

Loading fact tables efficiently is vital to a successful data warehouse project. With ever-increasing data volumes and shrinking transfer windows, it is imperative to load data quickly and correctly. Centerprise offers a number of caching and lookup technologies to deliver the throughput needed for large data sets. The typical argument against caching is memory consumption; Centerprise stores cached data on disk, so memory is not an issue and caching should be used whenever possible. For a small number of records, static caching is a good option; for very large source data tables, use Fill Cache With All Lookup Values at Start to avoid repeated trips to the database.

With a large number of records and dimension table lookups, the need for a high-performance lookup transformation cannot be overemphasized. Centerprise specializes in lookup transformations that provide the technology necessary to handle high data volumes, including fast, intelligent caching and parallel execution. Dimension table versioning and late-arriving facts are supported via effective/expiration dates, active/inactive rows, and version number fields.

In-Database Joins

When joining data from the same database, the Database Join option within the Join Transformation can be used to build and run a single query joining multiple tables. This cuts down the number of queries needed for the initial lookup, significantly enhancing the performance of the dataflow.
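As a rough illustration of why the single joined query wins, compare it with issuing one lookup query per record. Table and column names are hypothetical; in Centerprise this is an option on the Join Transformation, not hand-written SQL:

```python
# Minimal sketch of the in-database join idea: instead of issuing a lookup
# query per record, join the tables in one query so the database does the work.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders  (order_id INTEGER, product_id INTEGER);
CREATE TABLE product (product_id INTEGER, model TEXT);
INSERT INTO orders  VALUES (1, 10), (2, 11);
INSERT INTO product VALUES (10, 'A100'), (11, 'B200');
""")

# One query replaces a per-record lookup against product:
joined = conn.execute("""
    SELECT o.order_id, p.model
    FROM orders o
    JOIN product p ON p.product_id = o.product_id
""").fetchall()
print(joined)   # [(1, 'A100'), (2, 'B200')]
```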

Persistent Lookup Cache

Often in data warehousing situations, the same dimension table is continually loaded. Making a trip to a very large table over and over again and retrieving all records is extremely expensive and can bring a data process to a halt. The Persistent Lookup Cache solves this problem and increases performance by scanning the lookup table once and storing a snapshot of it on the server's local drive for use in subsequent runs.

Change Data Capture

A technology that considerably improves throughput for high-volume data warehouses is change data capture (CDC), a set of approaches in which only incremental changes are applied to destination tables. Centerprise supports two distinct CDC patterns: incremental read from the source using audit fields, and incremental update at the destination using the CDC hash function (sketched below). These approaches substantially reduce transfer runtimes.
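The hash-based destination pattern can be sketched as follows: hash each incoming record and compare it against the stored hash to classify the record as an insert, an update, or unchanged. This illustrates the general technique only; Centerprise's CDC hash implementation is internal to the product:

```python
# Rough sketch of hash-based change detection at the destination.
import hashlib
import json

def record_hash(rec, key_field="id"):
    payload = {k: v for k, v in rec.items() if k != key_field}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

stored_hashes = {}  # in practice persisted alongside the destination table

def classify(rec):
    h = record_hash(rec)
    prev = stored_hashes.get(rec["id"])
    stored_hashes[rec["id"]] = h
    if prev is None:
        return "insert"
    return "unchanged" if prev == h else "update"

print(classify({"id": 1, "city": "Austin"}))   # insert
print(classify({"id": 1, "city": "Austin"}))   # unchanged
print(classify({"id": 1, "city": "Dallas"}))   # update
```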

Diff Processor Transformation

The Centerprise Diff Processor Transformation can be used to compare an incoming data stream with existing data in a table and apply only the differences, which substantially speeds up dataflows by ensuring that only the data that is absolutely necessary is sent on for processing. The transformation has two parts: the Source Diff Processor and the Diff Processor. The Source Diff Processor write strategy logs each dataflow run through Centerprise so that subsequent runs are compared against it and only records that must be updated are sent to the destination. The Diff Processor does the same thing as the Source Diff Processor but compares against a table instead of a source file.

The Diff Processor is much faster than upsert. Upsert issues a query per record to see whether the information already exists, while the Diff Processor sends records in batches to the target system, where they are written to a temporary table and joined. The comparison happens on the database side rather than the Centerprise side, so large chunks are processed at once instead of a separate query deciding whether each record needs an insert or an update. In short, upsert works one record at a time and the Diff Processor compares in batches, making it orders of magnitude faster (a rough sketch of this staging-and-join pattern appears at the end of this section).

Development

Large Centerprise projects such as data warehouses are typically developed by teams, and teams collaborating in Centerprise can take advantage of several features.

Structured Development/Reusability

Centerprise provides project folders to organize large and complex projects such as those in data warehousing. When creating a workflow involving multiple dataflows, the dataflows can be dragged, dropped, and connected, creating reusable building blocks that only have to be defined once and can then be used repeatedly in the future.

Source Control

For any data warehouse or complex project where teams work on the same files, source control is a must: accidents happen and things get overwritten. Centerprise provides built-in connectivity to Microsoft Team Foundation Server, and for those who do not have it, all files are written in XML format and can be checked in and out.
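Here is a rough sketch of the staging-and-join pattern referenced above: the incoming batch is loaded into a temporary table, and the database finds inserts and updates in one set-based pass instead of one existence query per record. Names are hypothetical and SQLite stands in for the target database; the Centerprise Diff Processor manages this staging internally:

```python
# Rough sketch of batch diff versus row-at-a-time upsert.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE target  (id INTEGER PRIMARY KEY, city TEXT);
INSERT INTO target VALUES (1, 'Austin'), (2, 'Tulsa');
CREATE TEMP TABLE staging (id INTEGER PRIMARY KEY, city TEXT);
""")

batch = [(1, 'Dallas'), (2, 'Tulsa'), (3, 'Reno')]     # incoming records
conn.executemany("INSERT INTO staging VALUES (?, ?)", batch)

# One set-based pass finds what changed; an upsert would query per record.
updates = conn.execute("""
    SELECT s.id, s.city FROM staging s
    JOIN target t ON t.id = s.id AND t.city <> s.city
""").fetchall()
inserts = conn.execute("""
    SELECT s.id, s.city FROM staging s
    LEFT JOIN target t ON t.id = s.id WHERE t.id IS NULL
""").fetchall()
print("updates:", updates)   # [(1, 'Dallas')]
print("inserts:", inserts)   # [(3, 'Reno')]
```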

Impact Analysis

The Centerprise database browser offers impact analysis, which shows the lineage of a table and where it is used in any dataflow throughout Centerprise. This is very useful for large, complicated projects such as data warehouses: if you need to make a change to a table, you can see at a glance which dataflows are affected.

Connectivity

Centerprise's ever-expanding library of Centerprise Connectors offers a plethora of integration options, supporting popular databases including Oracle, SQL Server, DB2, Sybase, and MySQL, as well as popular file formats such as delimited, fixed-length, Excel, COBOL, XML, and others. Pre-configured Centerprise Connector workflows and dataflows enable your Centerprise integration engine to communicate quickly and easily with specific enterprise business applications, such as Salesforce, Microsoft Dynamics, QuickBooks, and more, as well as with industry-leading databases, data warehouses, and technologies such as EDI and web services.

Why Choose Centerprise?

Astera's Centerprise solutions enable organizations to complete their integration and migration projects more quickly and efficiently, with features and technologies created especially for high-volume data warehouse projects and business needs. Astera's flexible and extensible Centerprise platform is fast becoming the platform of choice for medium and large enterprises and government agencies. An innovative parallel-processing architecture and smart optimizations enable Centerprise to meet the needs of large, enterprise-scale data integration projects. Astera not only offers industry-leading software, but also has the expertise, deep product knowledge, and data management project experience on which your team can capitalize to deliver your project more quickly and efficiently.

www.astera.com
Contact us for more information or to request a free trial: sales@astera.com | 888-77-ASTERA
© 2014 Astera Software