dbspeak DBs peak when we speak

Data Profiling: A Practitioner's Approach Using DataFlux

"[Data profiling] employs analytic methods for looking at data for the purpose of developing a thorough understanding of the content, structure, and quality of the data. A good data profiling [system] can process very large amounts of data, and with the skills of the analyst, uncover all sorts of issues that need to be addressed."
- Data Quality: The Accuracy Dimension, Jack E. Olson

Author: Satya P Vedula

What is Data Profiling?

"Data profiling is the process of examining the data available in an existing data source (e.g. a database or a file) and collecting statistics and information about that data." (Wikipedia)

What if Data Profiling is not done?

The Data Warehousing Institute (TDWI) estimates that poor-quality customer data costs U.S. businesses a staggering $611 billion a year in postage, printing, and staff overhead. [1]

"Oh, I hate it when it does that. Just enter for the address and all nines for the phone and keep going to the next screen." (A sales associate)

Poor data quality leads to high-tech failures. Here are a few, as quoted by TDWI:
- A large bank discovered that 62 percent of its home equity loans were being calculated incorrectly, with the principal getting larger each month.
- A telecommunications firm lost $8 million a month because data entry errors incorrectly coded accounts, preventing bills from being sent out.

Why Data Profiling?

Data profiling is the first step toward a successful data-intensive project. It is the most practical way:
- to learn about database quality and its contents
- to effectively track data quality
- to achieve the goal of reliable data by making it a single source of truth
- to assess the risk involved in integrating data for new applications
- to assess whether metadata accurately describes the actual values in the source database
- to understand challenges early in a data-intensive project that needs an enterprise view of all data, such as Master Data Management or data governance
- to determine data cleansing needs and build the related routines

How is Data Profiling done?

Structure discovery: Perform a structure analysis of all tables by examining complete columns of data. Techniques that can be used include metadata validation, pattern matching, and basic statistics.

Content discovery: Once structure analysis is complete, apply techniques such as standardization, frequency counts and outlier detection, and business rules validation, and employ statistical techniques.

Relationship discovery: Today's data warehouses contain massive amounts of data covering various subject matters. Understanding the data relationships (for example, customer in claims against customer in policy) and relating them results in better business decisions.

[1] TDWI estimate based on cost savings cited by survey respondents and others who have cleaned up name and address data, combined with Dun & Bradstreet counts of U.S. businesses by number of employees.
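The basic statistics, frequency counts, and pattern matching named under structure and content discovery can be illustrated with a minimal Python sketch. It is not how DataFlux computes a profile; the helper names `value_pattern` and `profile_column` and the sample phone values are hypothetical, chosen only to show what such a column profile might contain.

```python
import re
from collections import Counter

def value_pattern(value):
    """Reduce a value to its shape: digits become 9, letters become A."""
    text = re.sub(r"[0-9]", "9", str(value))
    return re.sub(r"[A-Za-z]", "A", text)

def profile_column(values):
    """Collect basic statistics for one column; None and '' count as missing."""
    present = [v for v in values if v is not None and v != ""]
    freq = Counter(present)
    patterns = Counter(value_pattern(v) for v in present)
    return {
        "count": len(values),
        "null_count": len(values) - len(present),
        "distinct": len(freq),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "top_values": freq.most_common(3),
        "patterns": patterns.most_common(3),
    }

# Hypothetical sample: one phone number entered as "all nines".
phones = ["555-0101", "555-0102", "9999999999", None, "555-0101", ""]
report = profile_column(phones)
for name, value in report.items():
    print(f"{name}: {value}")
```

In this sample, the minority pattern (ten digits with no separator) is what draws attention to the placeholder phone number, which is the kind of issue a frequency or pattern report is meant to surface for an analyst.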

Data Profiling in Practice: Using a Tool

A Gartner report from June 2007 shows only two leading third-party players in the data profiling market: DataFlux and Trillium Software. This paper shows how to leverage DataFlux as a tool for data profiling activities.

Data profiling in practice has three distinct phases:
- Initial profiling: Perform the initial data profiling and assess the data.
- Integrate and automate: Integrate the profiling and automate the profiling process to proactively monitor changing data.
- Report the results: Pass the profiling results on to the business users, data architects, and developers to act on.

For better quality control, data profiling needs to perform the following audits [2]:

Audit type: Example
Domain checking: In a gender field, the value should be M or F.
Range checking: For age, the value should be less than 125 and greater than 0.
Cross-field verification: If a customer orders an upgrade, then make sure that the customer already owns the product.
Address format verification: If Street is the designation for street, then make sure no other designations are used.
Name standardization: If Robert is the standard name for Robert, then make sure that Bob, Robt and Rob are not used.
Reference field consolidation: If GM stands for General Motors, then make sure it does not stand for General Mills elsewhere.
Format consolidation: Make sure that date information is stored as yyyymmdd in each applicable field.
Referential integrity: If an order shows that a customer bought product XYZ, then make sure that there actually is a product XYZ.
Basic statistics, frequencies, ranges, and outliers: If an organization has products that cost between 1000 and [...] dollars, you can run a report for product prices that are not in this range. You can also view product information, such as SKU codes, to find out if the SKU groupings are correct and in line with the expected frequencies.
Duplicate identification: If an inactive flag is used to identify customers that are no longer covered by health benefits, then make sure all duplicate records are also marked inactive.
Uniqueness and missing value validation: If UPC or SKU codes are supposed to be unique, then make sure they are not being reused.
Key identification: If there is a defined primary key/foreign key relationship across tables, then validate it by looking for records that do not have a parent.
Data rule compliance: If closed credit accounts must have a balance of 0, then make sure there are no records where the closed account flag is true and the account balance total is greater than 0.

[2] Source: DataFlux.
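Several of the audits above (domain checking, range checking, referential integrity, and data rule compliance) reduce to simple predicates over rows. The following Python sketch shows one way to express them outside of any tool. The function names, field names, and sample records are all hypothetical and only mirror the examples in the table.

```python
def domain_violations(rows, field, allowed):
    """Rows whose field value is outside the allowed set (domain checking)."""
    return [r for r in rows if r.get(field) not in allowed]

def range_violations(rows, field, low, high):
    """Rows whose field is missing or not strictly between low and high."""
    return [r for r in rows if r.get(field) is None or not (low < r[field] < high)]

def orphans(children, fk, parents, pk):
    """Child rows whose foreign key has no matching parent key."""
    parent_keys = {p[pk] for p in parents}
    return [c for c in children if c[fk] not in parent_keys]

def rule_violations(rows, predicate):
    """Rows for which a business-rule violation predicate is true."""
    return [r for r in rows if predicate(r)]

# Hypothetical sample data mirroring the audit examples.
customers = [
    {"id": 1, "gender": "M", "age": 34},
    {"id": 2, "gender": "X", "age": 130},
    {"id": 3, "gender": "F", "age": 0},
]
products = [{"sku": "XYZ"}]
orders = [{"order_id": 10, "sku": "XYZ"}, {"order_id": 11, "sku": "ABC"}]
accounts = [
    {"acct": "A1", "closed": True, "balance": 0},
    {"acct": "A2", "closed": True, "balance": 25.0},
]

bad_gender = domain_violations(customers, "gender", {"M", "F"})
bad_age = range_violations(customers, "age", 0, 125)
orphan_orders = orphans(orders, "sku", products, "sku")
bad_closed = rule_violations(accounts, lambda r: r["closed"] and r["balance"] > 0)

print("domain:", [r["id"] for r in bad_gender])
print("range:", [r["id"] for r in bad_age])
print("orphans:", [o["order_id"] for o in orphan_orders])
print("rule:", [a["acct"] for a in bad_closed])
```

Keeping each audit as a small function that returns the offending rows makes it straightforward to report counts to business users while still handing the detailed records to data analysts.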

Best Practices in Using DataFlux for Profiling and Testing

DataFlux has many capabilities in addition to data profiling, including data testing and automation using Data Integration Server. The best practices listed here are not limited to data profiling; they apply to the entire data testing capabilities of DataFlux.

Preparation
- Gather source and target system documentation
- Compile logons and passwords into the existing metadata repository

Additional tools, setup & connectivity
- Create the necessary DSNs using appropriate ODBC drivers
- For DB2, use the DB2 Wire Protocol driver instead of native drivers
- Prepare macros for source and target DSNs using connection strings in architect.cfg
- Use macros for input and output files
- Identify and categorize sources and targets by the way a tool connects (relational, file connection, etc.)
- As required, identify and save the file layouts
- Use a versioning system to store architect jobs, source/target file layouts, copybooks, XML layouts, and sample data (in files)
- The connection names on Windows and UNIX must correspond for Data Integration Server and Remote Job to work properly

Team training, roles & responsibilities
- Prepare reports/deliverables from the profiled data
- Build familiarity with other tools, such as Microsoft Excel, to dynamically parse and reanalyze data
- Identify the lead to participate in issues, scope, and progress
- Identify a data analyst to look for data anomalies
- Identify a business analyst to look for business rule violations

Decide the approach
- Start profiling by subject area or by physical structure
- Identify how architect jobs will be automated on Integration Server

Extract, Load & Transform
- Create extract programs/SQL to fetch data from source and target systems
- Code page compatibility: When source and target databases use different code pages, certain characters (in CHAR, VARCHAR, and TEXT data types) are translated differently. Check the ETL tool used (e.g. Informatica PowerCenter) for code page compatibility issues. (A known problem exists converting certain characters from Sybase, mainframe DB2, and certain flat files containing accented characters to Teradata.)
- Check for Unicode support, where needed
- DataFlux doesn't support VSAM tables on the mainframe with the shipped ODBC drivers. Check whether DataDirect Shadow drivers are needed.

- DataFlux doesn't handle COMP-3 clauses (packed decimal) in COBOL copybooks well
- DataFlux has open issues handling XML files larger than 2 GB. Make sure this limit is not reached.
- Push the key creation logic and other formatting to the database being queried (into the SQL)
- DB2 allows a time of 24:00:00; however, this is not valid in most other relational databases, including Teradata and Oracle. Make the necessary format changes to handle this time.
- The ETL tool Informatica ignores/truncates milliseconds during the transformation process. This might lead to invalid results while checking.
- Identify fields that need to be split/separated into various parts for analysis

Sampling the data
- DataFlux can determine the layout of an input flat file. A good sample gives a better file layout; a sampling of rows might be sufficient for this purpose.
- Large free-form text fields may not benefit from data profiling. However, the first few characters (20-50 characters) can contain cryptic codes. Make sure they are analyzed separately.
- Split or merge fields containing telephone numbers, addresses, and names prior to load (i.e. push to SQL)
- When a table has a combination of columns as its key, it is desirable to combine them into a single column prior to load (as part of the SQL)
- Once the architect job is executed, plan for its input and output files to be moved to a different location, to free up space on the server

Analysis

While data is better understood by Subject Matter Experts (SMEs), the following statistical approach may help data analysts understand it better:
- Use the profiled data for a sanity check of the data, by comparing it with prior measurements/metrics collected
- Check whether record counts and null counts match
- Check whether the high/low values are appropriate in the Min and Max fields
- For table-level matching between source and target databases, a sum on money columns should match
- Identify the most used values in min and max values. These are good candidates for enabling compression.
- Use USPS and Geocode databases for validating addresses, phone numbers, etc.
- Use chop tables for logically creating sub-string elements from input data
- Use grammars. A grammar is a set of rules that represent expected patterns of words in a given context.
- Use the phonetic library for analysis (phonetics) during the process of generating match codes, e.g. matching SCHMIDT and SCHMITT
- Use regular expression (Reg-Ex) libraries for normalization, standardization, and other input string pre-processing activities
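The source-to-target checks in the Analysis list (record counts, null counts, min/max values, and a sum on money columns) can be pushed into SQL and compared side by side. The sketch below uses Python's built-in sqlite3 module with an in-memory database purely for illustration. The table names, column names, and values are hypothetical, and amounts are stored as integer cents so that sums compare exactly.

```python
import sqlite3

def table_metrics(conn, table, money_col, nullable_col):
    """Return (row count, null count, min, max, sum) for one table."""
    # Names are interpolated into the SQL text, so pass trusted identifiers only.
    sql = (
        f"SELECT COUNT(*), "
        f"SUM(CASE WHEN {nullable_col} IS NULL THEN 1 ELSE 0 END), "
        f"MIN({money_col}), MAX({money_col}), SUM({money_col}) "
        f"FROM {table}"
    )
    return conn.execute(sql).fetchone()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_claims (claim_id INTEGER, region TEXT, amount_cents INTEGER);
    CREATE TABLE tgt_claims (claim_id INTEGER, region TEXT, amount_cents INTEGER);
    INSERT INTO src_claims VALUES (1, 'East', 10000), (2, NULL, 25050), (3, 'West', 7525);
    INSERT INTO tgt_claims VALUES (1, 'East', 10000), (2, NULL, 25050), (3, 'West', 7500);
""")

names = ("rows", "null_regions", "min_amount", "max_amount", "sum_amount")
src = dict(zip(names, table_metrics(conn, "src_claims", "amount_cents", "region")))
tgt = dict(zip(names, table_metrics(conn, "tgt_claims", "amount_cents", "region")))

# Keep only the metrics that differ between source and target.
mismatches = {k: (src[k], tgt[k]) for k in names if src[k] != tgt[k]}
print(mismatches)
```

Here the row and null counts agree, so only the minimum and the sum expose the one altered amount. This is why a sum on money columns is a useful table-level check in addition to counts.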

Jet Data Manager 2012 User Guide

Jet Data Manager 2012 User Guide Jet Data Manager 2012 User Guide Welcome This documentation provides descriptions of the concepts and features of the Jet Data Manager and how to use with them. With the Jet Data Manager you can transform

More information

Data Quality Assessment. Approach

Data Quality Assessment. Approach Approach Prepared By: Sanjay Seth Data Quality Assessment Approach-Review.doc Page 1 of 15 Introduction Data quality is crucial to the success of Business Intelligence initiatives. Unless data in source

More information

SAP BusinessObjects Information Steward

SAP BusinessObjects Information Steward SAP BusinessObjects Information Steward Michael Briles Senior Solution Manager Enterprise Information Management SAP Labs LLC June, 2011 Agenda Challenges with Data Quality and Collaboration Product Vision

More information

Data Profiling and Mapping The Essential First Step in Data Migration and Integration Projects

Data Profiling and Mapping The Essential First Step in Data Migration and Integration Projects Data Profiling and Mapping The Essential First Step in Data Migration and Integration Projects An Evoke Software White Paper Summary At any given time, according to industry analyst estimates, roughly

More information

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya Chapter 6 Basics of Data Integration Fundamentals of Business Analytics Learning Objectives and Learning Outcomes Learning Objectives 1. Concepts of data integration 2. Needs and advantages of using data

More information

Introduction to Oracle Business Intelligence Standard Edition One. Mike Donohue Senior Manager, Product Management Oracle Business Intelligence

Introduction to Oracle Business Intelligence Standard Edition One. Mike Donohue Senior Manager, Product Management Oracle Business Intelligence Introduction to Oracle Business Intelligence Standard Edition One Mike Donohue Senior Manager, Product Management Oracle Business Intelligence The following is intended to outline our general product direction.

More information

META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING

META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING Ramesh Babu Palepu 1, Dr K V Sambasiva Rao 2 Dept of IT, Amrita Sai Institute of Science & Technology 1 MVR College of Engineering 2 asistithod@gmail.com

More information

Relational Databases for the Business Analyst

Relational Databases for the Business Analyst Relational Databases for the Business Analyst Mark Kurtz Sr. Systems Consulting Quest Software, Inc. mark.kurtz@quest.com 2010 Quest Software, Inc. ALL RIGHTS RESERVED Agenda The RDBMS and its role in

More information

IMS Application Retirement Think the Unthinkable

IMS Application Retirement Think the Unthinkable IMS Application Retirement Think the Unthinkable 1 December 2015 John B Boyle Senior Product Specialist Informatica Software Abstract Although IMS is (hopefully!) still central to the day-to-day running

More information

Enterprise Data Quality

Enterprise Data Quality Enterprise Data Quality An Approach to Improve the Trust Factor of Operational Data Sivaprakasam S.R. Given the poor quality of data, Communication Service Providers (CSPs) face challenges of order fallout,

More information

High-Volume Data Warehousing in Centerprise. Product Datasheet

High-Volume Data Warehousing in Centerprise. Product Datasheet High-Volume Data Warehousing in Centerprise Product Datasheet Table of Contents Overview 3 Data Complexity 3 Data Quality 3 Speed and Scalability 3 Centerprise Data Warehouse Features 4 ETL in a Unified

More information

AMB-PDM Overview v6.0.5

AMB-PDM Overview v6.0.5 Predictive Data Management (PDM) makes profiling and data testing more simple, powerful, and cost effective than ever before. Version 6.0.5 adds new SOA and in-stream capabilities while delivering a powerful

More information

Data Warehouse Center Administration Guide

Data Warehouse Center Administration Guide IBM DB2 Universal Database Data Warehouse Center Administration Guide Version 8 SC27-1123-00 IBM DB2 Universal Database Data Warehouse Center Administration Guide Version 8 SC27-1123-00 Before using this

More information

BENEFITS OF AUTOMATING DATA WAREHOUSING

BENEFITS OF AUTOMATING DATA WAREHOUSING BENEFITS OF AUTOMATING DATA WAREHOUSING Introduction...2 The Process...2 The Problem...2 The Solution...2 Benefits...2 Background...3 Automating the Data Warehouse with UC4 Workload Automation Suite...3

More information

ETL Tools. L. Libkin 1 Data Integration and Exchange

ETL Tools. L. Libkin 1 Data Integration and Exchange ETL Tools ETL = Extract Transform Load Typically: data integration software for building data warehouse Pull large volumes of data from different sources, in different formats, restructure them and load

More information

White Paper. Thirsting for Insight? Quench It With 5 Data Management for Analytics Best Practices.

White Paper. Thirsting for Insight? Quench It With 5 Data Management for Analytics Best Practices. White Paper Thirsting for Insight? Quench It With 5 Data Management for Analytics Best Practices. Contents Data Management: Why It s So Essential... 1 The Basics of Data Preparation... 1 1: Simplify Access

More information

Oracle Essbase Integration Services. Readme. Release 9.3.3.0.00

Oracle Essbase Integration Services. Readme. Release 9.3.3.0.00 Oracle Essbase Integration Services Release 9.3.3.0.00 Readme To view the most recent version of this Readme, see the 9.3.x documentation library on Oracle Technology Network (OTN) at http://www.oracle.com/technology/documentation/epm.html.

More information

Running Analytics on SAP HANA and BW with MicroStrategy

Running Analytics on SAP HANA and BW with MicroStrategy Running Analytics on SAP HANA and BW with MicroStrategy Presented by: Trishla Maru Agenda Overview Relationship and Certification with SAP Integration to SAP BW Overview with SAP BW Import process and

More information

A Design Technique: Data Integration Modeling

A Design Technique: Data Integration Modeling C H A P T E R 3 A Design Technique: Integration ing This chapter focuses on a new design technique for the analysis and design of data integration processes. This technique uses a graphical process modeling

More information

Data Migration in SAP environments

Data Migration in SAP environments Framework for Data Migration in SAP environments Does this scenario seem familiar? Want to save 50% in migration costs? Data migration is about far more than just moving data into a new application or

More information

Evaluation Checklist Data Warehouse Automation

Evaluation Checklist Data Warehouse Automation Evaluation Checklist Data Warehouse Automation March 2016 General Principles Requirement Question Ajilius Response Primary Deliverable Is the primary deliverable of the project a data warehouse, or is

More information

Data Warehouse Implementation Checklist

Data Warehouse Implementation Checklist Data Warehouse Implementation Checklist 15 November 2010 Prepared by: Knowledge Base Sdn Bhd THIS DOCUMENT AND INFORMATION HEREIN ARE THE PROPERTY OF KNOWLEDGE BASE SDN BHD Copyright 2010. Knowledge Base

More information

Exploiting Key Answers from Your Data Warehouse Using SAS Enterprise Reporter Software

Exploiting Key Answers from Your Data Warehouse Using SAS Enterprise Reporter Software Exploiting Key Answers from Your Data Warehouse Using SAS Enterprise Reporter Software Donna Torrence, SAS Institute Inc., Cary, North Carolina Juli Staub Perry, SAS Institute Inc., Cary, North Carolina

More information

<Insert Picture Here> Oracle BI Standard Edition One The Right BI Foundation for the Emerging Enterprise

<Insert Picture Here> Oracle BI Standard Edition One The Right BI Foundation for the Emerging Enterprise Oracle BI Standard Edition One The Right BI Foundation for the Emerging Enterprise Business Intelligence is the #1 Priority the most important technology in 2007 is business intelligence

More information

CA Repository for z/os r7.2

CA Repository for z/os r7.2 PRODUCT SHEET CA Repository for z/os CA Repository for z/os r7.2 CA Repository for z/os is a powerful metadata management tool that helps organizations to identify, understand, manage and leverage enterprise-wide

More information

The Data Warehouse ETL Toolkit

The Data Warehouse ETL Toolkit 2008 AGI-Information Management Consultants May be used for personal purporses only or by libraries associated to dandelon.com network. The Data Warehouse ETL Toolkit Practical Techniques for Extracting,

More information

Rohita Yamaganti, Usha Manjari Sikharam IJSER

Rohita Yamaganti, Usha Manjari Sikharam IJSER International Journal of Scientific & Engineering Research, Volume 6, Issue 4, April-2015 310 Data Warehousing Concepts Using ETL Process for Social Media Data Extraction Rohita Yamaganti, Usha Manjari

More information

The release notes provide details of enhancements and features in Cloudera ODBC Driver for Impala 2.5.30, as well as the version history.

The release notes provide details of enhancements and features in Cloudera ODBC Driver for Impala 2.5.30, as well as the version history. Cloudera ODBC Driver for Impala 2.5.30 The release notes provide details of enhancements and features in Cloudera ODBC Driver for Impala 2.5.30, as well as the version history. The following are highlights

More information

Data Warehouse and Business Intelligence Testing: Challenges, Best Practices & the Solution

Data Warehouse and Business Intelligence Testing: Challenges, Best Practices & the Solution Warehouse and Business Intelligence : Challenges, Best Practices & the Solution Prepared by datagaps http://www.datagaps.com http://www.youtube.com/datagaps http://www.twitter.com/datagaps Contact contact@datagaps.com

More information

Job Description. Direct Reports

Job Description. Direct Reports Job Description Job Title Lead DBA Function IT Services IT Applications Reporting to IT Applications Manager Direct Reports DBA team, currently comprising of two other DBA s Working Hours Standard 35 hours

More information

Managing Third Party Databases and Building Your Data Warehouse

Managing Third Party Databases and Building Your Data Warehouse Managing Third Party Databases and Building Your Data Warehouse By Gary Smith Software Consultant Embarcadero Technologies Tech Note INTRODUCTION It s a recurring theme. Companies are continually faced

More information

The Evolution of ETL

The Evolution of ETL The Evolution of ETL -From Hand-coded ETL to Tool-based ETL By Madhu Zode Data Warehousing & Business Intelligence Practice Page 1 of 13 ABSTRACT To build a data warehouse various tools are used like modeling

More information

Data Integrity and Integration: How it can compliment your WebFOCUS project. Vincent Deeney Solutions Architect

Data Integrity and Integration: How it can compliment your WebFOCUS project. Vincent Deeney Solutions Architect Data Integrity and Integration: How it can compliment your WebFOCUS project Vincent Deeney Solutions Architect 1 After Lunch Brain Teaser This is a Data Quality Problem! 2 Problem defining a Member How

More information

<Insert Picture Here> Move to Oracle Database with Oracle SQL Developer Migrations

<Insert Picture Here> Move to Oracle Database with Oracle SQL Developer Migrations Move to Oracle Database with Oracle SQL Developer Migrations The following is intended to outline our general product direction. It is intended for information purposes only, and

More information

DATA MANAGEMENT USER GROUP

DATA MANAGEMENT USER GROUP DATA MANAGEMENT USER GROUP MANCHESTER 9 TH MARCH 2016 AGENDA SAS DATA MANAGEMENT USER GROUP 9 th March 2016, SAS Manchester Time Topic Speaker 9-9.30am Coffee and networking 9.30-9.35am Introductions All

More information

Job Description. Working Hours Standard 35 hours per week Normally working Mon Fri 9am to 5pm with additional hours as required

Job Description. Working Hours Standard 35 hours per week Normally working Mon Fri 9am to 5pm with additional hours as required Job Description Job Title Oracle Support Technical Developer Function IT Services Applications Reporting to Applications Manager Direct Reports None Working Hours Standard 35 hours per week Normally working

More information

IST722 Data Warehousing

IST722 Data Warehousing IST722 Data Warehousing Introducing ETL Michael A. Fudge, Jr. Recall: Kimball Lifecycle Objective: Define and explain the ETL components and subsystems What is ETL? ETL: 4 Major Operations 1. Extract the

More information

Bringing agility to Business Intelligence Metadata as key to Agile Data Warehousing. 1 P a g e. www.analytixds.com

Bringing agility to Business Intelligence Metadata as key to Agile Data Warehousing. 1 P a g e. www.analytixds.com Bringing agility to Business Intelligence Metadata as key to Agile Data Warehousing 1 P a g e Table of Contents What is the key to agility in Data Warehousing?... 3 The need to address requirements completely....

More information

Guide to the MySQL Workbench Migration Wizard: From Microsoft SQL Server to MySQL

Guide to the MySQL Workbench Migration Wizard: From Microsoft SQL Server to MySQL Guide to the MySQL Workbench Migration Wizard: From Microsoft SQL Server to MySQL A Technical White Paper Table of Contents Introduction...3 MySQL & LAMP...3 MySQL Reduces Database TCO by over 90%... 4

More information

Data Integration Checklist

Data Integration Checklist The need for data integration tools exists in every company, small to large. Whether it is extracting data that exists in spreadsheets, packaged applications, databases, sensor networks or social media

More information

Advanced SQL. Jim Mason. www.ebt-now.com Web solutions for iseries engineer, build, deploy, support, train 508-728-4353. jemason@ebt-now.

Advanced SQL. Jim Mason. www.ebt-now.com Web solutions for iseries engineer, build, deploy, support, train 508-728-4353. jemason@ebt-now. Advanced SQL Jim Mason jemason@ebt-now.com www.ebt-now.com Web solutions for iseries engineer, build, deploy, support, train 508-728-4353 What We ll Cover SQL and Database environments Managing Database

More information

MicroStrategy Course Catalog

MicroStrategy Course Catalog MicroStrategy Course Catalog 1 microstrategy.com/education 3 MicroStrategy course matrix 4 MicroStrategy 9 8 MicroStrategy 10 table of contents MicroStrategy course matrix MICROSTRATEGY 9 MICROSTRATEGY

More information

Bringing Big Data into the Enterprise

Bringing Big Data into the Enterprise Bringing Big Data into the Enterprise Overview When evaluating Big Data applications in enterprise computing, one often-asked question is how does Big Data compare to the Enterprise Data Warehouse (EDW)?

More information

SAS Clinical Training

SAS Clinical Training Course Outline for SAS Clinical Training SAS Clinical Training SAS Clinical Introduction History of SAS SAS comes in ERP sector or not? Why? Role of Statistical Analysis in Clinical Research Study and

More information

Chapter 5. Learning Objectives. DW Development and ETL

Chapter 5. Learning Objectives. DW Development and ETL Chapter 5 DW Development and ETL Learning Objectives Explain data integration and the extraction, transformation, and load (ETL) processes Basic DW development methodologies Describe real-time (active)

More information

Extraction Transformation Loading ETL Get data out of sources and load into the DW

Extraction Transformation Loading ETL Get data out of sources and load into the DW Lection 5 ETL Definition Extraction Transformation Loading ETL Get data out of sources and load into the DW Data is extracted from OLTP database, transformed to match the DW schema and loaded into the

More information

THE DATA WAREHOUSE ETL TOOLKIT CDT803 Three Days

THE DATA WAREHOUSE ETL TOOLKIT CDT803 Three Days Three Days Prerequisites Students should have at least some experience with any relational database management system. Who Should Attend This course is targeted at technical staff, team leaders and project

More information

Informatica ILM Archive and Application Retirement

Informatica ILM Archive and Application Retirement Informatica ILM Archive and Application Retirement Thierry AUDOT Technical Manager EMEA 26 th September 2012 1 Live Archiving What are key users pain points? My reports take forever to run! I need all

More information

DiskPulse DISK CHANGE MONITOR

DiskPulse DISK CHANGE MONITOR DiskPulse DISK CHANGE MONITOR User Manual Version 7.9 Oct 2015 www.diskpulse.com info@flexense.com 1 1 DiskPulse Overview...3 2 DiskPulse Product Versions...5 3 Using Desktop Product Version...6 3.1 Product

More information

QlikView 11.2 SR5 DIRECT DISCOVERY

QlikView 11.2 SR5 DIRECT DISCOVERY QlikView 11.2 SR5 DIRECT DISCOVERY FAQ and What s New Published: November, 2012 Version: 5.0 Last Updated: December, 2013 www.qlikview.com 1 What s New in Direct Discovery 11.2 SR5? Direct discovery in

More information

An Architectural Review Of Integrating MicroStrategy With SAP BW

An Architectural Review Of Integrating MicroStrategy With SAP BW An Architectural Review Of Integrating MicroStrategy With SAP BW Manish Jindal MicroStrategy Principal HCL Objectives To understand how MicroStrategy integrates with SAP BW Discuss various Design Options

More information

<Insert Picture Here> Extending Hyperion BI with the Oracle BI Server

<Insert Picture Here> Extending Hyperion BI with the Oracle BI Server Extending Hyperion BI with the Oracle BI Server Mark Ostroff Sr. BI Solutions Consultant Agenda Hyperion BI versus Hyperion BI with OBI Server Benefits of using Hyperion BI with the

More information

Data Warehouse with Data Integration: Problems and Solution

Data Warehouse with Data Integration: Problems and Solution Data Warehouse with Data Integration: Problems and Solution Prof. Sunila Shivtare 1, Prof. Pranjali Shelar 2 1 (Computer Science, Savitribai Phule University of Pune, India) 2 (Computer Science, Savitribai

More information

Integrating Data and Business Rules with a Control Data Set in SAS

Integrating Data and Business Rules with a Control Data Set in SAS Paper 3461-2015 Integrating Data and Business Rules with a Data Set in SAS Edmond Cheng, CACI International Inc. ABSTRACT In SAS software development, data specifications and process requirements can be

More information

POLAR IT SERVICES. Business Intelligence Project Methodology

POLAR IT SERVICES. Business Intelligence Project Methodology POLAR IT SERVICES Business Intelligence Project Methodology Table of Contents 1. Overview... 2 2. Visualize... 3 3. Planning and Architecture... 4 3.1 Define Requirements... 4 3.1.1 Define Attributes...

More information

What's New in SAS Data Management

What's New in SAS Data Management Paper SAS034-2014 What's New in SAS Data Management Nancy Rausch, SAS Institute Inc., Cary, NC; Mike Frost, SAS Institute Inc., Cary, NC, Mike Ames, SAS Institute Inc., Cary ABSTRACT The latest releases

More information

Subject: Request for Information (RFI) Franchise Tax Board (FTB) Security Information and Event Management (SIEM) Project.

Subject: Request for Information (RFI) Franchise Tax Board (FTB) Security Information and Event Management (SIEM) Project. chair John Chiang member Jerome E. Horton member Ana J. Matosantos August 27, 2012 To: Potential Vendors Subject: Request for Information (RFI) Franchise Tax Board (FTB) Security Information and Event

More information
