Big Data and Analytics at the IRS:

Size: px
Start display at page:

Download "Big Data and Analytics at the IRS:"

Transcription

1 Big Data and Analytics at the IRS: Perspectives and Initatives Government Big Data Symposium March 5-6, 2013 Jeff Butler Director, Research Databases IRS, Research, Analysis, and Statistics

2 Background The Internal Revenue Service (IRS) has a large service and enforcement footprint. The table below is from FY Tax Return Processing Account Management Customer Service Enforcement 234 million tax returns filed 1.8 billion third-party information returns $2.4 trillion in gross receipts 122 million refunds totaling $415 billion 319 million vists to IRS website 83 million toll-free telephone calls 223 million letters or notices sent to taxpayers $116 billion in accounts receivable 2

3 Types of Research and Analysis Taxpayer Behavior Failure to file or pay Abusive tax shelters Identity theft Return preparer compliance Misreporting income or deductions Refund fraud Off-shore transactions Financial crimes Analytic Initiatives Identify patterns of filing and payment non-compliance Predict and prevent ID theft and refund dfraud Estimate U.S. tax gap Measure taxpayer py burden Optimize case inventories and treatment strategies Simulate effects of tax changes Analyze criminal networks 3

4 Analytic Data Environment in IRS IRS enterprise IT manages hundreds of transactional systems and applications Research organization integrates legacy and third-party data into the Compliance Data Warehouse (CDW) Compliance Data Warehouse (CDW) Selected Metrics Total data size ~ 1.3PB Number of database tables ~ 3,100 Number of unique columns ~ 52,500 Number of searchable metadata attributes > 1 million Number of users ~ 1,020 Average daily queries ~ 6,500 4

5 IRS Analytic Data Environment Compliance Data Warehouse (CDW) Analytic Sandboxes (Examples) Case Predictive Text Optimization i Modeling Analytics Simulation Data Integration Layer Core Analytic Database nterprise Data a E Integration La ayer Data Statistical & Mathematical Analysis Storage Mgmt Security/Audit Monitoring Ad-Hoc Query and Reporting Infrastructure and Services System Admin Software Config Accounts Metadata Data Profiling Data Extracts, Matching Web Services Training & Support 5

6 IRS Analytic Data Environment Compliance Data Warehouse (CDW) Core Database Servers (Sybase IQ, Oracle, SQL Server) Shared Storage (>2PB) (DB, Backup, Staging, User) Application/Web Servers (SAS, R, Hyperion) IRS Network Users & Projects Systems & Applications Analytic Sandboxes Other Tools 6

7 Scale (Volume) 1600 Data Size (Terabytes) 7000 Average Daily Queries Third-Party Tools Web-Based Not all infrastructure/service costs are constant in scale Massively large environments can have asymmetric challenges Systems & Storage Management ETL & Database Administration Metadata & Web Services Security Audit and Monitring Tools, Training, & Support Analytic Sandboxes 7

8 Challenges with Scale I/O bottlenecks when data are off-loaded for analytics Single biggest problem for users in massively large environments Strategy: Maximize in-database analytics where possible Finding the optimal mix of ETL tools and techniques This is still where data warehousing costs are highest Strategy: Stay nimble and avoid one-size-fits-all solution Choosing the right database technology Is it performance or scale that s really needed? CDW is largest database in the IRS and still uses columar DB Strategy: Maximize performance for users at smallest O&M cost Storage management Different approach needed in user-based analytic environment Strategy: t Partition file systems based on user intensity it 8

9 terly Monthly Weekly Daily Annual Quart Data Arrival Rate Timeliness (Velocity) Ingest-Release Latency Data arrival rates are different from data delivery rates Minimzing this difference is inherently an ETL problem Data Extract/ Feed Validation/ Integration/ Preprocessinprocessing Post- Analysis/ Modeling Interpretation/ Action 9

10 Challenges with Velocity Larger the data size, longer the processing time Let P ij and S ij = processing time and size of data set i with frequency j, ij = 1, 2,, n The problem is argmin θ ij (P S) ij + ε ij Processing time varies with scale (and complexity) Disturbances ε ij are unavoidable (e.g., server maintenance) Data may require validation, standardization, and cleaning No two data sets are the same Structured vs. unstructured data What is impact of frequent schema changes on data delivery times for structured data? Do skills exist for processing unstructured data at any speed? 10

11 Heterogeneity (Variety) Sources of IRS Data Types of IRS Data Source Systems and Data Formats Taxpayers Employers Preparers Banks Brokers Non-Profits Interagency Fed/State Treaty Partners Intermediaries Forms Schedules Worksheets Attachments Images Correspondence Transactions Phone Calls Notices Transcripts Mainframe Unix/Linux Windows Databases VSAM Flat Files Applications DB tables Fixed format Hierarchical Delimited Packed decimal XML Plain text Overwhelming majority of IRS data are still structured Most transaction systems are still file-based Challenge: skills needed to parse and analyze text Information extraction and entity resolution techniques (NLP) 11

12 Metadata and Information Quality Searchable Metadata Framework and Strategy Simple reference model is used to guide consisteny of searchable artifacts Combination of system, contextual, and application attributes Controlled vocabulary for key descriptive elements Columns Columns w ith Metadata Strategy favors basic discoverability rather than systematized collections Data for analytics must be searchable, understandable, and semantically consistent Metadata is the nucleus of any data quality strategy Trust and confidence in data should be invariant to scale 12

13 Metadata and Information Quality Stages of Metadata Collection Database Flat File VSAM Extract Transform Load Validate Staging DW Roll-Ups Query, Analys sis, Reportin g Source Systems Source Metadata ETL/T Metadata Data Model Metadata Report Metadata Central Metadata Repository Metadata are collected at each stage of the data supply chain 13

14 Metadata and Information Quality System Metadata Physical properties, data movement, ETL/T, and workflow artifacts Contextual Metadata Attributes, references, and other searchable content Application Metadata Context dependent logic, conditional rules, and dynamic processing Source System Characteristics System properties File or table names Data element names and definitons Data types Transformation rules Cross-references references Target System Properties Table names Column names Data types Indexes Partitions or table spaces Data Attributes Authoritative system Data element name and definiton Availability Data type Join paths Legacy source reference User reviews Links to context-dependent data Publishing Standards Web-based Standard format Hierarchical and free-form search Web-Based Logic Reports and roll-ups Lookup tables URLs and other links External communication Profiling Frequencies Statistical distributions Trend analysis Geographic maps Reviews User ID Table/column reference Feedback 14

15 Techniques used by IRS analysts Workforce Skills Regression-based methods (GLM, logisitic, quantile, non-linear, proportional hazards) Social network analysis, graph theory Machine learning (neural networks, SVMs, genetic algorithms) Multivariate statistical methods (discriminant analysis, clustering, density estimation, factor analysis) Simulation (Monte Carlo, MCMC, agent-based modeling) Decision trees (CART, CHAID, C5, hybrids) Bayes rules and other classifiers Variance estimation with complex samples 15

16 Workforce Skills Analysts: Use of advanced SQL techniques to avoid off-loading data for analytics (in-database dtb computing) Understanding and leveraging Open Source tools IT Staff: Literacy in non-traditional computing architectures Support for Open Source tools and analytic databases Ability to quickly build and deploy analytic sandboxes This is different from typical BI/report/dashboard environments Emphasis on algorithms, not just information distribution Key is multi-disciplinary skills Nexus of statistics, computer science, economics, IT 16

17 Data Privacy and Security IRS analytics are done behind the firewall but data still moves Data off-loaded to laptops, servers, sandboxes External access (Treasury, Congress, universities) Permissions management in shared disk environment Gets more complex with more users and data Security trade-offs and challenges Impact of system- and application-level policy changes How much continuous monitoring and auditing? FISMA and the documentation dilemma Relationship between encryption and performance 17

CDW DATA QUALITY INITIATIVE

CDW DATA QUALITY INITIATIVE Loading Metadata to the IRS Compliance Data Warehouse (CDW) Website: From Spreadsheet to Database Using SAS Macros and PROC SQL Robin Rappaport, IRS Office of Research, Washington, DC Jeff Butler, IRS

More information

Big Data Analytics Platform @ Nokia

Big Data Analytics Platform @ Nokia Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform

More information

MDM and Data Warehousing Complement Each Other

MDM and Data Warehousing Complement Each Other Master Management MDM and Warehousing Complement Each Other Greater business value from both 2011 IBM Corporation Executive Summary Master Management (MDM) and Warehousing (DW) complement each other There

More information

EII - ETL - EAI What, Why, and How!

EII - ETL - EAI What, Why, and How! IBM Software Group EII - ETL - EAI What, Why, and How! Tom Wu 巫 介 唐, [email protected] Information Integrator Advocate Software Group IBM Taiwan 2005 IBM Corporation Agenda Data Integration Challenges and

More information

Chapter 5. Learning Objectives. DW Development and ETL

Chapter 5. Learning Objectives. DW Development and ETL Chapter 5 DW Development and ETL Learning Objectives Explain data integration and the extraction, transformation, and load (ETL) processes Basic DW development methodologies Describe real-time (active)

More information

High-Volume Data Warehousing in Centerprise. Product Datasheet

High-Volume Data Warehousing in Centerprise. Product Datasheet High-Volume Data Warehousing in Centerprise Product Datasheet Table of Contents Overview 3 Data Complexity 3 Data Quality 3 Speed and Scalability 3 Centerprise Data Warehouse Features 4 ETL in a Unified

More information

Advanced In-Database Analytics

Advanced In-Database Analytics Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??

More information

Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing

More information

ORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION

ORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION ORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION EXECUTIVE SUMMARY Oracle business intelligence solutions are complete, open, and integrated. Key components of Oracle business intelligence

More information

SQL Server 2012 Gives You More Advanced Features (Out-Of-The-Box)

SQL Server 2012 Gives You More Advanced Features (Out-Of-The-Box) SQL Server 2012 Gives You More Advanced Features (Out-Of-The-Box) SQL Server White Paper Published: January 2012 Applies to: SQL Server 2012 Summary: This paper explains the different ways in which databases

More information

NEWLY EMERGING BEST PRACTICES FOR BIG DATA

NEWLY EMERGING BEST PRACTICES FOR BIG DATA 2000-2012 Kimball Group. All rights reserved. Page 1 NEWLY EMERGING BEST PRACTICES FOR BIG DATA Ralph Kimball Informatica October 2012 Ralph Kimball Big is Being Monetized Big data is the second era of

More information

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

More information

What's New in SAS Data Management

What's New in SAS Data Management Paper SAS034-2014 What's New in SAS Data Management Nancy Rausch, SAS Institute Inc., Cary, NC; Mike Frost, SAS Institute Inc., Cary, NC, Mike Ames, SAS Institute Inc., Cary ABSTRACT The latest releases

More information

BUSINESSOBJECTS DATA INTEGRATOR

BUSINESSOBJECTS DATA INTEGRATOR PRODUCTS BUSINESSOBJECTS DATA INTEGRATOR IT Benefits Correlate and integrate data from any source Efficiently design a bulletproof data integration process Improve data quality Move data in real time and

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

BUILDING OLAP TOOLS OVER LARGE DATABASES

BUILDING OLAP TOOLS OVER LARGE DATABASES BUILDING OLAP TOOLS OVER LARGE DATABASES Rui Oliveira, Jorge Bernardino ISEC Instituto Superior de Engenharia de Coimbra, Polytechnic Institute of Coimbra Quinta da Nora, Rua Pedro Nunes, P-3030-199 Coimbra,

More information

<Insert Picture Here> Oracle Retail Data Model Overview

<Insert Picture Here> Oracle Retail Data Model Overview Oracle Retail Data Model Overview The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into

More information

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya Chapter 6 Basics of Data Integration Fundamentals of Business Analytics Learning Objectives and Learning Outcomes Learning Objectives 1. Concepts of data integration 2. Needs and advantages of using data

More information

Decoding the Big Data Deluge a Virtual Approach. Dan Luongo, Global Lead, Field Solution Engineering Data Virtualization Business Unit, Cisco

Decoding the Big Data Deluge a Virtual Approach. Dan Luongo, Global Lead, Field Solution Engineering Data Virtualization Business Unit, Cisco Decoding the Big Data Deluge a Virtual Approach Dan Luongo, Global Lead, Field Solution Engineering Data Virtualization Business Unit, Cisco High-volume, velocity and variety information assets that demand

More information

<Insert Picture Here> Extending Hyperion BI with the Oracle BI Server

<Insert Picture Here> Extending Hyperion BI with the Oracle BI Server Extending Hyperion BI with the Oracle BI Server Mark Ostroff Sr. BI Solutions Consultant Agenda Hyperion BI versus Hyperion BI with OBI Server Benefits of using Hyperion BI with the

More information

Accelerate Data Loading for Big Data Analytics Attunity Click-2-Load for HP Vertica

Accelerate Data Loading for Big Data Analytics Attunity Click-2-Load for HP Vertica Accelerate Data Loading for Big Data Analytics Attunity Click-2-Load for HP Vertica Menachem Brouk, Regional Director - EMEA Agenda» Attunity update» Solutions for : 1. Big Data Analytics 2. Live Reporting

More information

BENEFITS OF AUTOMATING DATA WAREHOUSING

BENEFITS OF AUTOMATING DATA WAREHOUSING BENEFITS OF AUTOMATING DATA WAREHOUSING Introduction...2 The Process...2 The Problem...2 The Solution...2 Benefits...2 Background...3 Automating the Data Warehouse with UC4 Workload Automation Suite...3

More information

The Data Mining Process

The Data Mining Process Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

More information

A TECHNICAL WHITE PAPER ATTUNITY VISIBILITY

A TECHNICAL WHITE PAPER ATTUNITY VISIBILITY A TECHNICAL WHITE PAPER ATTUNITY VISIBILITY Analytics for Enterprise Data Warehouse Management and Optimization Executive Summary Successful enterprise data management is an important initiative for growing

More information

SAS Enterprise Data Integration Server - A Complete Solution Designed To Meet the Full Spectrum of Enterprise Data Integration Needs

SAS Enterprise Data Integration Server - A Complete Solution Designed To Meet the Full Spectrum of Enterprise Data Integration Needs Database Systems Journal vol. III, no. 1/2012 41 SAS Enterprise Data Integration Server - A Complete Solution Designed To Meet the Full Spectrum of Enterprise Data Integration Needs 1 Silvia BOLOHAN, 2

More information

SAP SE - Legal Requirements and Requirements

SAP SE - Legal Requirements and Requirements Finding the signals in the noise Niklas Packendorff @packendorff Solution Expert Analytics & Data Platform Legal disclaimer The information in this presentation is confidential and proprietary to SAP and

More information

Enterprise Data Integration The Foundation for Business Insight

Enterprise Data Integration The Foundation for Business Insight Enterprise Data Integration The Foundation for Business Insight Data Hubs Data Migration Data Warehousing Data Synchronization Business Activity Monitoring Ingredients for Success Enterprise Visibility

More information

ORACLE TAX ANALYTICS. The Solution. Oracle Tax Data Model KEY FEATURES

ORACLE TAX ANALYTICS. The Solution. Oracle Tax Data Model KEY FEATURES ORACLE TAX ANALYTICS KEY FEATURES A set of comprehensive and compatible BI Applications. Advanced insight into tax performance Built on World Class Oracle s Database and BI Technology Design after the

More information

Chapter 5. Warehousing, Data Acquisition, Data. Visualization

Chapter 5. Warehousing, Data Acquisition, Data. Visualization Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization 5-1 Learning Objectives

More information

Agile Business Intelligence Data Lake Architecture

Agile Business Intelligence Data Lake Architecture Agile Business Intelligence Data Lake Architecture TABLE OF CONTENTS Introduction... 2 Data Lake Architecture... 2 Step 1 Extract From Source Data... 5 Step 2 Register And Catalogue Data Sets... 5 Step

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

An Architectural Review Of Integrating MicroStrategy With SAP BW

An Architectural Review Of Integrating MicroStrategy With SAP BW An Architectural Review Of Integrating MicroStrategy With SAP BW Manish Jindal MicroStrategy Principal HCL Objectives To understand how MicroStrategy integrates with SAP BW Discuss various Design Options

More information

Dell Microsoft Business Intelligence and Data Warehousing Reference Configuration Performance Results Phase III

Dell Microsoft Business Intelligence and Data Warehousing Reference Configuration Performance Results Phase III White Paper Dell Microsoft Business Intelligence and Data Warehousing Reference Configuration Performance Results Phase III Performance of Microsoft SQL Server 2008 BI and D/W Solutions on Dell PowerEdge

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

Making Sense of Big Data in Insurance

Making Sense of Big Data in Insurance Making Sense of Big Data in Insurance Amir Halfon, CTO, Financial Services, MarkLogic Corporation BIG DATA?.. SLIDE: 2 The Evolution of Data Management For your application data! Application- and hardware-specific

More information

Why is Internal Audit so Hard?

Why is Internal Audit so Hard? Why is Internal Audit so Hard? 2 2014 Why is Internal Audit so Hard? 3 2014 Why is Internal Audit so Hard? Waste Abuse Fraud 4 2014 Waves of Change 1 st Wave Personal Computers Electronic Spreadsheets

More information

Lavastorm Resolution Center 2.2 Release Frequently Asked Questions

Lavastorm Resolution Center 2.2 Release Frequently Asked Questions Lavastorm Resolution Center 2.2 Release Frequently Asked Questions Software Description What is Lavastorm Resolution Center 2.2? Lavastorm Resolution Center (LRC) is a flexible business improvement management

More information

Getting Started Practical Input For Your Roadmap

Getting Started Practical Input For Your Roadmap Getting Started Practical Input For Your Roadmap Mike Ferguson Managing Director, Intelligent Business Strategies BA4ALL Big Data & Analytics Insight Conference Stockholm, May 2015 About Mike Ferguson

More information

Importance or the Role of Data Warehousing and Data Mining in Business Applications

Importance or the Role of Data Warehousing and Data Mining in Business Applications Journal of The International Association of Advanced Technology and Science Importance or the Role of Data Warehousing and Data Mining in Business Applications ATUL ARORA ANKIT MALIK Abstract Information

More information

Practical Considerations for Real-Time Business Intelligence. Donovan Schneider Yahoo! September 11, 2006

Practical Considerations for Real-Time Business Intelligence. Donovan Schneider Yahoo! September 11, 2006 Practical Considerations for Real-Time Business Intelligence Donovan Schneider Yahoo! September 11, 2006 Outline Business Intelligence (BI) Background Real-Time Business Intelligence Examples Two Requirements

More information

Putting Apache Kafka to Use!

Putting Apache Kafka to Use! Putting Apache Kafka to Use! Building a Real-time Data Platform for Event Streams! JAY KREPS, CONFLUENT! A Couple of Themes! Theme 1: Rise of Events! Theme 2: Immutability Everywhere! Level! Example! Immutable

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

Luncheon Webinar Series May 13, 2013

Luncheon Webinar Series May 13, 2013 Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration

More information

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA OLAP and OLTP AMIT KUMAR BINDAL Associate Professor Databases Databases are developed on the IDEA that DATA is one of the critical materials of the Information Age Information, which is created by data,

More information

Informatica ILM Archive and Application Retirement

Informatica ILM Archive and Application Retirement Informatica ILM Archive and Application Retirement Thierry AUDOT Technical Manager EMEA 26 th September 2012 1 Live Archiving What are key users pain points? My reports take forever to run! I need all

More information

Fusion Applications Overview of Business Intelligence and Reporting components

Fusion Applications Overview of Business Intelligence and Reporting components Fusion Applications Overview of Business Intelligence and Reporting components This document briefly lists the components, their common acronyms and the functionality that they bring to Fusion Applications.

More information

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT BUILDING BLOCKS OF DATAWAREHOUSE G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT 1 Data Warehouse Subject Oriented Organized around major subjects, such as customer, product, sales. Focusing on

More information

SAP HANA SAP s In-Memory Database. Dr. Martin Kittel, SAP HANA Development January 16, 2013

SAP HANA SAP s In-Memory Database. Dr. Martin Kittel, SAP HANA Development January 16, 2013 SAP HANA SAP s In-Memory Database Dr. Martin Kittel, SAP HANA Development January 16, 2013 Disclaimer This presentation outlines our general product direction and should not be relied on in making a purchase

More information

Tax Fraud in Increasing

Tax Fraud in Increasing Preventing Fraud with Through Analytics Satya Bhamidipati Data Scientist Business Analytics Product Group Copyright 2014 Oracle and/or its affiliates. All rights reserved. 2 Tax Fraud in Increasing 27%

More information

Ganzheitliches Datenmanagement

Ganzheitliches Datenmanagement Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist

More information

ORACLE BUSINESS INTELLIGENCE SUITE ENTERPRISE EDITION PLUS

ORACLE BUSINESS INTELLIGENCE SUITE ENTERPRISE EDITION PLUS ORACLE BUSINESS INTELLIGENCE SUITE ENTERPRISE EDITION PLUS PRODUCT FACTS & FEATURES KEY FEATURES Comprehensive, best-of-breed capabilities 100 percent thin client interface Intelligence across multiple

More information

Getting Value from Big Data with Analytics

Getting Value from Big Data with Analytics Getting Value from Big Data with Analytics Edward Roske, CEO Oracle ACE Director [email protected] BLOG: LookSmarter.blogspot.com WEBSITE: www.interrel.com TWITTER: Eroske About interrel Reigning Oracle

More information

ORACLE BUSINESS INTELLIGENCE SUITE ENTERPRISE EDITION PLUS

ORACLE BUSINESS INTELLIGENCE SUITE ENTERPRISE EDITION PLUS Oracle Fusion editions of Oracle's Hyperion performance management products are currently available only on Microsoft Windows server platforms. The following is intended to outline our general product

More information

Improving your Data Warehouse s IQ

Improving your Data Warehouse s IQ Improving your Data Warehouse s IQ Derek Strauss Gavroshe USA, Inc. Outline Data quality for second generation data warehouses DQ tool functionality categories and the data quality process Data model types

More information

Data-intensive HPC: opportunities and challenges. Patrick Valduriez

Data-intensive HPC: opportunities and challenges. Patrick Valduriez Data-intensive HPC: opportunities and challenges Patrick Valduriez Big Data Landscape Multi-$billion market! Big data = Hadoop = MapReduce? No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard,

More information

Business-driven governance: Managing policies for data retention

Business-driven governance: Managing policies for data retention August 2013 Business-driven governance: Managing policies for data retention Establish and support enterprise data retention policies for ENTER» Table of contents 3 4 5 Step 1: Identify the complete business

More information

Overview. Edvantage Security

Overview. Edvantage Security Overview West Virginia Department of Education (WVDE) is required by law to collect and store student and educator records, and takes seriously its obligations to secure information systems and protect

More information

BUSINESSOBJECTS DATA INTEGRATOR

BUSINESSOBJECTS DATA INTEGRATOR PRODUCTS BUSINESSOBJECTS DATA INTEGRATOR IT Benefits Correlate and integrate data from any source Efficiently design a bulletproof data integration process Accelerate time to market Move data in real time

More information

Cloud Ready Data: Speeding Your Journey to the Cloud

Cloud Ready Data: Speeding Your Journey to the Cloud Cloud Ready Data: Speeding Your Journey to the Cloud Hybrid Cloud first Born to the cloud 3 Am I part of a Cloud First organization? Am I part of a Cloud First agency? The cloud applications questions

More information

Oracle Data Integrator 12c: Integration and Administration

Oracle Data Integrator 12c: Integration and Administration Oracle University Contact Us: +33 15 7602 081 Oracle Data Integrator 12c: Integration and Administration Duration: 5 Days What you will learn Oracle Data Integrator is a comprehensive data integration

More information

III JORNADAS DE DATA MINING

III JORNADAS DE DATA MINING III JORNADAS DE DATA MINING EN EL MARCO DE LA MAESTRÍA EN DATA MINING DE LA UNIVERSIDAD AUSTRAL PRESENTACIÓN TECNOLÓGICA IBM Alan Schcolnik, Cognos Technical Sales Team Leader, IBM Software Group. IAE

More information

Service Oriented Architecture and the DBA Kathy Komer Aetna Inc. New England DB2 Users Group. Tuesday June 12 1:00-2:15

Service Oriented Architecture and the DBA Kathy Komer Aetna Inc. New England DB2 Users Group. Tuesday June 12 1:00-2:15 Service Oriented Architecture and the DBA Kathy Komer Aetna Inc. New England DB2 Users Group Tuesday June 12 1:00-2:15 Service Oriented Architecture and the DBA What is Service Oriented Architecture (SOA)

More information

Oracle Business Intelligence Foundation Suite 11g Essentials Exam Study Guide

Oracle Business Intelligence Foundation Suite 11g Essentials Exam Study Guide Oracle Business Intelligence Foundation Suite 11g Essentials Exam Study Guide Joshua Jeyasingh Senior Technical Account Manager WW A&C Partner Enablement Objective & Audience Objective Help you prepare

More information

Advanced Analytics for Audit Case Selection

Advanced Analytics for Audit Case Selection Advanced Analytics for Audit Case Selection New York State Capitol Building Albany, New York Presented by: Tim Gardinier, Manager, Data Processing Services Audit Case Selection Putting a scientific approach

More information

Industry Models and Information Server

Industry Models and Information Server 1 September 2013 Industry Models and Information Server Data Models, Metadata Management and Data Governance Gary Thompson ([email protected] ) Information Management Disclaimer. All rights reserved.

More information

Oracle Data Integrator 11g: Integration and Administration

Oracle Data Integrator 11g: Integration and Administration Oracle University Contact Us: Local: 1800 103 4775 Intl: +91 80 4108 4709 Oracle Data Integrator 11g: Integration and Administration Duration: 5 Days What you will learn Oracle Data Integrator is a comprehensive

More information

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007 Data Management in an International Data Grid Project Timur Chabuk 04/09/2007 Intro LHC opened in 2005 several Petabytes of data per year data created at CERN distributed to Regional Centers all over the

More information

Knowledgent White Paper Series. Developing an MDM Strategy WHITE PAPER. Key Components for Success

Knowledgent White Paper Series. Developing an MDM Strategy WHITE PAPER. Key Components for Success Developing an MDM Strategy Key Components for Success WHITE PAPER Table of Contents Introduction... 2 Process Considerations... 3 Architecture Considerations... 5 Conclusion... 9 About Knowledgent... 10

More information

Advanced Big Data Analytics with R and Hadoop

Advanced Big Data Analytics with R and Hadoop REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional

More information

BIG DATA: FIVE TACTICS TO MODERNIZE YOUR DATA WAREHOUSE

BIG DATA: FIVE TACTICS TO MODERNIZE YOUR DATA WAREHOUSE BIG DATA: FIVE TACTICS TO MODERNIZE YOUR DATA WAREHOUSE Current technology for Big Data allows organizations to dramatically improve return on investment (ROI) from their existing data warehouse environment.

More information

Oracle Architecture, Concepts & Facilities

Oracle Architecture, Concepts & Facilities COURSE CODE: COURSE TITLE: CURRENCY: AUDIENCE: ORAACF Oracle Architecture, Concepts & Facilities 10g & 11g Database administrators, system administrators and developers PREREQUISITES: At least 1 year of

More information

Leveraging Machine Data to Deliver New Insights for Business Analytics

Leveraging Machine Data to Deliver New Insights for Business Analytics Copyright 2015 Splunk Inc. Leveraging Machine Data to Deliver New Insights for Business Analytics Rahul Deshmukh Director, Solutions Marketing Jason Fedota Regional Sales Manager Safe Harbor Statement

More information

Corralling Data for Business Insights. The difference data relationship management can make. Part of the Rolta Managed Services Series

Corralling Data for Business Insights. The difference data relationship management can make. Part of the Rolta Managed Services Series Corralling Data for Business Insights The difference data relationship management can make Part of the Rolta Managed Services Series Data Relationship Management Data inconsistencies plague many organizations.

More information

Data Warehousing. Jens Teubner, TU Dortmund [email protected]. Winter 2015/16. Jens Teubner Data Warehousing Winter 2015/16 1

Data Warehousing. Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de. Winter 2015/16. Jens Teubner Data Warehousing Winter 2015/16 1 Jens Teubner Data Warehousing Winter 2015/16 1 Data Warehousing Jens Teubner, TU Dortmund [email protected] Winter 2015/16 Jens Teubner Data Warehousing Winter 2015/16 13 Part II Overview

More information

Master of Science in Health Information Technology Degree Curriculum

Master of Science in Health Information Technology Degree Curriculum Master of Science in Health Information Technology Degree Curriculum Core courses: 8 courses Total Credit from Core Courses = 24 Core Courses Course Name HRS Pre-Req Choose MIS 525 or CIS 564: 1 MIS 525

More information

Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.

Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc. Oracle BI EE Implementation on Netezza Prepared by SureShot Strategies, Inc. The goal of this paper is to give an insight to Netezza architecture and implementation experience to strategize Oracle BI EE

More information

SQL Server 2005 Features Comparison

SQL Server 2005 Features Comparison Page 1 of 10 Quick Links Home Worldwide Search Microsoft.com for: Go : Home Product Information How to Buy Editions Learning Downloads Support Partners Technologies Solutions Community Previous Versions

More information

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first

More information

Exploring the Synergistic Relationships Between BPC, BW and HANA

Exploring the Synergistic Relationships Between BPC, BW and HANA September 9 11, 2013 Anaheim, California Exploring the Synergistic Relationships Between, BW and HANA Sheldon Edelstein SAP Database and Solution Management Learning Points SAP Business Planning and Consolidation

More information

Enterprise Information Management and Business Intelligence Initiatives at the Federal Reserve. XXXIV Meeting on Central Bank Systematization

Enterprise Information Management and Business Intelligence Initiatives at the Federal Reserve. XXXIV Meeting on Central Bank Systematization Enterprise Information Management and Business Intelligence Initiatives at the Federal Reserve Kenneth Buckley Associate Director Division of Reserve Bank Operations and Payment Systems XXXIV Meeting on

More information

Data Integration and ETL with Oracle Warehouse Builder NEW

Data Integration and ETL with Oracle Warehouse Builder NEW Oracle University Appelez-nous: +33 (0) 1 57 60 20 81 Data Integration and ETL with Oracle Warehouse Builder NEW Durée: 5 Jours Description In this 5-day hands-on course, students explore the concepts,

More information

META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING

META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING Ramesh Babu Palepu 1, Dr K V Sambasiva Rao 2 Dept of IT, Amrita Sai Institute of Science & Technology 1 MVR College of Engineering 2 [email protected]

More information

BBBT Podcast Transcript

BBBT Podcast Transcript BBBT Podcast Transcript About the BBBT Vendor: The Boulder Brain Trust, or BBBT, was founded in 2006 by Claudia Imhoff. Its mission is to leverage business intelligence for industry vendors, for its members,

More information

Data Mart/Warehouse: Progress and Vision

Data Mart/Warehouse: Progress and Vision Data Mart/Warehouse: Progress and Vision Institutional Research and Planning University Information Systems What is data warehousing? A data warehouse: is a single place that contains complete, accurate

More information

Information and Decision Sciences (IDS)

Information and Decision Sciences (IDS) University of Illinois at Chicago 1 Information and Decision Sciences (IDS) Courses IDS 400. Advanced Business Programming Using Java. 0-4 Visual extended business language capabilities, including creating

More information

Introduction to Glossary Business

Introduction to Glossary Business Introduction to Glossary Business B T O Metadata Primer Business Metadata Business rules, Definitions, Terminology, Glossaries, Algorithms and Lineage using business language Audience: Business users Technical

More information

OWB Users, Enter The New ODI World

OWB Users, Enter The New ODI World OWB Users, Enter The New ODI World Kulvinder Hari Oracle Introduction Oracle Data Integrator (ODI) is a best-of-breed data integration platform focused on fast bulk data movement and handling complex data

More information

<Insert Picture Here> Oracle SQL Developer 3.0: Overview and New Features

<Insert Picture Here> Oracle SQL Developer 3.0: Overview and New Features 1 Oracle SQL Developer 3.0: Overview and New Features Sue Harper Senior Principal Product Manager The following is intended to outline our general product direction. It is intended

More information

Bringing Strategy to Life Using an Intelligent Data Platform to Become Data Ready. Informatica Government Summit April 23, 2015

Bringing Strategy to Life Using an Intelligent Data Platform to Become Data Ready. Informatica Government Summit April 23, 2015 Bringing Strategy to Life Using an Intelligent Platform to Become Ready Informatica Government Summit April 23, 2015 Informatica Solutions Overview Power the -Ready Enterprise Government Imperatives Improve

More information

Reverse Engineering in Data Integration Software

Reverse Engineering in Data Integration Software Database Systems Journal vol. IV, no. 1/2013 11 Reverse Engineering in Data Integration Software Vlad DIACONITA The Bucharest Academy of Economic Studies [email protected] Integrated applications

More information

The Future of Business Analytics is Now! 2013 IBM Corporation

The Future of Business Analytics is Now! 2013 IBM Corporation The Future of Business Analytics is Now! 1 The pressures on organizations are at a point where analytics has evolved from a business initiative to a BUSINESS IMPERATIVE More organization are using analytics

More information

Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework

Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework Many corporations and Independent Software Vendors considering cloud computing adoption face a similar challenge: how should

More information

Oracle Business Intelligence 11g Business Dashboard Management

Oracle Business Intelligence 11g Business Dashboard Management Oracle Business Intelligence 11g Business Dashboard Management Thomas Oestreich Chief EPM STrategist Tool Proliferation is Inefficient and Costly Disconnected Systems; Competing Analytic

More information

Integrating data in the Information System An Open Source approach

Integrating data in the Information System An Open Source approach WHITE PAPER Integrating data in the Information System An Open Source approach Table of Contents Most IT Deployments Require Integration... 3 Scenario 1: Data Migration... 4 Scenario 2: e-business Application

More information

Introduction. A. Bellaachia Page: 1

Introduction. A. Bellaachia Page: 1 Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

More information