Eliminating Required Waste and Non-Value Added Processing in Data Warehousing: A Six Sigma LEAN Perspective
John G. McManus, Bank of America
September 1, 2009
[Photo slides from the event: "Last Night's Festivities", "Continued", "And More", "Definitely a first for the CIO Finance Summit"]
John G. McManus, Bank of America
Senior Vice President, Global Wealth and Investment Management Technology
Responsible for: data warehouse/datamart, sales reporting, workflow, document management, and incentive compensation applications
Six Sigma Green Belt certified
Formerly director of data warehouse and business intelligence systems at AT&T Broadband
Served as an officer in the US Army and Army Reserve for 22 years
Currently supporting transition integration of the Merrill Lynch and Bank of America brokerage businesses and technology
Overview of the Presentation
A quick look at Bank of America
Six Sigma and LEAN fundamentals
A Six Sigma and LEAN perspective on data warehousing lifecycle processes
Summary
Questions
Bank of America Annual Report 2008 (does not include Merrill Lynch)
Annual revenue of $72B
Annual net income of $4B
6,100 retail banking outlets
18,700 ATMs
59 million consumer and small business banking relationships
29 million active online banking users
Global Wealth and Investment Management: more than 2 million individual and institutional clients worldwide
Six Sigma at Bank of America
Focuses on the reduction/elimination of waste in our processes
Focuses on the reduction/elimination of defects
Reduces variation using statistical tools
Provides ongoing process control and continuous improvement
Start with the customer perspective: "Begin with the End in Mind" (Stephen Covey)
The end for a data warehouse is data loaded and available for use
Sigma Level: a Metric That Measures Defects per Million Opportunities (DPMO)

Sigma Level | Defects per Million Opportunities
2           | 308,567.0
3           | 66,807.0
4           | 6,210.0
5           | 233.0
6           | 3.4

(A worked example of the calculation follows.)
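As a worked illustration (the numbers here are hypothetical, not from the presentation), DPMO is computed from the defect count, the number of units processed, and the number of defect opportunities per unit:

$$\text{DPMO} = \frac{\text{defects}}{\text{units} \times \text{opportunities per unit}} \times 10^{6}$$

For example, if a monthly load processes 500 source files, each with 10 opportunities to fail (missing file, truncated records, bad encoding, and so on), and 3 files arrive defective:

$$\text{DPMO} = \frac{3}{500 \times 10} \times 10^{6} = 600$$

which places the process between the 4-sigma and 5-sigma levels in the table above.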
LEAN
Six Sigma focuses on getting variation out of our processes
LEAN focuses on waste elimination, leveraging common tools and techniques from Six Sigma
LEAN tools include:
End-to-End System Map
Physical Process Map
Time Value Map
Spaghetti Chart
Product Process Flow Analysis
Seven Types of Waste
Defects: incomplete or corrupt source system feeds (low chip yield caused the Xbox shortage for Christmas 2005)
Overproduction: excessive storage space, redundant data
Transportation: moving data multiple times through the information pipeline
Waiting: processes that don't kick off until all source files are in hand
Inventory: wasted storage space, excessive history
Motion: putting the bun on the burger only to remove it to put on the pickles
Processing (too much): three nested views on top of an aggregate table in a datamart (see the sketch below)
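A minimal SQL sketch of the over-processing example above; the table and view names are hypothetical. Each query against the outermost view drags the optimizer through three layers of view resolution before it ever touches the aggregate:

-- The aggregate table itself (already-summarized data)
CREATE TABLE agg_cust_bal_mon (
    per_ccyymm   INTEGER       NOT NULL,
    bal_cd       INTEGER       NOT NULL,
    tot_bal_am   NUMERIC(18,2) NOT NULL
);

-- Three nested views, each adding a thin layer of "processing"
CREATE VIEW v_bal_filtered AS
    SELECT * FROM agg_cust_bal_mon WHERE tot_bal_am <> 0;

CREATE VIEW v_bal_labeled AS
    SELECT per_ccyymm, bal_cd, tot_bal_am,
           CASE WHEN tot_bal_am < 0 THEN 'DR' ELSE 'CR' END AS dr_cr_ind
    FROM v_bal_filtered;

CREATE VIEW v_bal_report AS
    SELECT per_ccyymm, dr_cr_ind, SUM(tot_bal_am) AS tot_bal_am
    FROM v_bal_labeled
    GROUP BY per_ccyymm, dr_cr_ind;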
Defining Value
Value Added Activity: an activity that CHANGES the size, shape, fit, form, or function of material or information (for the first time) to meet customer requirements.
Example: transformation of dimensional data to load to a fact table
Non-Value Added Activity: all other activities that take time or resources but do not satisfy customer requirements.
Examples: partitioning of data, indexing, creation of aggregate tables, denormalization (a short SQL contrast follows)
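To make the distinction concrete, a hedged sketch with hypothetical object names: the first statement changes the information itself into the form the customer consumes and is value added; the remaining statements exist only to make the database perform and are non-value added by this definition:

-- Value added: transforms source data into the fact the customer consumes
INSERT INTO fct_cust_bal (per_ccyymm, sk_cif, bal_am)
SELECT s.period_cd, d.sk_cif, SUM(s.balance)
FROM   stg_balances s
JOIN   dim_customer d ON d.cust_nbr = s.cust_nbr
GROUP BY s.period_cd, d.sk_cif;

-- Non-value added: changes nothing the customer sees
CREATE INDEX ix_fct_cust_bal_period ON fct_cust_bal (per_ccyymm);

CREATE TABLE agg_cust_bal_mon AS
    SELECT per_ccyymm, SUM(bal_am) AS tot_bal_am
    FROM   fct_cust_bal
    GROUP BY per_ccyymm;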
The Activity of the Product
The only four things a product can be doing:
Storage (white space)
Transportation
Inspection
Processing (value added or non-value added)
By the way, what is our product within data warehousing?
The Activity of the Equipment
Value added and non-value added activity:
Run (automated processing, manual processing)
Load
Unload
Setup
Replenish
Complete
Idle
Maintain
Typical Data Warehouse Development Tasks, Categorized (VAP = value added processing)
Procure and install multi-terabyte database/storage system
Develop logical data model: VAP
Develop physical data models: Required Waste
Calculate storage requirements for all objects: Required Waste
Design database partitions/strategy: Required Waste
Design indexes: Required Waste
Design aggregates and materialized views: Required Waste
Create DDL: VAP (for data tables; all else is required waste)
Create physical data tables including partitions: Required Waste
Create indexes: Required Waste
Create aggregates and materialized views: Required Waste
Create users, roles, entitlements: VAP
Load data: VAP
Create datamarts and cubes: Required Waste
Performance tune loads, extracts, and reports: Required Waste
Actual Results: DDL Migration
DDL from a traditional RDBMS was migrated to a data warehouse appliance within 1 day. The process removed 95+% of the content from the existing DDL files, including tablespace, partition, and index declarations.

BEFORE (1,821 lines of SQL):

CREATE TABLE CMVFACT.FCT_MKV_CUST_BAL (
    PER_CCYYMM          NUMBER(6)     NOT NULL,
    SK_CIF              NUMBER(15)    NOT NULL,
    SK_PR_L5            NUMBER(8)     NOT NULL,
    BAL_CD              NUMBER(5)     NOT NULL,
    BUS_SEG_L3_ID       NUMBER(4)     NOT NULL,
    SEG2_CD             NUMBER(2)     NOT NULL,
    ISO_CRNCY_CD        CHAR(3)       NOT NULL,
    CO_CST_CTR_ID       CHAR(10)      NOT NULL,
    CAL_MON_DAY_NR      NUMBER(2)     NOT NULL,
    CAL_MON_BUS_DAY_NR  NUMBER(2)     NOT NULL,
    BAL_AM              NUMBER(15,2)  NOT NULL,
    BAL_ORIG_CRNCY_AM   NUMBER(18,2)  NOT NULL,
    SRCE_SYS_CD         NUMBER(4)     NULL
)
TABLESPACE CMVFACT NOLOGGING
PCTFREE 0 PCTUSED 90 INITRANS 1 MAXTRANS 255
STORAGE(BUFFER_POOL DEFAULT) NOPARALLEL NOCACHE
PARTITION BY RANGE(PER_CCYYMM) (
    PARTITION P200205 VALUES LESS THAN (200206) STORAGE(FREELISTS 1 FREELIST GROUPS 1),
    PARTITION P200206 VALUES LESS THAN (200207) STORAGE(FREELISTS 1 FREELIST GROUPS 1),
    PARTITION P200207 VALUES LESS THAN (200208) STORAGE(FREELISTS 1 FREELIST GROUPS 1),
    PARTITION P200208 VALUES LESS THAN (200209) STORAGE(FREELISTS 1 FREELIST GROUPS 1),
    ...

AFTER (17 lines of SQL):

CREATE TABLE FCT_MKV_CUST_BAL (
    PER_CCYYMM          integer        NOT NULL,
    SK_CIF              bigint         NOT NULL,
    SK_PR_L5            integer        NOT NULL,
    BAL_CD              integer        NOT NULL,
    BUS_SEG_L3_ID       integer        NOT NULL,
    SEG2_CD             integer        NOT NULL,
    ISO_CRNCY_CD        CHAR(3)        NOT NULL,
    CO_CST_CTR_ID       CHAR(10)       NOT NULL,
    CAL_MON_DAY_NR      integer        NOT NULL,
    CAL_MON_BUS_DAY_NR  integer        NOT NULL,
    BAL_AM              numeric(15,2)  NOT NULL,
    BAL_ORIG_CRNCY_AM   numeric(18,2)  NOT NULL,
    SRCE_SYS_CD         integer        NULL
) distribute on (sk_cif);
Typical Data Warehouse Load Tasks
Drop indexes
Drop constraints
Drop aggregates
Sort data before loading
Transform and load data
Look at why a job that ran in 2 minutes yesterday now runs in 2 hours
Add storage to accommodate data growth
Regenerate indexes
Add storage to accommodate data growth
Regenerate constraints
Recreate aggregates
Regenerate materialized views
Recompute database statistics
Run reconciliation routines
(A sketch of this drop/load/rebuild cycle follows.)
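A compressed, hypothetical sketch of that cycle in generic SQL (object names invented for illustration); in practice each step is scripted and scheduled, and everything except the load itself is the waste this presentation targets:

-- 1. Tear down everything that would slow the bulk load
DROP INDEX ix_fct_cust_bal_period;
ALTER TABLE fct_cust_bal DROP CONSTRAINT fk_fct_cust_bal_cif;
DROP TABLE agg_cust_bal_mon;

-- 2. The only value added step: transform and load the data
INSERT INTO fct_cust_bal (per_ccyymm, sk_cif, bal_am)
SELECT period_cd, sk_cif, balance FROM stg_balances;

-- 3. Rebuild everything that was torn down
CREATE INDEX ix_fct_cust_bal_period ON fct_cust_bal (per_ccyymm);
ALTER TABLE fct_cust_bal ADD CONSTRAINT fk_fct_cust_bal_cif
    FOREIGN KEY (sk_cif) REFERENCES dim_customer (sk_cif);
CREATE TABLE agg_cust_bal_mon AS
    SELECT per_ccyymm, SUM(bal_am) AS tot_bal_am
    FROM fct_cust_bal GROUP BY per_ccyymm;

-- 4. Recompute optimizer statistics (syntax varies by platform)
ANALYZE fct_cust_bal;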
Time Value Map Inputs: Data Warehouse Load Tasks (time in minutes)

Step | Description                      | Store | Trans | Inspect | NV Process | VA Process
001  | Migrate Data from Source Systems |       | 120   |         |            |
002  | Reconcile Data Back to Source    |       |       | 20      |            |
003  | Preprocess, sort and prep data   |       |       |         | 30         |
004  | Drop Indexes                     |       |       |         | 5          |
005  | Drop Constraints                 |       |       |         | 1          |
006  | Drop Aggregates                  |       |       |         | 2          |
007  | Drop Materialized Views          |       |       |         | 2          |
008  | Load Data                        |       |       |         |            | 30
009  | Create Constraints               |       |       |         | 180        |
010  | Create Indexes                   |       |       |         | 90         |
011  | Create Materialized Views        |       |       |         | 60         |
012  | Create Aggregates                |       |       |         | 120        |
013  | Gather Statistics                |       |       |         | 300        |

Actual times for a weekly datamart load. (The value added fraction is worked out below.)
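Under this categorization (a reading of the table above, not a figure stated in the presentation), the value added fraction of the load window works out to:

$$\frac{\text{VA time}}{\text{total time}} = \frac{30}{120+20+30+5+1+2+2+30+180+90+60+120+300} = \frac{30}{960} \approx 3.1\%$$

so roughly 97% of the weekly load window is transportation, inspection, or non-value added processing, consistent with the >90% waste figure in the summary.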
Time Value Map
Categorize each process step according to the Activity of the Product
Identify value added vs. non-value added processing
White space is bad and needs to be eliminated
"The largest and most difficult waste to find is time, and you can never get it back." (Henry Ford)
[Chart legend: value added processing vs. non-value added processing]
Actual Results: 4TB Production Timeline, Monthly
With the traditional RDBMS and hardware, the CMV monthly batch (assume batch begins the 17th calendar day at 5:00 PM PST) runs Load Stage, Dimension/FTP, Fact Build, Aggregation, Index/Swap, and Certification, spanning the 17th through the 26th calendar days.
Using the data warehouse appliance, the same batch runs Load Stage, Dimension/FTP, Fact Build, Aggregation, and Certification, with no Index/Swap stage, spanning the 17th through the 21st.
8 DAYS removed from the monthly batch process, or 96 days of increased data availability per year (8 days x 12 months)!
Information Sprawl: Tracking the Activity of the Product (in LEAN terms, an E2E Process Map or Spaghetti Chart)
[Diagram slide]
Various Six Sigma Methods for Determining Root Cause
Failure Modes and Effects Analysis (FMEA): focuses priority on the most critical areas
Cause and Effect Diagram (Fishbone), which feeds the FMEA: shows influences on a process's potential failures
Example: the load failed. Why? Ran out of tablespace; partition not defined for the new month; incomplete data file; data anomalies; bad ETL logic
The 5 Whys: ask "Why?" 5 times
The 5 Whys: Get to the Root Cause of Failures
"Hey Boss, we missed our SLAs for the users again today."
Why? "The job that always runs in 2 minutes ran for 2 hours."
Why? "We think the database chose a different execution path."
Why? "The statistics might have been out of date."
Why do we need statistics? "To enable the database to make intelligent choices about which indexes to use and how to join tables."
Why do we need indexes? "Because traditional RDBMSs have significant design flaws."
Why? "You'll have to ask them."
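For the stale-statistics failure in the dialogue above, a hedged Oracle-style sketch (the schema and table names are borrowed from the earlier DDL example; the diagnosis query assumes Oracle 11g-era dictionary views):

-- Check whether optimizer statistics on the fact table are stale
SELECT table_name, last_analyzed, stale_stats
FROM   dba_tab_statistics
WHERE  owner = 'CMVFACT' AND table_name = 'FCT_MKV_CUST_BAL';

-- Regather statistics so the optimizer can pick a sane execution plan
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'CMVFACT', tabname => 'FCT_MKV_CUST_BAL');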
Why Do We Do It?
Creation of indexes: queries and loads would perform abysmally without them
Creation of aggregates: users want data summarized in many instances; queries and reports would perform abysmally without them
Creation of data marts: denormalize the data for reporting efficiencies; need to offload the impact of reporting from the data warehouse
Specialized transformations
DDL to support required waste: 95% of the DDL has nothing to do with the data itself (see the contrast below)
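The aggregate case illustrates the trade: on a traditional RDBMS the summary must be precomputed so reports perform acceptably, while an appliance that scans fast enough can serve the same answer straight from the fact table. A hedged sketch, reusing the table from the DDL example:

-- Traditional RDBMS: precompute the summary as required waste
CREATE MATERIALIZED VIEW mv_cust_bal_mon AS
    SELECT per_ccyymm, bal_cd, SUM(bal_am) AS tot_bal_am
    FROM   fct_mkv_cust_bal
    GROUP BY per_ccyymm, bal_cd;

-- Appliance: run the same summary directly against the fact table on demand
SELECT per_ccyymm, bal_cd, SUM(bal_am) AS tot_bal_am
FROM   fct_mkv_cust_bal
GROUP BY per_ccyymm, bal_cd;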
Today's Challenges
Larger data volumes and increased data retention
Increased reliance on data by the business
Smaller and smaller maintenance/load windows
Increased SLAs; support for international availability windows
Highly sophisticated users with a thirst for data
Increased risk, compliance, and regulatory oversight of data
What if We Could Eliminate the Need for Indexes?
Eliminate storage needs for indexes, usually 2x to 3x the data needs
Eliminate time-consuming index regeneration, usually longer than the data load times
Eliminate ongoing tuning/maintenance of indexes; less administrative DBA support needed
Deliver data faster; get it into the hands of decision makers quicker
Enable user self-empowerment; unlock the data
Increase reliability; fewer places to fail while loading
What if We Could Eliminate the Need for Aggregates?
Eliminate storage needs for aggregation tables and associated indexes
Eliminate time-consuming development and testing of aggregation routines, which can run to man-months
Eliminate time-consuming daily/weekly aggregate regeneration
Data is more current; no need to limit users to weekly or monthly views only (an artificial constraint)
Data is delivered faster, allowing earlier access by the users
Increase reliability; fewer places to fail. Many aggregation routines have flawed logic; the users just don't know it
What if We Could Eliminate the Need for Datamarts?
Eliminate the total cost of ownership for datamart hardware and software licenses
Eliminate development, maintenance, and production costs of marts
Single source of the truth
Data is delivered faster, allowing earlier access by the users
Summary
On average, data warehousing is >90% required waste or non-value added processing
The cost of this waste translates to:
Unacceptably long time to market
Unnecessary hardware and software license costs
Terabytes of wasted storage
Elongated development and data load cycles
Longer periods of data unavailability
Stale data
Poorly performing loads and queries
Excessive administrative costs
Final Thought
Henry Ford said: "Time waste differs from material waste in that there can be no salvage. The easiest of all wastes, and the hardest to correct, is the waste of time, because wasted time does not litter the floor like wasted material."
We need to eliminate the waste in data warehousing
Reclaim the time we currently waste
Questions?
John.G.McManus@BankofAmerica.com