So Many Tools, So Much Data, and So Much Meta Data

by Rick F. van der Lans
R20/Consultancy B.V.
Twitter: rick_vanderlans
www.r20.nl

Copyright 1991-2012 R20/Consultancy B.V., The Hague, The Netherlands. All rights reserved. No part of this material may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photographic, or otherwise, without the explicit written permission of the copyright owners.

Rick F. van der Lans

Rick F. van der Lans is an independent consultant, lecturer, and author. He specializes in data warehousing, business intelligence, service-oriented architectures, and database technology. He is managing director of R20/Consultancy B.V. Rick has been involved in various projects in which SOA, data warehousing, and integration technology were applied.

Rick van der Lans is an internationally acclaimed lecturer. He has lectured professionally for the last twenty years in many European and Middle Eastern countries, the USA, South America, and Australia. He has been invited by several major software vendors to present keynote speeches.

He is the author of several books on computing, including Myths on Computing. Some of these books are available in different languages. Books such as the popular Introduction to SQL and SQL for MySQL Developers are available in English, Dutch, Italian, Chinese, and German and are sold worldwide. This year he released The SQL Guide to Ingres.

As an author for BeyeNetwork.com, a writer of whitepapers, chairman of the annual European Data Warehouse and Business Intelligence Conference, and a columnist for several IT magazines, he has close contacts with many vendors.

R20/Consultancy B.V. is located in The Hague, The Netherlands, www.r20.nl.
You can get in touch with Rick via:
- Email: rick@r20.nl
- Twitter: http://twitter.com/rick_vanderlans
- LinkedIn: http://www.linkedin.com/pub/rick-van-der-lans/9/207/223
What is Business Intelligence?

- The success of most organizations is highly dependent on the quality of their decision making.
- The field of business intelligence focuses on supporting and possibly improving the decision-making process of an organization.
- Definition by Boris Evelson of Forrester Research: "Business Intelligence is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making."

Too Much Data
Chris J. Date - 1977

"An enterprise should store its operational data in an integrated database to provide the enterprise with centralized control of its operational data. This is in sharp contrast to the situation that prevails in most enterprises today [1977], where typically each application has its own private files so that the data is widely dispersed."

Utopia: One Large Database
Chain of Databases

[Figure labels: production databases, staging area, data warehouse, data marts, personal data stores]
The Chain is a Complex Network

[Figure labels: production databases, staging area, operational data store, data warehouse, data marts, personal data stores]

Real Life Architecture

[Figure: architecture diagram of a real-life environment. Labels include operational systems (VSAM, DB2 UDB, SQL Server 2000, Facets, PeopleSoft, Pegasystems), external feeds (drug claims, lab, vision, and chiropractic vendors), mainframe files, a claims repository, an outsourced EDW (Ingenix), SQR reports, Analysis Services, Excel spreadsheets, Query Builder/OLAP, and about 50 MS Access database applications.]
Disadvantages of Data Duplication and Distribution

- Data latency increases
- Costs of storing and managing duplicate data increase
- Flexibility decreases
- Data quality decreases (potentially)
- Costs of data integration increase
- Data security becomes more complex
- And many more

2011 TDWI BI Benchmark Report

- The average time needed to add a new data source was 8.4 weeks in 2009, 7.4 weeks in 2010, and 7.8 weeks in 2011. 33% of the respondents needed more than 3 months.
- Developing a complex report or dashboard with about 20 dimensions, 12 measures, and 6 user access rules took on average 6.3 weeks in 2009, 6.6 weeks in 2010, and 7 weeks in 2011. 30% of the respondents indicated they needed at least 3 months for such a development exercise.
Problems with Current DW Platforms

Problem                                                       Respondents
Poor query response                                           45%
Can't support advanced analytics                              40%
Inadequate data load speed                                    39%
Can't scale to large data volumes                             37%
Cost of scaling up is too expensive                           33%
Poorly suited to real-time or on-demand workloads             29%
Current platform is a legacy we must phase out                23%
Can't support data modeling we need                           23%
We need platform that supports mixed workloads                21%
Can't support large concurrent user count                     20%
Inadequate high availability                                  19%
Inadequate support for in-memory processing                   16%
Inadequate support for web services and SOA                   16%
Current platform is 32-bit, and we need 64-bit                15%
Current platform is SMP, and we need MPP                      14%
We need platform better suited to cloud or virtualization     13%
Can't secure the data properly                                11%
Other                                                          4%
No problems                                                    3%

Source: P. Russom, Next Generation Data Warehouse Platforms, TDWI Best Practices Report, fourth quarter 2009.

New Forms of BI

The New BI:
- Agile BI
- 360° reporting
- Exploratory analysis
- Operational BI
- Deep analytics
- Big data analytics
- Self-service BI
- Semi-structured and unstructured data analytics
- Disposable reports
- And many more
Operational BI

- Supporting decision making on the operational management level
- Different forms:
  - Operational reporting
  - Operational analytics
  - Embedded analytics
  - Exception reporting
- All forms need access to operational data

Big Data (More Databases?)

Size Does Matter
Need for More Database Power

- Financial organization:
  - 1000 concurrent users
  - 50-80 concurrent queries
  - 100-terabyte data warehouse
  - Data latency of 30 minutes
  - Trend towards data normalization (each fact stored only once)
- Technical/engineering company:
  - 40-terabyte data warehouse
  - 4 petabytes of I/O per day
  - Peak: more than 100 concurrent queries
  - Business-critical data warehouse

Data Virtualization to the Rescue

[Figure labels: production application, reporting & analytics, SOA, SQL statement, Data Virtualization Server]
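The idea behind a data virtualization server can be illustrated with a minimal sketch. It is not taken from the presentation and does not represent any particular product: it assumes two separate SQLite databases standing in for a production database and a data warehouse, and a temporary view standing in for a virtual table. All names (`production`, `warehouse`, `orders`, `customers`, `sales_by_region`) are hypothetical.

```python
import sqlite3


def build_virtual_layer():
    """Return a connection that plays the role of a data virtualization server.

    Assumption: two in-memory SQLite databases stand in for independent
    data sources; a real server would federate heterogeneous systems.
    """
    conn = sqlite3.connect(":memory:")

    # Attach two separate databases as stand-ins for independent sources.
    conn.execute("ATTACH DATABASE ':memory:' AS production")
    conn.execute("ATTACH DATABASE ':memory:' AS warehouse")

    # Hypothetical source tables, one per data source.
    conn.execute(
        "CREATE TABLE production.orders (customer_id INTEGER, amount REAL)"
    )
    conn.execute(
        "CREATE TABLE warehouse.customers (customer_id INTEGER, region TEXT)"
    )
    conn.executemany(
        "INSERT INTO production.orders VALUES (?, ?)",
        [(1, 100.0), (2, 50.0), (1, 25.0)],
    )
    conn.executemany(
        "INSERT INTO warehouse.customers VALUES (?, ?)",
        [(1, "EMEA"), (2, "APAC")],
    )

    # The "virtual table": a view spanning both sources. No data is copied;
    # the join is evaluated each time a consumer queries the view.
    conn.execute(
        """
        CREATE TEMP VIEW sales_by_region AS
        SELECT c.region AS region, SUM(o.amount) AS total
        FROM production.orders AS o
        JOIN warehouse.customers AS c ON c.customer_id = o.customer_id
        GROUP BY c.region
        """
    )
    return conn


if __name__ == "__main__":
    layer = build_virtual_layer()
    # A reporting tool sends one SQL statement and never sees the two sources.
    query = "SELECT region, total FROM sales_by_region ORDER BY region"
    for region, total in layer.execute(query):
        print(region, total)
```

The consumer only knows the view name, so in this sketch the underlying tables could be moved or restructured by redefining the view, without adding another physical copy of the data.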
Too Many Tools

- Reporting Tools
  - Executive Reporting Tools
  - OLAP Tools
  - BAM/KPI/Dashboarding Tools
  - Spreadsheets
  - Data Visualization Tools
  - Geo Visualization Tools
- Analytical Tools
  - Predictive Modeling Tools
  - Forecasting Tools
  - Optimization Tools (Operations Research)
  - Statistical Analysis Tools
  - Data Mining Tools
  - Text Mining Tools
  - Data Discovery/Exploitation Tools

And Even More Tools

- Data Integration Tools
  - ETL
  - ELT
  - Replication
  - Data Virtualization
  - Data Services (ESB)
- Data Storage Tools
  - SQL Databases
  - NoSQL Databases
  - Multi-dimensional Cubes
  - Data Warehouse Appliances
  - In-memory Databases
Gartner Magic Quadrant 2012 for BI Platforms

Is IT (BICC) losing control?

Gartner: By 2012, business units will control at least 40% of the total budget for BI.
Quote Gartner on Self-Service BI

"Vocal, demanding and influential business users are increasingly driving BI purchasing decisions. They're choosing easier to use data discovery tools over traditional BI platforms with or without IT's consent."

Self-Service BI

- Self-Service Reporting
- Self-Service Analytics
- Self-Service ETL
- Self-Service Cleansing
- Self-Service
BI in the Cloud

[Figure labels: production databases, ODS, data warehouse, data marts, personal data stores]

Example: BlinkLogic

- Headquartered in San Rafael, CA
- Formerly known as DataJungle Software
- BI solution includes dashboards, analytics, collaboration, annotation, key performance indicator (KPI) monitoring, notifications to smart phones, location intelligence, Web reports, and export to Excel and portable document format (PDF)
- They aim at midsize customers
- Most customers are not IT professionals
- Runs on Oracle with OLAP cubes
- Outsourced the client databases to UpSource
- Do they still exist?
Can be downloaded from www.r20.nl

Too Much and Too Many
Let's Wrap It Up!

No/Minimal Integration of Meta Data

- Each tool stores its own meta data
- Each tool creates its own meta data
- Effect:
  - Proliferation of meta data
  - Duplication and no sharing of meta data
- No one wants a meta data warehouse project
- Integration of meta data is a must!
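To make the effect of unshared meta data concrete, here is a small illustrative sketch. It is an assumption-laden example, not a method from the presentation: it pretends each tool's meta data is a simple mapping from a business term to a definition, and the tool names and definitions are hypothetical.

```python
from collections import defaultdict


def conflicting_definitions(tool_metadata):
    """Return terms that are defined differently by different tools.

    tool_metadata maps a tool name to that tool's own
    {term: definition} dictionary (a hypothetical, simplified format).
    """
    by_term = defaultdict(dict)
    for tool, definitions in tool_metadata.items():
        for term, definition in definitions.items():
            by_term[term][tool] = definition

    # Keep only the terms for which at least two tools disagree.
    return {
        term: per_tool
        for term, per_tool in by_term.items()
        if len(set(per_tool.values())) > 1
    }


if __name__ == "__main__":
    # Hypothetical meta data kept separately by two tools.
    metadata = {
        "reporting_tool": {"revenue": "gross sales", "customer": "active account"},
        "etl_tool": {"revenue": "net sales", "customer": "active account"},
    }
    for term, per_tool in conflicting_definitions(metadata).items():
        print(term, per_tool)
```

With an integrated meta data store there would be a single definition per term, so a check like this would have nothing to report.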
Current (Sad) Situation

- It's slowing time to market
- It's reducing flexibility (just when it is needed)
- Bad for data quality
- Trust in the data factory diminishes
- Too much time is spent on technical aspects and not on user requests
- Unnecessary costs

Recommendations

- Select BI platforms where meta data is integrated
- Select database servers that support mixed workloads
- Super database power
- Simplify the BI architecture
- Try to avoid introducing extra databases for performance reasons
- Use data virtualization to introduce flexibility