Open Source Data Warehousing and Business Intelligence

Lakshman Bulusu
CRC Press, Taylor & Francis Group
Boca Raton London New York
CRC Press is an imprint of the Taylor & Francis Group, an informa business
AN AUERBACH BOOK

Contents

Foreword
Introduction
What Does This Book Cover?
Who Should Read This Book?
Why a Separate Book?
Acknowledgements
About the Author

Chapter 1 Introduction
1.1 In This Chapter
1.2 Data Warehousing and Business Intelligence: What, Why, How, When, When Not?
1.2.1 Taking IT Intelligence to Its Apex
1.3 Open Source DW and BI: Much Ado about Anything-to-Everything DW and BI, When Not, and Why So Much Ado?
1.3.1 Taking Business Intelligence to Its Apex: Intelligent Content for Insightful Intent
1.4 Summary

Chapter 2 Data Warehousing and Business Intelligence: An Open Source Solution
2.1 In This Chapter
2.2 What Is Open Source DW and BI, and How "Open" Is This Open?

2.3 What's In, What's Not: Available and Viable Options for Development and Deployment
2.3.1 Semantic Analytics
2.3.2 Testing for Optimizing Quality and Automation Accelerated!
2.3.3 Business Rules, Real-World Perspective, Social Context
2.3.4 Personalization Through Customizable Measures
2.3.5 Leveraging the Cloud for Deployment
2.4 The Foundations Underneath: Architecture, Technologies, and Methodologies
2.5 Open Source versus Proprietary DW and BI Solutions: Key Differentiators and Integrators
2.6 Open Source DW and BI: Uses and Abuses
2.6.1 An Intelligent Query Accelerator Using an Open Cache In, Cache Out Design
2.7 Summary

Chapter 3 Open Source DW & BI: Successful Players and Products
3.1 In This Chapter
3.2 Open Source Data Warehousing and Business Intelligence Technology
3.2.1 Licensing Models Followed
3.2.2 Community versus Commercial Open Source
3.3 The Primary Vendors: Inventors and Presenters
3.3.1 Oracle: MySQL Vendor
3.3.2 PostgreSQL Vendor
3.3.3 Infobright
3.3.4 Pentaho: Mondrian Vendor
3.3.5 Jedox: Palo Vendor
3.3.6 EnterpriseDB Vendor
3.3.7 Dynamo BI and Eigenbase: LucidDB Vendor
3.3.8 GreenPlum Vendor
3.3.9 Hadoop Project
3.3.10 HadoopDB
3.3.11 Talend

3.4 The Primary Products and Tools Set: Inclusions and Exclusions
3.4.1 Open Source Databases
3.4.2 Open Source Data Integration
3.4.3 Open Source Business Intelligence
3.4.4 Open Source Business Analytics
3.5 The Primary Users: User, End-User, Customer and Intelligent Customer
3.5.1 MySQL
3.5.2 PostgreSQL
3.5.3 Mondrian Customers
3.5.4 Palo Customers
3.5.5 EnterpriseDB Customers
3.5.6 LucidDB Customers
3.5.7 Greenplum Customers
3.5.8 Talend Customers
3.6 Summary
3.7 References

Chapter 4 Analysis, Evaluation and Selection
4.1 In This Chapter
4.2 Essential Criteria for Requirements Analysis of an Open Source DW and BI Solution
4.3 Key and Critical Deciding Factors in Selecting a Solution
4.3.1 The Selection-Action Preview
4.3.2 Raising Your BIQ: Five Things Your Company Can Do Now
4.4 Evaluation Criteria for Choosing a Vendor-Specific Platform and Solution
4.5 The Final Pick: An Information-Driven, Customer-Centric Solution, and a Best-of-Breed Product/Platform and Solution Convergence Key Indicator Checklist
4.6 Summary
4.7 References

Chapter 5 Design and Architecture: Technologies and Methodologies by Dissection
5.1 In This Chapter

5.2 The Primary Aspects of DW and BI from a Usability Perspective: Strategic BI, Pervasive BI, Operational BI, and BI On-Demand
5.3 Design and Architecture Considerations for the Primary BI Perspectives
5.3.1 The Case for Architecture as a Precedence Factor
5.4 Information-Centric, Business-Centric, and Customer-Centric Architecture: A Three-in-One Convergence, for Better or Worse
5.5 Open Source DW and BI Architecture
5.5.1 Pragmatics and Design Patterns
5.5.2 Components
5.6 Why and How an Open Source Architecture Delivers a Better Enterprisewide Solution
5.7 Open Source Data Architecture: Under the Hood
5.8 Open Source Data Warehouse Architecture: Under the Hood
5.9 Open Source BI Architecture: Under the Hood
5.10 The Vendor/Platform Product(s)/Tool(s) That Fit into the Open DW and BI Architecture
5.10.1 Information Integration, Usability and Management (Across Data Sources, Applications and Business Domains)
5.10.2 EDW: Models to Management
5.10.3 BI: Models to Interaction to Management to Strategic Business Decision Support (via Analytics and Visualization)
5.11 Best Practices: Use and Reuse
5.12 Summary

Chapter 6 Operational BI and Open Source
6.1 In This Chapter
6.2 Why a Separate Chapter on Operational BI and Open Source?
6.3 Operational BI by Dissection
6.4 Design and Architecture Considerations for Operational BI
6.5 Operational BI Data Architecture: Under the Hood

6.6 A Reusable Information Integration Model: From Real-Time to Right Time
6.7 Operational BI Architecture: Under the Hood
6.8 Fitting Open Source Vendor/Platform Product(s)/Tool(s) into the Operational BI Architecture
6.8.1 Talend Data Integration
6.8.2 expressor 3.0 Community Edition
6.8.3 Advanced Analytics Engines for Operational BI
6.8.4 Astera's Centerprise Data Integration Platform
6.8.5 Actuate BIRT BI Platform
6.8.6 JasperSoft Enterprise
6.8.7 Pentaho Enterprise BI Suite
6.8.8 KNIME (Konstanz Information Miner)
6.8.9 Pervasive DataRush
6.8.10 Pervasive DataCloud2
6.9 Best Practices: Use and Reuse
6.10 Summary

Chapter 7 Development and Deployment
7.1 In This Chapter
7.2 Introduction
7.3 Development Options, Dissected
7.4 Deployment Options, Dissected
7.5 Integration Options, Dissected
7.6 Multiple Sources, Multiple Dimensions
7.7 DW and BI Usability and Deployment: Best Solution versus Best-Fit Solution
7.8 Leveraging the Best-Fit Solution: Primary Considerations
7.9 Better, Faster, Easier as the Hitchhiker's Rule
7.9.1 Dynamism and Flash Real Output in Real Time in the Real World
7.9.2 Interactivity
7.10 Better Responsiveness, User Adoptability, and Transparency
7.11 Fitting the Vendor/Platform Product(s)/Tool(s): A Development and Deployment Standpoint
7.12 Best Practices: Use and Reuse
7.13 Summary

Chapter 8 Best Practices for Data Management
8.1 In This Chapter
8.2 Introduction
8.3 Best Fit of Open Source in EDW Implementation
8.4 Best Practices for Using Open Source as a BI-Only Methodology for Data/Information Delivery
8.4.1 Mobile BI and Pervasive BI
8.5 Best Practices for the Data Lifecycle in a Typical EDW Lifecycle
8.5.1 Data Quality, Data Profiling, and Data Loss Prevention Components
8.5.2 The Data Integration Component
8.6 Best Practices for the Information Lifecycle as It Moves into the BI Lifecycle
8.6.1 The Data Analysis Component: The Dimensions of Data Analysis in Terms of Online Analytics vs. Predictive Analytics vs. Real-Time Analytics vs. Advanced Analytics
8.6.2 Data to Information Transformation and Presentation
8.7 Best Practices for Auditing Data Access, as It Makes Its Way via the EDW and Directly (Bypassing the EDW) to the BI Dashboard
8.8 Best Practices for Using XML in the Open Source EDW/BI Space
8.9 Best Practices for a Unified Information Integrity and Security Framework
8.10 Object to Relational Mapping: A Necessity or Just a Convenience?
8.10.1 Synchrony Maintenance
8.10.2 Dynamic Language Interoperability
8.11 Summary

Chapter 9 Best Practices for Application Management
9.1 In This Chapter
9.2 Introduction
9.3 Using Open Source as an End-to-End Solution Option: How Best a Practice Is It?

9.4 Accelerating Application Development: Choice, Design, and Suitability Aspects
9.4.1 Visualization of Content: For Better or Best Fit
9.4.2 Best Practices for Autogenerating Code: A Codeless Alternative to Information Presentation
9.4.3 Automating Querying: Why and When
9.4.4 How Fine Is Fine-Grained? Drawing the Line between Representation of Data at the Lowest Level and a Best-Fit Metadata Design and Presentation
9.5 Best Practices for Application Integrity
9.5.1 Sharing Data between EDW and the BI Tiers: Isolation or a Tightrope Methodology
9.5.2 Breakthrough BI: Self-Serviceable BI via a Self-Adaptable Solution
9.5.3 Data-in, Data-Out Considerations: Data-to-Information I/O
9.5.4 Security Inside and Outside Enterprise Parameters: Best Practices for Security beyond User Authentication
9.6 Best Practices for Intra- and Interapplication Integration and Interaction
9.6.1 Continuous Activity Monitoring and Event Processing
9.6.2 Best Practices to Leverage Cloud-Based Methodologies
9.7 Best Practices for Creative BI Reporting
9.8 Summary

Chapter 10 Best Practices Beyond Reporting: Driving Business Value
10.1 In This Chapter
10.2 Introduction
10.3 Advanced Analytics: The Foundation for a Beyond-Reporting Approach (Dynamic KPI, Scorecards, Dynamic Dashboarding, and Adaptive Analytics)

10.4 Large Scale Analytics: Business-centric and Technology-centric Requirements and Solution Options
10.4.1 Business-centric Requirements
10.4.2 Technology-centric Requirements
10.5 Accelerating Business Analytics: What to Look for, Look at, and Look Beyond
10.6 Delivering Information on Demand and Thereby Performance on Demand
10.6.1 Design Pragmatics
10.6.2 Demo Pragmatics
10.7 Summary

Chapter 11 EDW/BI Development Frameworks
11.1 In This Chapter
11.2 Introduction
11.3 From the Big Bang to the Big Data Bang: The Past, Present, and Future
11.4 A Framework for BI Beyond Intelligence
11.4.1 Raising the Bar on BI Using Embeddable BI and BI in the Cloud
11.4.2 Raising the Bar on BI: Good to Great to Intelligent
11.4.3 Raising the Bar on the Social Intelligence Quotient (SIQ)
11.4.4 Raising the Bar on BI by Mobilizing BI: BI on the Go
11.5 A Pragmatic Framework for a Customer-Centric EDW/BI Solution
11.6 A Next-Generation BI Framework
11.6.1 Taking EDW/BI to the Next Level: An Open Source Model for EDW/BI-EPM
11.6.2 Open Source Model for an Open Source DW-BI/EPM Solution Delivering Business Value
11.6.3 Open Source Architectural Framework for a Best-Fit Open Source BI/EPM Solution
11.6.4 Value Proposition
11.6.5 The Road Ahead...

11.7 A BI Framework for a Reusable Predictive Analytics Model
11.8 A BI Framework for Competitive Intelligence: Time, Technology, and the Evolution of the Intelligent Customer
11.9 Summary

Chapter 12 Best Practices for Optimization
12.1 In This Chapter
12.2 Accelerating Application Testing: Choice, Design, and Suitability
12.3 Best Practices for Performance Testing: Online and On Demand Scenarios
12.4 A Fine Tuning Framework for Optimality
12.5 Looking Down the Customer Experience Trail, Leaving the Customer Alone: Customer Feedback Management (CFM)-Driven and APM-Oriented Tuning
12.6 Codeful and Codeless Design Patterns for Business-Savvy and IT-Friendly QOS Measurements and In-Depth Impact Analysis
12.7 Summary

Chapter 13 Open Standards for Open Source: An EDW/BI Outlook
13.1 Introduction
13.2 Summary
13.3 References

Index