Next Generation Data Warehousing Appliances, 23.10.2014. Presented by: Espen Jorde, Executive Advisor; Bjørn Runar Nes, CTO/Chief Architect
Agenda: Affecto's new Data Warehouse architecture (pains and gains); DW/BI/BA Appliance (why, what it does, how it solves your issues); Appliance customer stories
Tear down the Data Warehouse 100 times faster response 50% less operational costs At least 30% shorter projects
Best practice until now: Data Sources (System X, System Y, System Z) feed Data Integration (ETL) into a Stage Layer, then an Enterprise Layer, then Data Marts (DM), serving Ad-hoc Analysis, Visual Storytelling, Reporting, and Performance Management
Typical Business Intelligence Challenges. Quality and Risk: business work-arounds, temporary solutions, manual workload, quality issues. Performance: poor query performance, long data load window, refresh rate too rare. Solution Cost: too complex solutions, non-integrated tools, lack of documentation, outdated architecture and legacy solutions. Time to Market: long project delivery time, large backlog, heavy maintenance, technical debt
Affecto's Reference model: Data Sources (System X, System Y, streaming/real-time, MDM, cloud) feed the Appliance(s): Data Stage (ELT), Stage Layer, Integration/Enterprise Layer (ELT), then Data Marts (DM) and cached Virtual Data Marts (VDM), alongside Big Data (Hadoop), an Analytical Sandbox for analytical modeling, Data Virtualization, and an Integrated Development Environment. Front end: Ad-hoc Analysis, Visual Storytelling, Reporting, Performance Management, Real-time Analysis
Agenda: Affecto's new Data Warehouse architecture (pains and gains); DW/BI/BA Appliance (why, what it does, how it solves your issues); Appliance customer stories
What is an appliance? Something: specialized, built for a purpose, a complete solution, easy to use, with a standardized interface, reasonably priced
Technology Is the Driving Force Shaping the Future
Rapid and accelerating pace of change - Those who lag behind will quickly disappear
Why do you need higher performance?
Typical Business Intelligence Challenges. Quality and Risk: business work-arounds, temporary solutions, manual workload, quality issues. Performance: poor query performance, long data load window, refresh rate too rare. Solution Cost: too complex solutions, non-integrated tools, lack of documentation, outdated architecture and legacy solutions. Time to Market: long project delivery time, large backlog, heavy maintenance, technical debt
Traditional Data Warehouse Complexity
Data Warehousing Simplified
Typical Business Intelligence Challenges. Quality and Risk: business work-arounds, temporary solutions, manual workload, quality issues. Performance: poor query performance, long data load window, refresh rate too rare. Solution Cost: too complex solutions, non-integrated tools, lack of documentation, outdated architecture and legacy solutions. Time to Market: long project delivery time, large backlog, heavy maintenance, technical debt
Inside the IBM PureData System for Analytics: optimized hardware + software, hardware-accelerated AMPP, purpose-built for high-performance analytics, requires no tuning. Disk enclosures: user data, mirror, and swap partitions; high-speed data streaming. Snippet blades: hardware-based query acceleration with FPGAs; complex analytics executed as the data streams from disk; blisteringly fast results. SMP hosts: SQL compiler, query plan, optimizer, administration
Typical data load improvements: acceptable throughput using ODBC (ETL) - 2-4x; high throughput using Direct Loader (ETL) - 10-75x; extreme throughput using SQL Push-Down (ELT) - 30-200x (approaching 1.5 million transactions/sec on a small appliance)
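The difference between the ETL and ELT numbers above comes down to where the transformation runs: ETL pulls rows out through ODBC, transforms them on a client, and writes them back, while SQL push-down expresses the transformation as a single set-based statement that the appliance executes in place. The following is a minimal sketch of that push-down pattern; it uses SQLite as a stand-in engine and made-up table names, since the idea is the statement shape, not the platform (on PureData the same INSERT ... SELECT would run via nzsql or an ODBC session).

```python
# ELT "SQL push-down" sketch: transform stage data into a data mart with one
# set-based statement executed inside the engine, instead of round-tripping
# rows through the client. SQLite stands in for the appliance here.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical stage table loaded from a source system.
cur.execute("CREATE TABLE stage_sales (id INTEGER, amount REAL, region TEXT)")
cur.executemany("INSERT INTO stage_sales VALUES (?, ?, ?)",
                [(1, 100.0, "N"), (2, 250.0, "S"), (3, 50.0, "N")])

# The push-down: aggregation happens where the data lives, no extraction.
cur.execute("CREATE TABLE dm_sales_by_region (region TEXT, total REAL)")
cur.execute("""
    INSERT INTO dm_sales_by_region
    SELECT region, SUM(amount) FROM stage_sales GROUP BY region
""")

totals = dict(cur.execute("SELECT region, total FROM dm_sales_by_region"))
# totals == {'N': 150.0, 'S': 250.0}
```

The client only issues the statement; the rows never leave the database, which is why throughput scales with the engine rather than with the ODBC link.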
Query performance: mid-size tables, 10-100x query improvement; queries on large data volumes, 100-1000x improvement
Sweet spot: loading HUGE tables; playing around with HUGE tables - adding columns, changing data; ELT; querying large volumes of detailed data; in-database analytics (R, SPSS, SAS, Python, and more); in-database geospatial
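In-database analytics follows the same principle: instead of extracting customer rows to a client for model scoring, the fitted model is expressed so the database evaluates it where the data lives. A hypothetical illustration, again with SQLite standing in for the appliance and invented coefficients: once a logistic model is trained, its scoring formula is just arithmetic that can run as SQL over every row.

```python
# In-database scoring sketch: evaluate a logistic model as a SQL expression
# so no rows are extracted for scoring. Table, columns, and coefficients are
# made-up examples; SQLite stands in for PureData/Netezza.
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.create_function("exp", 1, math.exp)  # register exp() for portability
cur = conn.cursor()

cur.execute("CREATE TABLE customers (id INTEGER, age REAL, balance REAL)")
cur.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(1, 30.0, 1000.0), (2, 60.0, 200.0)])

# score = 1 / (1 + exp(-(b0 + b1*age + b2*balance)))  -- example coefficients
b0, b1, b2 = -2.0, 0.03, 0.001
rows = cur.execute(
    "SELECT id, 1.0 / (1.0 + exp(-(? + ? * age + ? * balance))) "
    "FROM customers ORDER BY id",
    (b0, b1, b2)).fetchall()
# customer 2: linear term is exactly 0.0, so its score is 0.5
```

On the appliance the same idea extends beyond SQL expressions to packaged in-database R, SPSS, SAS, or Python routines, but the payoff is identical: no data movement and no sampling.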
PureData Impact: drive productivity with in-database analytics. Reduced effort with PureData compared to before: simpler, no data movement, easy to govern, accurate (no sampling), lower infrastructure cost, faster in-database scoring, improved analyst productivity
Typical Business Intelligence Challenges. Quality and Risk: business work-arounds, temporary solutions, manual workload, quality issues. Performance: poor query performance, long data load window, refresh rate too rare. Solution Cost: too complex solutions, non-integrated tools, lack of documentation, outdated architecture and legacy solutions. Time to Market: long project delivery time, large backlog, heavy maintenance, technical debt
Time to market? The appliance is not the main solution here, but it helps: simplified data modelling; ease of creating new databases; ease of duplicating data; less time spent on development and testing due to improved performance; fast load times make several iterations of POCs more feasible
Agenda: Affecto's new Data Warehouse architecture (pains and gains); DW/BI/BA Appliance (why, what it does, how it solves your issues); Appliance customer stories
Source: Kristian Ramsrud, GOBI 2014
Appliance demands October 2013
Norsk Tipping - Goals: A flexible DWH which is easily loaded during the available time period. A scalable solution enabling growth without tuning and refactoring. A DWH providing good response times to end users without using aggregates, thereby reducing the number of scheduled standard reports and moving towards self-service BI. Data that are easily accessible for business users and analysts. A DWH where data quality issues can be corrected automatically after the problem has been identified and solved in the source system (easy to implement ETLs that can correct errors). A DWH requiring little effort to operate (DBA, system administration). At the end of the day: better decision support, shorter time to market, and customer-focused development and adaptability.
Norsk Tipping - Requirements: Minimal effort to operate. Minimal migration effort to get started and see gains, thereby creating room for removing complexity, refactoring, etc. Gradual migration must be possible; NT chooses when to switch source/target for the different jobs. Minimal effort to convert today's Oracle relational database to the new format. The new environment must support several parallel test and production instances. Backup and restore must be easy, and good failover solutions are needed. We must be able to access tables from e.g. Toad. We want to keep ETL developed in Informatica PowerCenter. It must be possible to import/export database objects to/from systems in a standard format. Must support mixed workload: inserts running simultaneously with analytical queries. Must support external workload scheduling and cope with parallel execution of jobs. Must be easy to test, both manually and automatically.
Norsk Tipping is converting its Data Warehouse from Oracle to IBM PureData for Analytics, powered by Netezza
What is the main evolving trend? Consider the many new architectures that boost performance: if your EDW is still on an SMP platform, make migration to MPP a priority. Consider distributing your data warehouse architecture, especially to offload a workload to a standalone platform that performs well with that workload. When possible, take analytic algorithms to the data, instead of data to the algorithm (as is the DW tradition).
So, what now? Gartner: By 2015, 15% of organizations will modernize their strategies for IM capability and exhibit 20% higher financial performance than their peers. We will all have to change our data warehouse strategies. Are you going to move while you have control, taking action now and reaping the benefits early? Or wait and see until circumstances force you to fight your way out of the problems?
Thanks! bjorn.runar.nes@affecto.com espen.jorde@affecto.com www.affecto.com