Modern Data Warehouse
Are you ready for Big Data? Does your DWH / BI roadmap contain all the necessary components?
IDG: Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and/or analysis. For CIOs, big data represents a unique, perhaps once-in-a-career opportunity to drive growth for their enterprises. They will need to lead the enterprise in adopting new information-taming technologies and best practices for leveraging and extracting value from data, and in creating new roles and organizational designs. Each step will require organizational change, not just a few new computers or more software. The success of many enterprises in the coming years will be determined by how successfully CIOs drive the required enterprise-wide adjustment to the new realities of the digital universe.
Big Data: everyone talks about it
The McKinsey Global Institute estimates that data volume is growing 40% per year and will grow 44x between 2009 and 2020.
Gartner: By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent.
Harvard Business Review: Data-driven decisions are better decisions; it's as simple as that. Using big data enables managers to decide on the basis of evidence rather than intuition. For that reason it has the potential to revolutionize management.
Larry Feinsmith (Managing Director at JPMorgan Chase): Integrating Hadoop with existing IT investments is vitally important.
Big Data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.
[Figure: Google Trends search interest for "data warehouse" + "datawarehouse", "business intelligence", "big data" and "hadoop". Picture created 08/2014 using http://www.google.cz/trends]
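The two McKinsey figures can be cross-checked with the compound annual growth rate (CAGR). A total growth factor F over n years corresponds to an annual rate of F^(1/n) - 1. The Python sketch below is only an arithmetic check of the quoted numbers. The helper name `cagr` is hypothetical.

```python
# Sanity check: does "44x between 2009 and 2020" match "40% per year"?
# For a total growth factor F over n years, CAGR = F ** (1 / n) - 1.


def cagr(growth_factor: float, years: int) -> float:
    """Return the compound annual growth rate for a total growth factor."""
    return growth_factor ** (1 / years) - 1


rate = cagr(44, 2020 - 2009)  # 11 yearly steps from 2009 to 2020
print(f"Implied annual growth: {rate:.1%}")  # about 41.1%

# Conversely, 40% per year compounded over the same 11 years:
factor = 1.40 ** 11
print(f"40% per year over 11 years: {factor:.1f}x")  # about 40.5x
```

The two statements agree to within rounding: 44x over 11 years implies roughly 41% per year.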
Data explosion
Researchers at IDC estimate:
2010: more than 1 ZB (zettabyte, 1 ZB = 10^21 B)
2015: 6.8 ZB
2020: more than 36 ZB
90% of the data on the planet was created in the past 2 years.
Most of the data is unstructured or semi-structured.
[Figure: chart "Data Explosion", data volume in ZB by year; pie chart of structured vs. unstructured data, 10% / 90%]
Transportation: One airplane generates 10 TB of data every 30 minutes. The total amount of data generated daily climbs into the petabyte scale.
Utility: Supervising one gas turbine blade generates 588 GB of data per day (Source: GE).
How much data do we create?
We call and use smartphones.
We use banking services, internet and mobile banking, and payment cards.
We shop in stores.
We shop in e-shops.
We travel by plane, by car, by public transport, etc.
We go to the doctor.
We communicate and have fun.
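The IDC estimates and the per-device rates above can be turned into annual growth rates and more familiar units. The Python sketch below is a rough arithmetic check. It assumes decimal (SI) units (1 TB = 10^12 B) and a 365-day year, and it treats the "more than" values as exact. The helper name `cagr` is hypothetical.

```python
# Rough arithmetic on the IDC estimates and per-device rates quoted above.
# Assumptions: decimal (SI) units and a 365-day year; "more than" values
# are treated as exact.

ZB = 10**21
TB = 10**12
GB = 10**9


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


# IDC estimates in zettabytes.
volumes_zb = {2010: 1.0, 2015: 6.8, 2020: 36.0}

print(f"2010-2020: {cagr(volumes_zb[2010], volumes_zb[2020], 10):.1%} per year")
print(f"2010-2015: {cagr(volumes_zb[2010], volumes_zb[2015], 5):.1%} per year")

# One zettabyte expressed in terabytes.
print(f"1 ZB = {ZB // TB:,} TB")

# Airplane: 10 TB every 30 minutes, converted to TB per hour of operation.
airplane_tb_per_hour = 10 * (60 / 30)
print(f"Airplane: {airplane_tb_per_hour:.0f} TB per hour of operation")

# Gas turbine blade supervision: 588 GB per day over a 365-day year.
turbine_tb_per_year = 588 * GB * 365 / TB
print(f"Turbine blade: {turbine_tb_per_year:.1f} TB per year")
```

Under these assumptions the IDC figures imply roughly 43% annual growth for 2010-2020. That is in the same range as the McKinsey estimate of 40% per year.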
Big Data overview
Big Data is any data that is difficult to process (store, analyze) efficiently with existing storage systems and tools, typically relational databases.
[Diagram: Data scientists, Business analysts; Advanced & Big Data analytics, Hadoop; Data Lakes; Architecture and Data governance, Evangelization]
Current DWH/BI (reality)
Because the DWH/BI platform is rigid, business users have developed their own unmanaged solutions. They rely on MS Excel and deliver reports as files (XLS, PDF, PPT).
Data stored in many storage systems
Many DWHs and data marts
Excel, Access, etc. used to process and consolidate data
Majority of reports & analyses created in MS Excel
Data preparation using SQL
Analysis performed via contingency tables and graphs
Majority of reports & analyses delivered as files, mostly in XLS, PDF or PPT format
Gap between IT and business
Users spend 90+% of their effort creating reports manually
Rigid, expensive, slow changes / development
Missing or dysfunctional BICC
Missing tools / knowledge / trust related to data exploration
DWH / BI Study
The figures below are based on a real DWH/BI study for one client, but the situation is more or less common to many companies.
[Figure: two charts from the study, "Reporting data source" (Operational Systems, DWH, Sandbox / SQL) and "Report creation & delivery" (Enterprise BI tool, MS Excel, File, Online)]
Not all data available in the DWH
Data preparation using SQL
Sandbox used as permanent storage
Majority of reports & analyses created in MS Excel by different business departments
Analysis performed via contingency tables and graphs
No code or knowledge sharing between users
Confusing reports (the same number calculated by different algorithms)
Majority of reports & analyses delivered as files, mostly in XLS, PDF or PPT format
Gap between IT and business
Users spend 90+% of their effort creating reports manually
Missing or dysfunctional BICC
Missing tools / knowledge / trust related to data exploration
Current DWH/BI (challenges)
Gartner, The State of Data Warehousing in 2012: Data warehousing has reached the most significant tipping point since its inception. The biggest, possibly most elaborate data management system in IT is changing.
New data sources & types, increasing data volumes
More advanced data processing (including unstructured data): text mining, semantic technologies, NLP (natural language processing), speech recognition
Near real-time data processing
BI tools to support flexible data analysis: self-service BI, in-memory analytics, visual data discovery, data visualization, data mining, predictive analytics, graph / network analytics, path analytics, sentiment analysis
New roles and knowledge: BICC
Architecture & data governance
Security
Data quality
TCO (affordability)
Comprehensive DWH/BI architecture
The traditional DWH ecosystem is extended to include new big data sources and technologies.
Use relevant, affordable platforms to store, process and analyze data efficiently (do more with less).
A heterogeneous (but integrated!) big data management platform consists of:
Operational Data Store
Enterprise Data Warehouse
Hadoop
Streaming (CEP, complex event processing)
Business Intelligence tools should cover Corporate BI, Self-Service BI and Advanced Analytics.
An efficient BICC is crucial.
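The Streaming (CEP) component listed above evaluates rules continuously over a moving window of recent events instead of querying stored data. The Python sketch below illustrates the idea. It is plain Python, not the API of any CEP product. The `Event` layout, the 60-second window and the average-above-threshold rule are all hypothetical.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds since an arbitrary start
    sensor_id: str
    value: float


class SlidingWindowAlert:
    """Hypothetical CEP rule: alert when a sensor's average value over the
    last `window_s` seconds exceeds `threshold`."""

    def __init__(self, window_s: float, threshold: float) -> None:
        self.window_s = window_s
        self.threshold = threshold
        self.windows: dict[str, deque[Event]] = {}  # one time window per sensor

    def process(self, event: Event) -> str | None:
        window = self.windows.setdefault(event.sensor_id, deque())
        window.append(event)
        # Evict events older than the window (events assumed to arrive in time order).
        while window and window[0].timestamp <= event.timestamp - self.window_s:
            window.popleft()
        avg = sum(e.value for e in window) / len(window)
        if avg > self.threshold:
            return f"ALERT {event.sensor_id}: avg {avg:.1f} over last {self.window_s:.0f}s"
        return None


# Usage: a short, made-up stream of readings from one sensor.
rule = SlidingWindowAlert(window_s=60, threshold=100.0)
stream = [
    Event(0, "turbine-1", 90.0),
    Event(30, "turbine-1", 105.0),
    Event(50, "turbine-1", 120.0),
    Event(200, "turbine-1", 80.0),
]
alerts = [a for a in (rule.process(e) for e in stream) if a is not None]
for alert in alerts:
    print(alert)
```

Only the third event triggers an alert: the window then holds 90, 105 and 120, which average 105. A per-sensor `deque` makes appending and evicting cheap. Recomputing the sum on every event costs time proportional to the window size, and a running sum would avoid that.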
Architecture overview (Microsoft DWH / BI Stack)
[Diagram: End-to-End DW & Big Data Platform, Driving Analytics on any Data]
Big Data Sources (raw, unstructured): sensors, devices, web, social networks
Source systems: ERP, CRM, LOB apps
Streaming data & compute-intensive applications: SQL Server StreamInsight (alerts, notifications, operational dashboards)
Fast load into Hadoop on Windows Azure / Hadoop on Windows Server
Historical data (beyond the active window): summarize & load into the Analytics Platform System (PDW & HDI)
Enterprise ETL with SSIS, DQS, MDS
Integrate / enrich: SQL Server data marts, SQL Server Analysis Services, SQL Server Reporting Services
Corporate BI: business insights, interactive reports, performance scorecards
Self-service BI: Excel, SharePoint, Power BI (Azure); collaboration, self-service, Office integration, data visualization
Advanced analytics: data mining, data discovery
BI governance; administration & development; BICC
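The "Summarize & Load" step for historical data in the diagram above reduces detailed raw records to compact aggregates before they reach the warehouse. The Python sketch below shows such a summarization in plain Python, not through any Hadoop or Analytics Platform System API. The record layout (timestamp, device id, reading), the daily granularity and the chosen measures (count, total, max) are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw records as they might land in Hadoop: (ISO timestamp, device id, reading).
raw_events = [
    ("2014-08-01T10:15:00", "sensor-1", 12.0),
    ("2014-08-01T18:40:00", "sensor-1", 8.0),
    ("2014-08-01T11:05:00", "sensor-2", 5.5),
    ("2014-08-02T09:00:00", "sensor-1", 20.0),
]


def summarize_daily(events):
    """Aggregate raw readings to one row per (day, device): count, total, max."""
    summary = defaultdict(lambda: {"count": 0, "total": 0.0, "max": float("-inf")})
    for ts, device, value in events:
        day = datetime.fromisoformat(ts).date()
        row = summary[(day, device)]
        row["count"] += 1
        row["total"] += value
        row["max"] = max(row["max"], value)
    # Rows sorted by (day, device), ready for a bulk load into a fact table.
    return [
        (day, device, r["count"], r["total"], r["max"])
        for (day, device), r in sorted(summary.items())
    ]


rows = summarize_daily(raw_events)
for row in rows:
    print(row)
```

Four raw records collapse into three daily rows. At real data volumes, this reduction is what lets detailed history stay in cheap storage while the warehouse holds only the summaries.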