Big Data and Your Data Warehouse
Philip Russom, TDWI Research Director for Data Management
April 5, 2012
Sponsor
Speakers
- Philip Russom, Research Director, Data Management, TDWI
- Peter Jeffcock, Director, Oracle Product Marketing
Today's Agenda
- Big Data
  - Was a problem; now it's an opportunity
  - The opportunities of Big Data Analytics
- Definitions and Comparisons
  - Old Big Data versus New Big Data
  - Big Data Analytics, Data Warehouses, Analytic Databases
- Big Data and Analytics affect Your Data Warehouse
  - Scalability, Workloads, Data Types, Architectures
  - Analytic tool choices, Data integration/quality best practices
- Use Cases for Your Data Warehouse & Big Data Analytics
- Recommendations
Big Data: Problem or Opportunity?
Survey question: In your organization, is big data considered mostly a problem or mostly an opportunity?
- Only 30% of survey respondents consider big data a problem.
  - Oddly enough, big data was a serious problem just a few years ago.
  - Storage and CPUs developed greater capacity, speed, and intelligence. They also fell in price.
  - New database management systems (DBMSs) arrived, designed for big data analytics.
  - Data warehouse appliances and analytic DBMSs are relatively affordable.
- The vast majority (70%) considers big data an opportunity.
  - The recent economic recession forced deep changes in most businesses.
  - Big data analytics reveals change's root cause, so you can stop or leverage it.
Chart: Problem (30%), because it's hard to manage from a technical viewpoint. Opportunity (70%), because it yields detailed analytics for business advantage.
Source: TDWI. Survey of 325 respondents, June 2011
Definition of Big Data Analytics
- It's where advanced analytic techniques operate on big data sets
- It's about two things: big data AND advanced analytics
  - The two have teamed up to leverage big data
  - The combo turns big data into an opportunity
- Big Data isn't new. Advanced Analytics isn't new.
  - Their successful combination is new
  - Hundreds of terabytes of data just for analytics is new
Diagram labels: Big Data, Advanced Analytics, Big Data Analytics
Opportunities for Big Data Analytics
- Anything involving customers benefits from big data analytics
  - better-targeted social-influencer marketing (61%)
  - customer-base segmentation (41%)
  - recognition of sales/market opportunities (38%)
- BI, in general, benefits from big data analytics
  - more numerous and accurate business insights (45%)
  - understanding business change (30%)
  - better planning and forecasting (29%)
  - identification of root causes of cost (29%)
- Specific analytics applications are likely beneficiaries
  - detection of fraud (33%)
  - quantification of risks (30%)
  - market sentiment trending (30%)
Source: TDWI. Survey of 325 respondents, June 2011
Traditional Big Data versus New Generation Big Data
Traditional Big Data
- Problem
- Hoarded
- Tens of Terabytes, sometimes more
- Mostly structured and relational data
- Data mostly from traditional enterprise applications: ERP, CRM, etc.
- Common in large companies: Mainstream today
- Real-time for Operational BI, etc.
New Generation Big Data
- Opportunity
- Leveraged
- Hundreds of Terabytes, soon to be measured in Petabytes
- Mixture of structured, semi-structured, and unstructured data
- Also from Web logs, clickstreams, e-commerce, sensors, mobile devices
- Common in Internet companies: Will eventually go mainstream
- Real-time for Streaming Data
Traditional DWs versus New Generation Analytic DBs
Traditional Data Warehouse
- Excels with Traditional Big Data
- Killer platform for standard reports, dashboards, performance mgt, OLAP
- Mostly aggregates and calculated metrics in time-series and multidimensional models
- All the best practices of data management apply, including DI, DQ, MDM
- Well-understood data for reporting and OLAP
- Single version of the truth for facts that must be tracked over time
- Well-organized history of corporate performance
New Generation Analytic Database
- Excels with New Generation Big Data
- Killer platform for discovery-oriented advanced analytics
- Mostly detailed source data
- Best practices of data mgt are suspended to preserve data nuggets for discovery
- Less understood data for discovery analytics
- Excellent source for deducing unknown facts and relationships
- Recent information, for studying change
The Location of Big Data and Analytic Workloads Affects Your DW Architecture and Design
- Where do you store and manage big data for analytics?
- Where do you process big data with analytic tools?
  - In the data warehouse proper?
  - In a standalone edge system that integrates with the DW?
  - Both?
- These are critical design and architectural decisions that adopters of big data and advanced analytics must make.
Diagram labels: Federated Data Marts, Real Time ODS, Customer Mart or ODS, No-SQL Database, Analytic Sand Box, Data Staging Area, Detailed Source Data, OLAP Cubes, EDW, Multidimensional Data Models, Metrics for Performance Mgt, Hadoop Distributed File Sys, DW Appliance, Columnar DBMS, Data Mining Cache, Map Reduce, Star or Snowflake Schema
Analytic Workloads affect Your DW Architecture
- Monolithic DW Architecture
  - A single DBMS instance (or grid) hosts multiple database workloads & datasets
  - Requires hefty DW platform to handle multiple, diverse workloads & data types
- Distributed DW Architecture
  - Users deploy multiple platforms, so each is optimized for a specific workload type
  - If not controlled, data marts and ODSs may proliferate.
  - Complexity is a problem, due to the large number of edge systems.
- Hybrid DW Architecture
  - A monolith manages all core reporting/OLAP data & most workloads
  - But a few workloads are deployed on separate systems
Big Data even affects Your DW's Data Staging Area
- Data staging areas evolved to do more than stage data
  - Now they must evolve again to accommodate big data
- Originally data staging areas were temporary holding bins
  - In that spirit, some are good for analytic sandboxes
- Most data staging areas are optimized for detailed source data
  - Can manage detailed source data as found in transient big data
- Data is regularly processed while managed in the staging area
  - E.g., sort prior to a DW load. SQL temp tables held in staging for later merging
  - Some analytic workloads (especially columnar) may run well in a staging area
- Data staging must scale to big data's volume, which comes and goes
  - A cloud could be an elastic platform for unpredictable data staging volumes
Diagram labels: Big Data, Data Staging, DW
Analytic Tools for Big Data affect Your DW
There's a crossroads where you choose an analytic method, or multiple methods!
1. Online Analytic Processing (OLAP)
2. Extreme SQL
3. Statistical Analysis
4. Data Mining
5. Other: Natural Language Processing (NLP), Artificial Intelligence (AI)
- Each analytic method has requirements for data and analytic tool types.
- Multiple analytic methods can lead to multiple data stores, DBMSs, DW architecture components, and multiple analytic tools
Big Data Analytics affects Data Mgt for Your DW
Analyze data first. Later, improve it for a more polished analysis.
- Analytic discovery depends on data nuggets
  - Both query-based and predictive analytics need: Big data, raw data
- Data quality for analytic databases
  - Do discovery work before addressing data anomalies and standardization
  - E.g., fraud is often revealed via non-standard or outlier data
- Data modeling for analytic databases
  - Modeling data can speed up queries and enable multidimensional views
  - But it loses details & limits queries
  - Do only what's required, like flattening and binning
- Data for post-analysis use in BI
  - Apply best practices of DI, DQ, modeling
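The "flattening and binning" mentioned above can be illustrated with a minimal Python sketch. It is not taken from the presentation: the record layout, the dotted key names, and the bin edges are hypothetical, chosen only to show light-touch preparation that keeps the detailed source values intact.

```python
from bisect import bisect_right


def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested structures, extending the key path.
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def bin_value(value, edges, labels):
    """Map a number to a labeled bin.

    `edges` are the sorted boundaries between adjacent bins, so
    len(labels) must equal len(edges) + 1. A value equal to an edge
    falls into the higher bin.
    """
    return labels[bisect_right(edges, value)]
```

For instance, flattening a hypothetical order record yields keys such as `order.total`, and `bin_value(250, [100, 1000], ["low", "mid", "high"])` assigns that total to the middle bin while the raw value remains available for discovery work.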
Use Cases for Your DW and Big Data Analytics
- Big Data enables exploratory analytics. Discover new:
  - Customer base segments
  - Customer behaviors and their meaning
  - Forms of churn and their root causes
  - Relationships among customers and products
- Analyze big data you've hoarded. Finally understand:
  - Web site visitor behavior
  - Product quality based on robotic data from manufacturing
  - Product movement via RFID in retail
- Use tools that handle human language for visibility into:
  - Claims process in insurance
  - Medical records in healthcare
  - Call center applications in any industry
- Big data improves data samples for older analytic apps:
  - Fraud detection
  - Risk management
  - Actuarial calculations
  - Anything involving statistics or data mining
- Big data can add more granular detail to analytic datasets:
  - Broaden 360-degree views of customers and other entities, from hundreds of attributes to thousands
CONCLUSIONS
Plan ahead, because: Big Data affects Your Data Warehouse
- Scaling up or out to big data volumes
  - Affects choice of computing architectures, platforms, hardware, DBMSs
- Incorporating new data types and new data sources
  - Semi-structured and unstructured data. Web, sensor, and social data
- Operating in real-time and streaming big data
  - Real-time is a business requirement. Many new sources stream big data.
- Supporting multiple database workloads in a single DW environment
  - Big data analytics introduces new analytic workloads & affects architecture
- Adjusting best practices in data management
  - These still apply to big data analytics, but in a different order & priority
- Updating or replacing your DW models or architecture
  - To accommodate all the above
Big Data Analysis: Airline Industry
What do they think about me?
Data Sources
- Customer records
- Social Media
- Email
- Phone Logs
- Weblogs
Apache Hadoop
- Distributed file system
- Map/Reduce programming
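The Map/Reduce programming model named above can be sketched in plain Python. This is a single-process illustration of the map, shuffle, and reduce steps only; it does not use Hadoop or any Hadoop API, and the weblog line format (`<user> <page>`) and all function names are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (page, 1) pair for each well-formed log line.

    Assumes a hypothetical "<user> <page>" format; other lines are skipped.
    """
    parts = line.split()
    if len(parts) == 2:
        yield parts[1], 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a total."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```

Calling `run_job` on a handful of log lines returns a page-to-hit-count dictionary. In a real cluster the map and reduce steps would run in parallel across many nodes over data held in the distributed file system, which this sketch does not attempt to model.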
Oracle In-Database Analytics
- Statistical Analysis
- Data Mining
- Text
- Graph
- Spatial
- Semantic
Oracle R Enterprise
Oracle Advanced Analytics
- In-database
- Large data sets
- Same code
Oracle Big Data Platform
- Products: Big Data Appliance, Exadata, Exalytics
- Stages: ACQUIRE, ORGANIZE, ANALYZE, DECIDE
Oracle Big Data Appliance
- Optimized and Complete
- Integrated with Oracle Exadata
- Easy to Deploy
- Single Vendor Support
Available for Replay
To register, go to: www.oracle.com/bigdata
Questions?
Contacting Speakers
If you have further questions or comments:
- Philip Russom: prussom@tdwi.org
- Peter Jeffcock: peter.jeffcock@oracle.com