ACS 2015 Annual Canberra Conference Are You Big Data Ready? Vladimir Videnovic Business Solutions Director Oracle Big Data and Analytics
Introduction
Introduction What is Big Data? If you can't explain it simply, you don't understand it well enough Albert Einstein
Big Data Term commonly used to describe phenomenon of large and/or complex datasets caused by the exponential growth and availability of data that traditional data processing approaches cannot handle effectively
Big Data Big Vs Volume Velocity Variety Visibility Vulnerability Veracity Variability Value V Complexity Immaturity Scalability Performance Availability Manageability Trustworthiness Cost
6
Organisation Volume Velocity Variety Visibility Vulnerability Veracity Value 7
Big Vs Business Perspective It s just Data nothing has changed... Value Enable smarter decisions, faster actions and optimised results Valuation Evaluation of outcomes and continuous optimisation Measure the impact on key performance indicators Assess the ability to forecast future outcomes Visibility Discover important insights demand for good quality analytics & insight Study patterns, behaviours, trends and relationships The demand for data is highlighting the IT bottleneck the cash data economy
Big Vs Information Management Perspective It s Information and many things have changed! Variety Data Integration New data sources (Social Media, Data streams, Internet of Things) Lack of standards for managing various data types Velocity Data streaming, machine/sensor generated data Real-time insights Vulnerability Information Security Fraud discovery
Big Vs Data Science Perspective Value Understanding what is possible Prioritising/sizing up the opportunities Analytics that do not support decision making doesn t add value Variety Internal document/text/audio sources may be inaccessible Need to prototype/test value add in adding new unstructured data from social media and other external sources Velocity Move to real-time services Fraud detection
Big Data Early Days
BIG Data Large Data
BIG Data Large Data
Big Vs Survey Source: Ernst & Young - Global Forensic Data Analytics Survey 2014
BIG Data Better Information Governance
BIG Information Enterprise capability required to acquire relevant Data from multiple sources, organise it in a repository, convert it into Information by analysing it, utilise it for deriving Knowledge to make smarter business Decisions and Outcomes
Promise of Analytics An Information-Powered Organisation is equipped with timely insights and knowledge to continuously optimise the way they derive answers, decisions and actions conduct their business streamline their operations deliver services and/or products adapt to new realities
Information Management Today Maturity Copyright 2015, Oracle 2012 and/or Oracle its Corporation affiliates. All rights Proprietary reserved. and Confidential
Information Management Today Common Challenges Access to Information Information Quality Efficient Use of Information Information support for Business Evolution
Barriers to Mature Use of Information Survey Source: Deloitte - The Analytics Advantage Survey 2013
Information Management Transition
Modern Information Environment People
Need Analytic Value at Fast Pace Data Uncertainty Not familiar and overwhelming Potential value not obvious Requires significant manipulation Tool Complexity Early Hadoop tools only for experts Existing BI tools not designed for Hadoop Emerging solutions lack broad capabilities 80% effort typically spent on evaluating and preparing data Overly dependent on scarce and highly skilled resources 24
Modern Information Environment Roles Data Engineer (Data Wrangler, Data Janitor) Data Scientist (Data Analyst, Data Miner) Business Intelligence Specialist (Report Designer, Data Specialist) Information Consumer
Data Engineer Role Prepare data for easier access, analytical processing and efficient interpretation Skills ability to quickly understand data transform and enrich data programming data visualisation statistics effective present of results
Data Engineer Role Create insights from available data Skills data pattern discovery Ability to tell the story curiosity attention to detail research skills Question everything strive to make a difference
Modern Information Environment People Everyone on the Journey Change Empowered people Easy Access to ALL Information Engineered Processes Technology support Information-powered
Modern Information Environment Business Processes
Modern Information Environment Business Processes Digitally-enabled Business Process Paper-based process in digital form
Modern Information Environment Business Processes in the Digital Era Engineered Automated Fit-for-purpose Easy to (re)configure Easy to use Easy to integrate
Modern Information Environment Technology
Data Reality Existing sources + Emerging Sources Data Warehouse Data Reservoir Existing Sources Emerging Sources 33
Information Management Logical View Information Warehousing
Big Data Eco-System Data Fast Data Apps Outputs 1 2 3 Streams Events Actions Custom Data Services Information Warehouse Packaged Reservoir Factory Warehouse Business Analytics Smart Things Data Lab Reports People Data Sets Discovery Data Science Visualisation Oracle Confidential Internal Business Analytics Product Group 35
Cloudera Hadoop Distribution HDFS Storage - distributed file system that provides high-throughput access to data MapReduce Process - framework for performing distributed data processing Hive - Data Warehouse system for Hadoop HCatalog - metadata abstraction layer for referencing data HBase - Hadoop DataBase - distributed columnar database Hue user interface for managing and using Hadoop components Spark - parallel data processing framework for developing Big Data applications YARN cluster management reliable distributed processing of very large data sets Oozie workflow - coordination system for managing Hadoop jobs ZooKeeper - centralised service for maintaining configuration information Sqoop efficient transfer of bulk data between Hadoop and structured datastores Impala - analytic database designed to leverage the flexibility and scalability of Hadoop Solr - reliable, scalable and fault tolerant search based on Lucene Pig - platform for analysing large data sets 36
Information Management Capabilities Focus Organisation-wide Information Management Goal Information Capability Design Patterns enabling Information-Powered organisation Addressing Barriers 37
Modern Information Environment Governance
Unified Information Architecture Maturity Phases Silos Standardise Optimise Information on Demand One off tools/solutions Bottom-up Local business unit driven Many versions of truth Independent data marts Enterprise standard tools/solutions Data warehouse and dependent data marts IT & LOB partnership Secure, consistent access to all data Agile, flexible architecture Analyse just in time structured & unstructured data together Advanced analytics & real-time recommendations All stage 3 capabilities delivered as a platform (service/cloud) Access of tools and data among broad subscriber group Maturity & Capability
Data Processing - Exploration Definition Understand Data and Data Value Explore attributes/ metadata Understand data quality (needs/reality) Prioritise Identify custodians Addressing Barriers IM Capabilities Data Staging Data Discovery Enterprise Capabilities Planning Experience Continuous Improvement Data and Information Enablers Tools 40
Data Processing - Collection Definition Acquisition/Ingestion/Extraction of data Quality audit Change Data Capture High-Volume Data Acquisition Stream/Event Data Capture Addressing Barriers IM Capabilities Extract Data Ingest Content Change Data Capture Enterprise Capabilities Resource Management Outcomes Skills Tools Enablers 41
Data Processing - Enrichment Definition Data Organisation Enrichment Processing Transformation Protection Data Lineage Addressing Barriers IM Capabilities Transform Data Integrate Data Enrich Content Enterprise Capabilities Procedures Outcomes Hygiene Factors Tools Intellectual Property Security Enablers 42
Data Processing - Storage Definition Structured Data Content/Documents/Records Non-structured Data Storage Strategy Data availability Addressing Barriers IM Capabilities Enterprise Data Warehouse (Curated Data) Data Reservoir (Exploratory Data) Enterprise Capabilities Resource Management Business Continuity Knowledge Data and Information Intellectual Property 43
Data Processing Data Use/Information Creation Definition Generate Information Assets Derive intelligence Actionable Insights Support Decisions Enable Innovation Support Business Processes Addressing Barriers IM Capabilities Transactional Reporting Ad-Hoc reporting Structured Analysis Sentiment Analysis Advanced Analysis Enterprise Search Enterprise Capabilities Decision Making Performance Mngmt. Change Management Strategy and Plans Resource Management Intellectual Property Policies Innovation Experience Skills Templates Creativity 44
Data Processing Data/Information Sharing and Collaboration Definition Publish Information Assets Action on Intelligence Communicate Decisions Operationalise Innovation Addressing Barriers IM Capabilities Enterprise Information Portal Enterprise Capabilities Outcomes Decision Making Communication Collaboration Change Management Shared Vision Tools Experience Structure Relationships 45
Data Processing - Optimisation Definition Optimise Information Assets Optimise Business Processes Optimise Decisions Addressing Barriers IM Capabilities Unknown Data Analysis Evidence-based Decision-Making Enterprise Capabilities Performance Management Business Continuity Outcomes Shared Vision Data and Information Continuous Improvement 46
Data Processing Information Archiving/Disposal Definition Define and apply disposal authorities Preserve information of interest Addressing Barriers IM Capabilities Enterprise Content Management/Archiving Enterprise Capabilities Procedures Policies Security Resource Management 47
Core Information Governance Capabilities IM Capabilities Change Management Data Quality Continuous Optimisation Data Lineage Master Data Information Security Addressing Barriers Enterprise Capabilities Strategy and Planning Procedures Policies Data and Information Security Resource Management Change Management Collaboration 48
Modern Information Environment Summary
Modern Information Environment People Business Processes Technology Governance
Information Management Capability Modernisation Benefits ACCESS QUALITY EFFICIENCY BUSINESS EVOLUTION Informationpowered users Right Information Informed Decisions Operationalised Innovations - Data available for analysis and ad-value processes - Information transparent, useable and actionable to larger audience - Improved utilisation of existing information assets - Information presented consistently and accurately to all authorised users via appropriate channels - Standardised information platform ensures reuse of business patterns and information consistency - Consolidated information platform can reduce costs of maintenance and increase staff efficiency - Minimise support issues from inadequate delivery of information - Real-time analytic environment to support forward-looking analysis and evidence-based decisionmaking - Support Innovation to continuously improve business processes and/or delivery of services - On-time insights available to larger population of information consumers, enabling better and consistent decision-making
Q&A 52
53