Big Data and Big Data Modeling


Big Data and Big Data Modeling: The Age of Disruption
Robin Bloor, The Bloor Group
March 19, 2015, TP02

Presenter Bio
Robin Bloor, Ph.D.
Robin Bloor is Chief Analyst at The Bloor Group. He has been an industry analyst and commentator on technology for 25 years, with expertise in software development, database, BI and associated technologies. He is a frequent keynote speaker at industry events and the primary author of The Bloor Group's research reports.

Big Data and Big Data Modeling: The Age of Disruption
The Data Curve and the Data Warehouse
Disruption, Disruption, Disruption
A New Modeling Dynamic

The Data Curve

The Visible Big Data Trend
Corporate data volumes grow exponentially, at about 55% per annum. Data has been growing at this rate for perhaps 40 years. There is nothing new about big data: it follows an established exponential trend (and the trend may be speeding up).
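To put the 55% figure in perspective, the short Python sketch below turns it into a doubling time and a cumulative multiplier. It assumes a constant compound annual rate, the same simplification the slide makes; `growth_factor` and `doubling_time` are illustrative helper names, not part of any library.

```python
import math


def growth_factor(annual_rate: float, years: float) -> float:
    """Multiplier after `years` of compound growth (0.55 means 55% per annum)."""
    return (1.0 + annual_rate) ** years


def doubling_time(annual_rate: float) -> float:
    """Years needed to double at a constant compound annual rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    rate = 0.55  # the ~55% per annum figure quoted on the slide
    print(f"Doubling time: {doubling_time(rate):.2f} years")
    for years in (1, 10, 40):
        print(f"After {years} years: x{growth_factor(rate, years):,.1f}")
```

At 55% per annum, volumes double roughly every 1.6 years and grow about 80-fold per decade, which compounds to tens of millions of times over 40 years.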

Technology Evolution (The Way We Were)
[Chart: the Bloor Curve]

And This Implies
Software architectures change: centralized, client/server, 3-tier/web, service-oriented architecture, and so on. Applications migrate according to latencies. Dominant applications and software brands can die via the innovator's dilemma. Wholly new applications appear because of lower latencies, e.g., virtual machines and complex event processing (CEP).

The Invisible Data Trend: Moore's Law Cubed
The biggest databases are new databases, and they grow at the cube of Moore's Law.
Moore's Law: 10x every 6 years
VLDB (very large database): 1000x every 6 years
1991/2: megabytes
1997/8: gigabytes
2003/4: terabytes
2009/10: petabytes
2015/16: exabytes
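The table follows from simple arithmetic: cubing a 10x gain gives 1000x, which moves the largest databases up exactly one SI prefix (mega, giga, tera and so on) every 6 years. A minimal Python sketch, assuming the slide's 1991 megabyte starting point and counting only whole 6-year periods; `vldb_size_bytes` and `scale_name` are hypothetical helpers.

```python
SI_PREFIXES = ["", "kilo", "mega", "giga", "tera", "peta", "exa", "zetta"]


def vldb_size_bytes(year: int, base_year: int = 1991, base_bytes: int = 10**6,
                    moore_factor: int = 10, period_years: int = 6) -> int:
    """Projected size of the largest databases, assuming growth at the cube
    of Moore's Law: (10x per period) ** 3 = 1000x per period."""
    periods = (year - base_year) // period_years  # whole 6-year periods only
    return base_bytes * (moore_factor ** 3) ** periods


def scale_name(n_bytes: int) -> str:
    """Name the SI scale (powers of 1000) that n_bytes falls into."""
    k = 0
    while n_bytes >= 1000 and k < len(SI_PREFIXES) - 1:
        n_bytes //= 1000
        k += 1
    return SI_PREFIXES[k] + "bytes"


if __name__ == "__main__":
    for year in range(1991, 2016, 6):
        print(year, scale_name(vldb_size_bytes(year)))
```

Integer arithmetic is used throughout so the powers of 1000 stay exact rather than drifting through floating point.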

The Genesis of Hadoop
The old databases were having scaling problems. New databases appeared, but so did Hadoop. The number of data sources was exploding, and Hadoop quickly became the staging area for these databases, even though it was immature.

The Evolution of Hadoop
From serial batch workloads to multiple concurrent workloads
From MapReduce to multiple algorithms
From versatile data storage to optimized data storage
From key-value access only to SQL, JSON and even SPARQL access
From an island of processing to integrated processing
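For readers who have not met it, the MapReduce model named on the slide splits work into a map step that emits key-value pairs, a shuffle that groups those pairs by key, and a reduce step that combines each group. The pure-Python word count below is a single-process illustration of those three phases; it does not use Hadoop's actual APIs, and a real cluster would run many mappers and reducers in parallel.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group the emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data is big processing", "Big data"]))
```

The serial, batch-only shape of this pipeline is exactly what the "From" side of the slide describes.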

The Data Warehouse: From/To
[Diagram: Bloor Group]

The Staging Workload
[Diagram: Bloor Group]

Disruption, Disruption, Disruption

Disruption in Several Dimensions
1. At the hardware layer
2. In software architecture
3. In the data layer

Parallelism: The Imp Is Out of the Bottle
Multicore chips enabled parallelism. Parallelism has changed the whole performance equation, and it enabled big data. Big data is really big processing.

Technology Revolutions
Tech revolution: architecture
Computer: batch
Online: centralized
PC: client/server
Internet: multi-tier
Mobile: service orientation
Internet of Things (IoT): event driven/big data/parallel/distributed

Unprecedented Acceleration
Moore's Law regularly delivered a speed-up of 10x every 6 years. Implication: applications get faster every 6 years or so.
Parallelism delivers an almost unlimited speed-up, assuming you can build the application with a scalable architecture. Implications: see later.
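The caveat "assuming you can build the application with a scalable architecture" can be made concrete with Amdahl's law, a standard model that does not appear on the slide: if a fraction p of the work parallelizes across n workers, the speed-up is 1 / ((1 - p) + p / n). A minimal Python sketch; `amdahl_speedup` is an illustrative name.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: overall speed-up when `parallel_fraction` of the work is
    spread evenly over `workers` and the remainder stays serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    for p in (0.5, 0.95, 0.99, 1.0):
        print(f"p={p:.2f}, 1000 workers: {amdahl_speedup(p, 1000):7.1f}x")
```

With 95% of the work parallelizable, 1,000 workers yield just under 20x; the speed-up approaches the worker count only when almost all of the work scales out, which is what a scalable architecture has to deliver.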

Hardware Disruption: It's Over for Spinning Disk
Solid-state drives are now on the Moore's Law curve; disk is not, and never was (with respect to seek time). All traditional databases were engineered for spinning disk and not for scale-out. This explains the new database management system (DBMS) products.

Hardware: In-Memory Disruption
Memory may gradually become the primary store for data (this affects data flows). Almost all applications are poorly built for this. Memory is an accelerator, as is CPU cache, and this is becoming a factor.

Hardware: The Memory Cascade
On-chip cache speed vs. RAM: L1 (32 KB) = 100x; L2 (256 KB) = 30x; L3 (8-20 MB) = 8.6x
RAM vs. SSD: RAM = 300x
SSD vs. disk: SSD = 10x
Note: vector instructions and data compression
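Because each ratio on the slide compares a tier with the tier below it, the ratios can be chained to compare every tier against spinning disk. The Python sketch below simply multiplies the quoted figures together; it assumes they compose, which is a simplification, since real latencies depend on access patterns. The constant names are illustrative.

```python
from typing import Dict

# Ratios as quoted on the slide; each compares a tier with the tier below it.
CACHE_VS_RAM = {"L1": 100.0, "L2": 30.0, "L3": 8.6}
RAM_VS_SSD = 300.0
SSD_VS_DISK = 10.0


def speed_relative_to_disk() -> Dict[str, float]:
    """Chain the per-tier ratios so every tier is expressed relative to disk."""
    ssd = SSD_VS_DISK
    ram = RAM_VS_SSD * ssd
    tiers = {name: ratio * ram for name, ratio in CACHE_VS_RAM.items()}
    tiers.update({"RAM": ram, "SSD": ssd, "Disk": 1.0})
    return tiers


if __name__ == "__main__":
    for tier, factor in speed_relative_to_disk().items():
        print(f"{tier:>4}: {factor:>9,.0f}x disk")
```

On these figures, RAM comes out about 3,000x faster than disk and L1 cache about 300,000x faster.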

Hardware: Putting a SoC in IT
It's possible that the CPU/memory split will vanish (soon). This requires the emergence of the commodity system on a chip (SoC). There are already systems on a chip that run Linux, and grids of systems on a chip could replace grids of servers.
(Graphic from Samsung Electronics)

Data Disruption: The Barriers Are Down
Internal: server log files, network log files, unstructured sources, data streams, web data
External: mobile data, social media data, Internet of Things, web scavenging, data markets, external streams

Data Flow: A Set of Principles
The data layer is one logical collection of data, both external and internal. Data flows from ingest, through a refining process, to a point of application. It is best if data doesn't flow much. Hadoop means corporate data staging; beyond that, a database is required to manage workloads.
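The ingest, refine and apply stages described above can be sketched as a chain of Python generators, so each record passes through the stages once rather than being copied between stores, in the spirit of "it is best if data doesn't flow much". The JSON-lines input, the `user`/`amount` fields and the stage functions are all hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator


def ingest(raw_lines: Iterable[str]) -> Iterator[dict]:
    """Ingest: parse raw JSON lines, skipping anything that is not a JSON object."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            yield record


def refine(records: Iterable[dict]) -> Iterator[dict]:
    """Refine: keep records that have the fields the application needs, normalized."""
    for record in records:
        try:
            clean = {"user": str(record["user"]).lower(),
                     "amount": float(record["amount"])}
        except (KeyError, TypeError, ValueError):
            continue  # incomplete or malformed record
        yield clean


def apply_totals(records: Iterable[dict]) -> Dict[str, float]:
    """Point of application: total amount per user."""
    totals: Dict[str, float] = {}
    for record in records:
        totals[record["user"]] = totals.get(record["user"], 0.0) + record["amount"]
    return totals


if __name__ == "__main__":
    raw = ['{"user": "Ann", "amount": 5}', "not json", '{"user": "ann", "amount": "2.5"}']
    print(apply_totals(refine(ingest(raw))))
```

Because the stages are lazy generators, nothing is materialized until the point of application pulls records through.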

The Corporate Data Flows
There need to be two data flows, at minimum. Currently we can distinguish between real-time/business-time applications and analytical applications. We will build specific architectures for this.
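One way to picture the two minimum flows is a router that hands each event to a low-latency handler immediately and also buffers it for an analytical flow that works in batches. This is a hypothetical Python sketch; `TwoFlowRouter`, its callbacks and the batch size are illustrative assumptions, not a design from the presentation.

```python
from typing import Callable, List


class TwoFlowRouter:
    """Hypothetical sketch: each event goes to a real-time handler at once and
    is also buffered for an analytical flow that processes events in batches."""

    def __init__(self, on_event: Callable[[dict], None],
                 on_batch: Callable[[List[dict]], None], batch_size: int = 100):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.on_event = on_event
        self.on_batch = on_batch
        self.batch_size = batch_size
        self._buffer: List[dict] = []

    def publish(self, event: dict) -> None:
        self.on_event(event)        # real-time / business-time flow
        self._buffer.append(event)  # analytical flow, deferred
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Hand any buffered events to the analytical flow."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self.on_batch(batch)


if __name__ == "__main__":
    router = TwoFlowRouter(on_event=lambda e: print("real-time:", e),
                           on_batch=lambda b: print("analytical batch:", b),
                           batch_size=2)
    for amount in (50, 500, 20):
        router.publish({"amount": amount})
    router.flush()
```

The design keeps the two flows decoupled: the real-time path never waits on the analytical one.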

A New Modeling Dynamic

The Staging Workloads
Data mapping/modeling
Metadata discovery
Metadata management
Master data management
Data lineage and lifecycle

The New World #1
The primary driver of the new world is the expansion of external data sources. Data is being captured without metadata knowledge, or even relationship knowledge. Unstructured and semi-structured data is prevalent, even normal. The provenance of data has become an issue. The new dimensions: geography and time.
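When data arrives without metadata knowledge, some of that metadata can be recovered after the fact. The Python sketch below infers field paths, and the value types observed for each, from semi-structured JSON records. It is one simple, assumed approach to metadata discovery (nested objects are flattened into dotted paths and non-object records are skipped), and `discover_schema` is a hypothetical name.

```python
import json
from collections import defaultdict
from typing import Any, Dict, Iterable, Set


def _collect(path: str, value: Any, schema: Dict[str, Set[str]]) -> None:
    """Record the type of `value` under `path`, descending into nested objects."""
    if isinstance(value, dict):
        for key, child in value.items():
            _collect(f"{path}.{key}" if path else key, child, schema)
    else:
        schema[path].add(type(value).__name__)


def discover_schema(json_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Infer field paths and the set of value types observed for each one."""
    schema: Dict[str, Set[str]] = defaultdict(set)
    for line in json_lines:
        record = json.loads(line)
        if isinstance(record, dict):  # skip bare values and arrays
            _collect("", record, schema)
    return dict(schema)


if __name__ == "__main__":
    sample = ['{"id": 1, "geo": {"lat": 51.5, "lon": -0.1}}',
              '{"id": "A7", "ts": null}']
    for field, types in discover_schema(sample).items():
        print(field, sorted(types))
```

A field that shows more than one type (here `id` as both `int` and `str`) is exactly the kind of finding that should feed metadata management.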

The New World #2
The single-source-of-truth idea is dead. MDM will become about ontologies. Modeling will not die or even diminish, but we will explicitly model for context. Data flows will be modeled. There will be a metadata warehouse. There will be event-to-entity models. We will record data lineage. We may need to model data lifecycles.
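Recording data lineage can start as simply as noting, for each dataset, which step produced it and which datasets it was derived from; tracing provenance is then a walk over that graph. A minimal Python sketch, in which `LineageRecord`, `LineageLog` and the dataset names are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class LineageRecord:
    """Hypothetical lineage entry: a dataset, the step that produced it, and its direct inputs."""
    dataset: str
    produced_by: str
    inputs: List[str] = field(default_factory=list)


class LineageLog:
    def __init__(self) -> None:
        self._records: Dict[str, LineageRecord] = {}

    def record(self, dataset: str, produced_by: str, inputs: Iterable[str] = ()) -> None:
        self._records[dataset] = LineageRecord(dataset, produced_by, list(inputs))

    def upstream(self, dataset: str) -> Set[str]:
        """All datasets that `dataset` was derived from, directly or transitively."""
        seen: Set[str] = set()
        start = self._records.get(dataset)
        stack = list(start.inputs) if start else []
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            parent = self._records.get(current)
            if parent:
                stack.extend(parent.inputs)
        return seen


if __name__ == "__main__":
    log = LineageLog()
    log.record("raw_clicks", "ingest:web-logs")
    log.record("clean_clicks", "refine:dedupe", ["raw_clicks"])
    log.record("crm", "ingest:crm-export")
    log.record("customer_activity", "join", ["clean_clicks", "crm"])
    print(sorted(log.upstream("customer_activity")))
```

The `seen` set keeps the walk finite even if the recorded graph accidentally contains a cycle.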

Big Data and Big Data Modeling: The Age of Disruption
In Summary
The Data Curve and the Data Warehouse
Disruption, Disruption, Disruption
A New Modeling Dynamic

Thank You for Attending!
For any further questions, feel free to contact me following ERworld.
Robin Bloor
email: robin.bloor@bloorgroup.com
twitter: @robinbloor
www.insideanalysis.com
Please enjoy the rest of your time at ERworld 2015!

Legal Notice
Copyright CA 2015. All trademarks, trade names, service marks and logos referenced herein belong to their respective companies. No unauthorized use, copying or distribution permitted. THIS PRESENTATION IS FOR YOUR INFORMATIONAL PURPOSES ONLY. CA assumes no responsibility for the accuracy or completeness of the information. TO THE EXTENT PERMITTED BY APPLICABLE LAW, CA PROVIDES THIS DOCUMENT "AS IS" WITHOUT WARRANTY OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NONINFRINGEMENT. In no event will CA be liable for any loss or damage, direct or indirect, in connection with this presentation, including, without limitation, lost profits, lost investment, business interruption, goodwill, or lost data, even if CA is expressly advised of the possibility of such damages.