ANALYTICS IN BIG DATA ERA




ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS, AND CLASSIFY HUGE AMOUNTS OF DATA
Maurizio Salusti, SAS
Copyright 2012, SAS Institute Inc. All rights reserved.

AGENDA
- From DBMS to Big Data
- Architectural Considerations
- Big Data Analytics Methods
- Data Discovery: Visual Analytics

WHAT IS BIG DATA?
Data are everywhere. IT organizations often collect large amounts of data in an enterprise data warehouse (EDW), but they need to integrate that data with many other sources. The growing number of people, devices, and sensors connected by digital networks has revolutionized the ability to generate, communicate, share, and access information:
- People leave information in networks.
- Devices provide information in many ways.
- Data arrive as a continuous stream of information.
- Data are not only measurements but also text, images, and sounds.

CURRENT COMPANY DATA ORGANIZATION
Data are deployed as information snapshots in the data warehouse and in analytical data marts. The same information is replicated in several data structures, which makes updates slow and data renewal slow. The spread of information requires a drastic change in how companies collect their data and how they use it. Customer data do not live only in the company's customer database, and that database gives only a partial view of customers. For example, telco operators collect their customers' voice and SMS traffic, while many of those customers make contacts through social media and apps. Customers send many signals about market preferences and act like sensors on the market. However, current data storage structures and analytics tools cannot handle these data.

TRENDS IN COMPANY DATA ORGANIZATION
Needs:
- Avoid data proliferation
- Provide several scenarios from the same data
- Enrich data with several sources
- Renew data quickly to capture scenarios of changing patterns
Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. It has become increasingly easy to store, aggregate, and combine data and then use the results to analyze data in motion.

NEW QUESTIONS
- Data are not always in a structured data model.
- We often need to join data that do not share the same keys.
- Data often arrive in periodic flows, in near real time.
- We often need to recognize patterns in data that change frequently.
We need new ways to manage data that are distributed and not structured in the classical way. That means a different paradigm for organizing data and, above all, for querying them. Collecting and managing several sources opens several new problems:
- Relational data (graph data) can help explain how events spread through a population.
- Data in motion from field devices (sensors, smartphones) show dynamic patterns, often with no history of their form.
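When data arrive in periodic, near-real-time flows, one simple way to spot a changing pattern is to compare each new reading with a rolling window of recent readings. The sketch below illustrates that idea under assumed parameters. The function name `flag_changes`, the window size, and the 3-standard-deviation threshold are hypothetical and do not come from any SAS product.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, List, Tuple

def flag_changes(stream: Iterable[float], window: int = 5,
                 threshold: float = 3.0) -> List[Tuple[int, float]]:
    """Return (position, value) pairs that deviate from the rolling window.

    Hypothetical parameters: `window` is how many recent readings are kept,
    `threshold` is how many standard deviations count as a change.
    """
    recent: deque = deque(maxlen=window)  # only the last `window` readings stay in memory
    flagged: List[Tuple[int, float]] = []
    for i, value in enumerate(stream):
        if len(recent) == window:
            values = list(recent)
            mu, sd = mean(values), pstdev(values)
            # A perfectly flat window (sd == 0) flags any value that differs from it.
            if (sd == 0 and value != mu) or (sd > 0 and abs(value - mu) > threshold * sd):
                flagged.append((i, value))
        recent.append(value)
    return flagged

readings = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 25.0, 10.0]
print(flag_changes(readings))  # [(6, 25.0)]
```

The window keeps memory bounded no matter how long the stream runs. That bound is the property that matters for data in motion.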

ANALYSIS
- You cannot always apply sampling to extract data.
- You cannot always join data to build an analytical base table (ABT).
- You often need to know how the environment influences an event such as a purchase, a choice, or a change.
- You often need to merge information collected for different purposes.
SQL queries are often useless for reaching these data:
- The information is not organized into database structures.
- Data convey information in very different ways. For example, text is hard to query with traditional query languages.
- Merging is driven by fuzzy keys, where records are grouped according to statistical relationships.
- An event can be driven by relationships with other data rather than by a specific behavior.
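When two sources do not share an exact key, you can match records on a similarity score instead of on equality. The sketch below uses the Python standard library's `difflib.SequenceMatcher` ratio as a simple stand-in for the statistical matching the slide mentions. The `normalize` cleanup rule, the 0.8 threshold, and the sample names are assumptions for illustration. Real record linkage usually needs more than a string ratio.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

def normalize(key: str) -> str:
    # Assumed cleanup rule: lower-case, keep letters/digits/spaces, collapse whitespace.
    kept = "".join(ch for ch in key.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())

def fuzzy_merge(left_keys: List[str], right_keys: List[str],
                threshold: float = 0.8) -> List[Tuple[str, Optional[str], float]]:
    """Pair each left key with its most similar right key, or None below `threshold`."""
    pairs = []
    for lkey in left_keys:
        best_key, best_score = None, 0.0
        for rkey in right_keys:
            score = SequenceMatcher(None, normalize(lkey), normalize(rkey)).ratio()
            if score > best_score:
                best_key, best_score = rkey, score
        matched = best_key if best_score >= threshold else None
        pairs.append((lkey, matched, round(best_score, 3)))
    return pairs

# Hypothetical keys from a CRM table and a social-media feed.
for row in fuzzy_merge(["ACME S.p.A.", "Rossi Mario"], ["Acme SpA", "Bianchi Luca"]):
    print(row)
```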

BIG DATA: WHAT TYPES?


A DBMS and data marts help analyze data that come from a single central point. You only need to know where the data are and what they mean, and the DBMS manages the queries directly.
When data are stored in different places, you also need to know the relationship mapping across the different sources. Before extracting data, your query has to know where in the network the data reside.
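In the multi-point case, the mapping between a logical data set and its physical location must be resolved before extraction. Here is a minimal sketch of that idea, assuming a hypothetical in-memory metadata catalog. The data set names, systems, and paths are invented for illustration and do not describe any particular product.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Location:
    system: str  # e.g. a DBMS or a Hadoop cluster (illustrative values only)
    path: str    # table name or file path inside that system

# Hypothetical metadata catalog: logical data set name -> where it physically lives.
CATALOG: Dict[str, Location] = {
    "customers": Location("edw", "crm.customers"),
    "voice_traffic": Location("hadoop", "/data/telco/voice"),
    "social_posts": Location("hadoop", "/data/social/posts"),
}

def plan_extraction(datasets: List[str]) -> Dict[str, List[str]]:
    """Group the requested data sets by the system that holds them.

    Raises KeyError for a data set the catalog does not know, because the
    query cannot be routed without the mapping.
    """
    plan: Dict[str, List[str]] = {}
    for name in datasets:
        loc = CATALOG[name]
        plan.setdefault(loc.system, []).append(loc.path)
    return plan

print(plan_extraction(["customers", "voice_traffic", "social_posts"]))
```

In the single-point case the catalog would hold one system, and the lookup would add nothing. The mapping only matters once data are spread across several places.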

MULTI-POINT DATA HUB

BUILDING BLOCKS OF A BIG DATA ANALYTICS PROCESS
[Diagram not reproduced; surviving label: Analytics]

REFERENCE ARCHITECTURE EXAMPLE: SAS-RACK IMPLEMENTATION
[Diagram components: Client, Greenplum, Teradata, Oracle, Hadoop]

[Diagram components: Input, Hadoop, Output, Visual Analytics, Metadata, High-Performance Analytics]

[Diagram components: Input, In-memory, Grid computing, In-database, Output, Visual Analytics, Metadata, Analytical tool, High-Performance Analytics]


SAS HIGH-PERFORMANCE ANALYTICS
Worrying about software performance is not a new concept at SAS.
What is new?
- Dedicated high-performance software
- Accelerated development
Why now?
- Customer needs
- Blade systems have proven viable platforms for high-performance computing
- New computing paradigms
- Partnerships with MPP database vendors

SAS PROCEDURES: THEN AND NOW
Then:

    proc logistic data=td.mydata;
       class A B C;
       model y(event='1') = A B B*C;
    run;

Single-threaded; not aware of the distributed computing environment; runs on the client.

Now:

    proc hplogistic data=td.mydata;
       class A B C;
       model y(event='1') = A B B*C;
    run;

Multi-threaded; aware of the distributed computing environment; runs on the client or on the DBMS appliance.
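The key difference on this slide is that the HP procedure is aware of a distributed environment. One common way to parallelize model fitting is to have each thread or node compute its share of a sum over the data and then combine the partial results. The sketch below shows that pattern for logistic regression fitted by plain gradient ascent. It is a conceptual illustration only, not how PROC HPLOGISTIC is implemented. The partition count, learning rate, and iteration count are arbitrary assumptions.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import List, Sequence, Tuple

Row = Tuple[List[float], int]  # (features with a leading 1.0 for the intercept, 0/1 target)

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def partial_gradient(beta: Sequence[float], rows: Sequence[Row]) -> List[float]:
    """Log-likelihood gradient contributed by one data partition."""
    grad = [0.0] * len(beta)
    for x, y in rows:
        p = sigmoid(sum(b * xj for b, xj in zip(beta, x)))
        for j, xj in enumerate(x):
            grad[j] += (y - p) * xj
    return grad

def fit_logistic(rows: Sequence[Row], n_parts: int = 4,
                 lr: float = 0.5, iters: int = 500) -> List[float]:
    parts = [list(rows[i::n_parts]) for i in range(n_parts)]  # round-robin split
    beta = [0.0] * len(rows[0][0])
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        for _ in range(iters):
            # Each worker returns the gradient for its own partition...
            partials = list(pool.map(partial(partial_gradient, beta), parts))
            # ...and the coordinator sums them into the full-data gradient.
            total = [sum(g) for g in zip(*partials)]
            beta = [b + lr * g / len(rows) for b, g in zip(beta, total)]
    return beta

xs = [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0]
rows = [([1.0, x], int(x > 0)) for x in xs]
print(fit_logistic(rows))
```

Summing partition gradients gives exactly the full-data gradient, so the split changes where the work runs, not the result.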

HP PROCS ON A SINGLE SERVER

    libname disk base '/filesys';
    proc hpreg data=disk.source;
       /* analytic statements */
    run;

SAS process steps:
1. The SAS process starts on the hardware and operating system.
2. SAS sets up the access library to disk.
3. SAS starts PROC HPREG.
4. HPREG reads the data through the access engine during computation.*
5. Multiple threads are launched to process the incoming data.
6. As execution continues, temporary data are written to utility files on disk.
* SMP HP procs do not load the entire source data set into RAM; the SAS process uses the MEMSIZE option as a boundary. This is no different from MVA (regular) procs, the DATA step, and so on.
[Diagram labels: Operating system, SAS process, Temp/utility files to support SAS, Disks /filesys, SAS data sets; numbered 1 to 6 as in the steps above]
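The footnote on this slide says that SMP HP procedures stream data within a memory bound (MEMSIZE) instead of loading the whole source into RAM. The sketch below shows only that bounded-memory reading pattern, not the multithreading. It reads fixed-size chunks and keeps running statistics, using Welford's online algorithm for the mean and variance. The `chunk_size` parameter is a hypothetical stand-in for a memory limit.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

def read_chunks(source: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield the source in lists of at most `chunk_size` items (a stand-in for a memory bound)."""
    it = iter(source)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

def streaming_mean_var(source: Iterable[float], chunk_size: int = 1000) -> Tuple[int, float, float]:
    """Welford's online algorithm: only one chunk plus three running values are held in memory."""
    n, mean, m2 = 0, 0.0, 0.0
    for chunk in read_chunks(source, chunk_size):
        for x in chunk:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
    variance = m2 / (n - 1) if n > 1 else 0.0  # sample variance
    return n, mean, variance

# A generator never materializes the full data set.
print(streaming_mean_var((float(i) for i in range(1, 101)), chunk_size=7))
```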

HP PROCS IN A DISTRIBUTED ARCHITECTURE: HADOOP SASHDAT SHARED-RACK EXAMPLE

    libname a sashdat;                 /* the original slide shows no path or server options */
    option set=GRIDHOST="NAMENODE";
    proc hpreg data=a.source;
       /* analytic statements */
       performance nodes=all;
    run;

SAS process steps:
1. The SAS process starts on the hardware and operating system.
2. SAS sets up the access library to disk.
3. SAS starts PROC HPREG.
4. Because of the GRIDHOST option and the proper access engine setting, multi-threaded processes are started on the grid nodes (via TKGrid).
5. As the TKGrid processes start up, ALL data are lifted into RAM from HDFS.
6. Processing occurs in parallel against the in-memory data.
7. Results are returned to the initiating process on the SAS server.
[Diagram labels: SAS server (operating system, SAS process), Hadoop NameNode, Nodes 1 to N each holding data in memory; numbered 1 to 7 as in the steps above]
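In the distributed case, each node holds its slice of the data in memory and works on it in parallel. It then sends a small result back to the initiating process. For a linear regression like the one HPREG fits, a natural small result is the set of per-node sufficient statistics. The sketch below illustrates that pattern for a one-predictor model, with threads standing in for grid nodes. It is not TKGrid's actual protocol, and the node data are made up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y)

def local_stats(rows: List[Point]) -> Tuple[float, float, float, float, float]:
    """Per-node sufficient statistics for y = b0 + b1*x: n, sum x, sum x^2, sum y, sum x*y."""
    n = float(len(rows))
    sx = sum(x for x, _ in rows)
    sxx = sum(x * x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxy = sum(x * y for x, y in rows)
    return n, sx, sxx, sy, sxy

def distributed_ols(node_data: List[List[Point]]) -> Tuple[float, float]:
    # Threads stand in for grid nodes that each hold one slice of the data in RAM.
    with ThreadPoolExecutor(max_workers=len(node_data)) as pool:
        per_node = list(pool.map(local_stats, node_data))
    # Only five numbers per node travel back to the initiating process.
    n, sx, sxx, sy, sxy = (sum(s[k] for s in per_node) for k in range(5))
    # Solve the 2x2 normal equations  [n sx; sx sxx] [b0; b1] = [sy; sxy].
    det = n * sxx - sx * sx
    b0 = (sy * sxx - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1

# Hypothetical data following y = 2 + 3x, spread over three "nodes".
nodes = [[(0.0, 2.0), (1.0, 5.0)], [(2.0, 8.0), (3.0, 11.0)], [(4.0, 14.0)]]
print(distributed_ols(nodes))  # (2.0, 3.0)
```

The network traffic no longer depends on the number of rows, only on the number of model parameters. That is why the pattern scales to data that have to stay on the nodes.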

Big data analysis can use several analytic strategies, and SAS offers many different methods. Many come from traditional statistical inference and follow the SEMMA paradigm (Sample, Explore, Modify, Model, Assess). Others come from stochastic process analysis, for both continuous and discrete events. Still others come from linear and nonlinear mixed models, or from graph analysis.


ANALYTICAL CATEGORIES AND TARGET USAGE
Statistics: binary-target and continuous-number predictions; linear, nonlinear, and mixed linear modeling
Data mining: complex relationships; tree-based classification; variable selection
Text mining: parsing large-scale text collections; entity extraction; automatic stemming and synonym detection
Forecasting: large-scale, multiple-hierarchy problems
Econometrics: probability of events; severity of random events
Optimization: local search optimization; large-scale linear and mixed-integer problems; graph theory

Data from different sources can be tied together with methods such as canonical decomposition. For data in motion, such as data coming from devices, pattern variability can be sampled, or the pattern distribution simulated, with Markov chain Monte Carlo (MCMC) methods. Sparse vector data with missing values can be imputed by simulation with MCMC or with other regression methods. Discrete choices among different events can be modeled with multinomial discrete choice models.
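MCMC methods draw samples from a distribution that is known only up to a normalizing constant. As a minimal sketch of the idea, here is a random-walk Metropolis sampler in plain Python. The normal target with mean 3, the step size, the burn-in length, and the seed are illustrative assumptions, not values from the slide.

```python
import math
import random
from typing import Callable, List

def metropolis(log_p: Callable[[float], float], n_samples: int, x0: float = 0.0,
               step: float = 1.0, burn_in: int = 1000, seed: int = 0) -> List[float]:
    """Random-walk Metropolis sampler for a 1-D target known up to a constant."""
    rng = random.Random(seed)
    x, lp = x0, log_p(x0)
    samples: List[float] = []
    for i in range(burn_in + n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric proposal
        lp_new = log_p(proposal)
        # Accept with probability min(1, p(proposal) / p(x)); min() avoids exp overflow.
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = proposal, lp_new
        if i >= burn_in:  # discard the warm-up draws
            samples.append(x)
    return samples

def log_normal_target(x: float, mu: float = 3.0, sigma: float = 1.0) -> float:
    # Unnormalized log-density: the constant term is irrelevant to acceptance ratios.
    return -0.5 * ((x - mu) / sigma) ** 2

draws = metropolis(log_normal_target, 20000)
print(sum(draws) / len(draws))  # close to 3.0
```

To sample a different pattern distribution, replace `log_normal_target` with the log-density of that model. The sampler itself does not change.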

GRAPH ANALYSIS: NETWORKS AND COMMUNITIES
The objectives of network analysis are:
- Identifying the subnetworks (communities) with high potential for information exchange.
- Measuring how the communities change over time.
- Producing initiatives that increase the enterprise's presence in individual communities, based on each community's spreading strength.
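Newman's modularity is one widely used criterion for judging whether a split of a network into communities is meaningful. It compares the edges inside each community with what a random graph with the same degrees would produce. The slide does not say which method SAS uses, so treat this as an illustrative sketch. It scores candidate partitions of a small undirected graph made of two triangles joined by one edge.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]

def modularity(edges: List[Edge], communities: Iterable[Set[int]]) -> float:
    """Newman modularity of an undirected, unweighted graph:
    Q = sum over communities c of  L_c / m - (d_c / (2m))**2,
    where L_c = edges inside c, d_c = total degree of c's nodes, m = number of edges."""
    m = len(edges)
    degree: Dict[int, int] = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for c in communities:
        internal = sum(1 for u, v in edges if u in c and v in c)
        total_degree = sum(degree.get(node, 0) for node in c)
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q

edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
split = modularity(edges, [{0, 1, 2}, {3, 4, 5}])
single = modularity(edges, [{0, 1, 2, 3, 4, 5}])
print(round(split, 3), single)  # 0.357 0.0
```

The two-community split scores higher than putting every node in one group, which matches the intuition that the bridge edge separates two dense clusters.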

GRAPH ANALYSIS: NODES AND LINKS
A network is a collection of relationships among nodes, expressed by links. A node is an individual characterized by qualities that can be transmitted through the links (impulses). A link is the relationship that connects two nodes. It can be outgoing, incoming, or undirected.
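These definitions map directly onto a directed adjacency list. An outgoing link from A to B lets an impulse travel from A to B, and an undirected link is simply stored in both directions. The sketch below uses hypothetical node names and breadth-first search to find every node that an impulse starting at one node can reach. It models reachability only, not the strength of transmission.

```python
from collections import deque
from typing import Dict, List, Set

class Network:
    def __init__(self) -> None:
        self.out_links: Dict[str, List[str]] = {}

    def add_node(self, node: str) -> None:
        self.out_links.setdefault(node, [])

    def add_link(self, source: str, target: str, directed: bool = True) -> None:
        """Directed links carry impulses one way; undirected links are stored both ways."""
        self.add_node(source)
        self.add_node(target)
        self.out_links[source].append(target)
        if not directed:
            self.out_links[target].append(source)

    def reach(self, start: str) -> Set[str]:
        """Nodes an impulse starting at `start` can reach by following outgoing links (BFS)."""
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in self.out_links.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen - {start}

net = Network()
net.add_link("anna", "bruno")                   # outgoing from anna
net.add_link("bruno", "carla", directed=False)  # no direction
net.add_link("dario", "anna")
print(sorted(net.reach("dario")))  # ['anna', 'bruno', 'carla']
```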


SAS VISUAL ANALYTICS
A single solution for statistical visualization and reporting.
...provide very easy-to-use yet sophisticated statistical graphics tools to all of your users?
...use ad hoc exploration and visualizations to analyze multivariate results?
...quickly produce mobile dashboards and reports that convey more foresight than hindsight?

SAS VISUAL ANALYTICS
- Business visualization driven by analytics
- Exploration and visualization
- Power of analytics
- Rapid delivery of mobile insights

BUSINESS VISUALIZATION: THE DIFFERENCE BETWEEN RAPID INSIGHT AND FAST INFORMATION
[Diagram labels: Data visualization, Analytic visualization, Exploration, Discovery]

BENEFITS: INCREASE THE USE OF ANALYTICS AND BI
- Self-service
- Easy to use
- Analytics
- Work with more data
- Reporting and dashboards
- Mobile BI
- Collaboration

SAS VISUAL ANALYTICS: MEETING YOUR BUSINESS NEEDS THROUGH FLEXIBILITY
Deployment options:
- Traditional on-premises deployments
- Public, private, and hybrid cloud
- SAS Cloud and SAS Solutions on Demand