In-Database Analytics Deep Dive with Teradata and Revolution R
- Harriet Allen
- 8 years ago
Transcription
1 In-Database Analytics Deep Dive with Teradata and Revolution R
Mario Inchiosa, Chief Scientist, Revolution Analytics
Tim Miller, Partner Integration Lab, Teradata
2 Agenda
> Introduction
> Revolution R Enterprise
> Case Study: Global Internet Marketplace
> Under the Hood
> Summary & Questions
3 Poll Question #1 (please choose all that apply)
What data storage/management software do you use?
> Hadoop
> Teradata
> LSF Clusters/Grids
> Servers
4 What is R?
Most powerful statistical programming language
> Flexible, extensible and comprehensive for productivity
Most widely used data analysis software
> Used by 2M+ data scientists, statisticians and analysts
Create beautiful and unique data visualizations
> As seen in New York Times, Twitter and Flowing Data
Thriving open-source community
> Leading edge of analytics research
Fills the talent gap
> New graduates prefer R
White paper: R is Hot, bit.ly/r-ishot
5 Exploding growth and demand for R
[Chart: R Usage Growth, Rexer Data Miner Survey, % of data miners who report using R]
R is the first choice of more data miners than any other software
R is the highest paid IT skill
> Dice.com, Jan 2014
R most-used data science language after SQL
> O'Reilly, Jan 2014
R is used by 70% of data miners
> Rexer, Sep 2013
R is #15 of all programming languages
> RedMonk, Jan 2014
R growing faster than any other data science language
> KDnuggets, Aug 2013
More than 2 million users worldwide
6 Why Is Teradata Different? Server Based vs. In-Database Architectures
> Desktop and Server Analytic Architecture: Sample Data, Analyst, Results
> In-Database Analytic Architecture: SQL Request, Results; Exponential Performance Improvement
[Figure: both architectures are illustrated with the same credit-risk decision tree, with splits on Debt<10% of Income, Income>$40K and Debt=0%, and leaves Good Credit Risks / Bad Credit Risks]
7 Challenges Running R in Parallel
R is distributed across nodes or servers, and each copy runs independently of the other nodes/servers
> Great for row-independent processing such as model scoring
> However, for analytic functions requiring all the data, such as model building, the onus is on the R programmer to understand data parallelism
Example: Median (Midpoint)
Node Level (node level calculation: = 4.5 < Wrong)
1. Find median per node
2. Consolidate and find the midpoint of the results
3. Produce the wrong answer
System Level (system level calculation: = 2.5 < Right)
1. Sort all the data
2. Take midpoint
3. Produce the right answer
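The median pitfall on this slide is easy to reproduce. The sketch below is illustrative Python, not RRE code, and the two node data sets are hypothetical values chosen only so that the results match the 4.5 and 2.5 quoted on the slide; the slide's original data did not survive the transcription.

```python
from statistics import median

# Hypothetical rows held on two nodes (not the slide's original data).
node_data = {
    "node_1": [1, 2, 3],
    "node_2": [1, 7, 9],
}

# Node level: median per node, then the midpoint of those medians.
node_medians = [median(rows) for rows in node_data.values()]  # [2, 7]
node_level = median(node_medians)

# System level: pool all rows, then take the midpoint once.
all_rows = [x for rows in node_data.values() for x in rows]
system_level = median(all_rows)

print(node_level, system_level)  # prints: 4.5 2.5
```

The node-level answer is wrong because a median, unlike a sum or a count, cannot be rebuilt from per-node medians alone.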
8 R Operations on Data
R operates on independent rows
> Score models for a given observation
> Parsing text field
> Log(x)
R operates on independent partitions
> Fit a model to a partition such as region, time, product or store
R operates on the entire data set
> Global sales average
> Regression on all customers
[Diagram: one R Client for each of the three cases]
9 Poll Question #2 (please choose all that apply)
What statistical programming tools do you use?
> R/RRE
> SAS
> SPSS
> Statistica
> KXEN
10 Revolution Analytics Who is Revolution Analytics?
11 Who is Revolution Analytics?
OUR COMPANY
> The leading provider of advanced analytics software and services based on open source R, since 2007
OUR SOFTWARE
> The only Big Data, Big Analytics software platform based on the data science language R
SOME KUDOS
> Visionary, Gartner Magic Quadrant for Advanced Analytics Platforms, 2014
12 Finance; Insurance; Manufacturing & High Tech; Healthcare & Pharma; Digital Economy; Analytics Service Providers
13 Revolution R Enterprise is...
the only big data big analytics platform based on open source R, the de facto statistical computing language for modern analytics
> High Performance, Scalable Analytics
> Portable Across Enterprise Platforms
> Easier to Build & Deploy Analytics
14 R: Open Source that Drives Innovation, but It Has Some Limitations for Enterprises
Each row reads: area; open source R; Revolution R Enterprise; result.
> Big Data; In-memory bound; Hybrid memory & disk scalability; Operates on bigger volumes & factors
> Speed of Analysis; Single threaded; Parallel threading; Shrinks analysis time
> Enterprise Readiness; Community support; Commercial support; Delivers full service production support
> Analytic Breadth & Depth; innovative analytic packages; Leverage open source packages plus Big Data ready packages; Supercharges R
> Commercial Viability; Risk of deployment of open source; Commercial license; Eliminate risk with open source
15 Introducing Revolution R Enterprise (RRE): The Big Data Big Analytics Platform
Components: DevelopR, DeployR, ConnectR, ScaleR, DistributedR
Big Data Big Analytics Ready
> Enterprise readiness
> High performance analytics
> Multi-platform architecture
> Data source integration
> Development tools
> Deployment tools
16 The Platform Step by Step: R Capabilities
R+CRAN
> Open source R interpreter (UPDATED R)
> Freely-available R algorithms
> Algorithms callable by RevoR
> Embeddable in R scripts
> 100% compatible with existing R scripts, functions and packages
RevoR
> Based on open source R
> Adds high-performance math
Available on:
> Teradata Database
> Hortonworks Hadoop
> Cloudera Hadoop
> MapR Hadoop
> IBM Platform LSF Linux
> Microsoft HPC Clusters
> Windows & Linux Servers
> Windows & Linux Workstations
17 The Platform Step by Step: Tools & Deployment
DevelopR
> Integrated development environment for R
> Visual step-into debugger
> Based on Visual Studio Isolated Shell
> Available on: Windows
DeployR
> Web services software development kit for integrating analytics via Java, JavaScript or .NET APIs
> Integrates R into application infrastructures
Capabilities:
> Invokes R scripts from web services calls
> RESTful interface for easy integration
> Works with web & mobile apps, leading BI & visualization tools and business rules engines
18 DevelopR - Integrated Development Environment
[Screenshot callouts]
> Script with type ahead and code snippets
> Solutions window for organizing code and data
> Sophisticated debugging with breakpoints, variable values etc.
> Packages installed and loaded
> Objects loaded in the R Environment
> Object details
19 DeployR - Integration with 3rd Party Software
[Diagram: Data Analysis (R / Statistical Modeling Expert) and DeployR (Deployment Expert) serving Business Intelligence, Mobile, Web Apps, Cloud / SaaS]
> Seamless: Bring the power of R to any web enabled application
> Simple: Leverage common APIs including JS, Java, .NET
> Scalable: Robustly scale user and compute workloads
> Secure: Manage enterprise security with LDAP & SSO
20 The Platform Step by Step: Parallelization & Data Sourcing
ScaleR
> Ready-to-use high-performance big data big analytics
> Fully-parallelized analytics
  Data prep & data distillation
  Descriptive statistics & statistical tests
  Correlation & covariance matrices
  Predictive models: linear, logistic, GLM
  Machine learning
  Monte Carlo simulation
> Tools for distributing customized algorithms across nodes
ConnectR
> High-speed & direct connectors
> Available for:
  High-performance XDF
  SAS, SPSS, delimited & fixed format text data files
  Hadoop HDFS (text & XDF)
  Teradata Database
  ODBC
DistributedR
> Distributed computing framework
> Delivers portability across platforms
> Available on:
  Teradata Database
  Hortonworks / Cloudera / MapR
  Windows Servers / HPC Clusters
  IBM Platform LSF Linux Clusters
  Red Hat Linux Servers
  SuSE Linux Servers
21 Revolution R Enterprise ScaleR: High Performance Big Data Analytics
Data Prep, Distillation & Descriptive Analytics
R Data Step
> Data import: Delimited, Fixed, SAS, SPSS, ODBC
> Variable creation & transformation using any R functions and packages
> Recode variables
> Factor variables
> Missing value handling
> Sort
> Merge
> Split
> Aggregate by category (means, sums)
Descriptive Statistics
> Min / Max
> Mean
> Median (approx.)
> Quantiles (approx.)
> Standard Deviation
> Variance
> Correlation
> Covariance
> Sum of Squares (cross product matrix)
> Pairwise Cross tabs
> Risk Ratio & Odds Ratio
> Cross-Tabulation of Data
> Marginal Summaries of Cross Tabulations
Statistical Tests
> Chi Square Test
> Kendall Rank Correlation
> Fisher's Exact Test
> Student's t-test
Sampling
> Subsample (observations & variables)
> Random Sampling
22 Revolution R Enterprise ScaleR (continued)
Statistical Modeling
Predictive Models
> Covariance/Correlation/Sum of Squares/Cross-product Matrix
> Multiple Linear Regression
> Logistic Regression
> Generalized Linear Models (GLM)
  - All exponential family distributions: binomial, Gaussian, inverse Gaussian, Poisson, Tweedie. Standard link functions including: cauchit, identity, log, logit, probit.
  - User defined distributions & link functions.
> Classification & Regression Trees and Forests
> Gradient Boosted Trees
> Residuals for all models
Data Visualization
> Histogram
> ROC Curves (actual data and predicted values)
> Lorenz Curve
> Line and Scatter Plots
> Tree Visualization
Variable Selection
> Stepwise Regression: Linear, Logistic, GLM
Simulation and HPC
> Monte Carlo
> Run open source R functions and packages across cores and nodes
Machine Learning
Cluster Analysis
> K-Means
Classification & Regression
> Decision Trees
> Decision Forests
> Gradient Boosted Trees
Deployment
> Prediction (scoring)
> PMML Export
23 Write Once Deploy Anywhere.
DESIGNED FOR SCALE, PORTABILITY & PERFORMANCE
Components: DeployR, ConnectR, ScaleR, DistributedR
> EDW: Teradata Database
> Hadoop: Hortonworks, Cloudera, MapR
> Clustered Systems: IBM Platform LSF, Microsoft HPC
> Workstations & Servers: Windows, Linux
> In the Cloud: Amazon AWS
24 Case Study - Global Internet Marketplace
Challenge:
> Model and score 250M customers
> Server-based workflow was taking 3 days
> Move calculation in-database to drastically reduce runtime, process twice as many customers, and increase lift
25 Existing Open Source R model
Binomial Logistic Regression
> 50+ independent variables, including categorical with indicator variables
> Train from small sample (many thousands): not a problem in and of itself
> Scoring across entire corpus (many hundred millions): slightly more challenging
26 Revolution R Enterprise model
Same Binomial Logistic Regression
> 50+ independent variables, including categorical with indicator variables
> Train from large sample (many millions): more accurately captures user patterns and increases lift
> Scoring across entire corpus (many hundred millions): completes in minutes
27 RRE Used to Optimize the Current Process
By moving the compute to the data
[Diagram: Before and After]
Reduced 3 day process to 10 minutes
28 Benchmarking the Optimized Process
Scaling study: Time vs. Number of Rows
[Chart: time against number of rows, Server-based (Not In-DB) vs. In-DB]
NOTE:
Teradata Environment
> 4 node, 1700 Appliance
RRE Environment
> version 7.2
29 Optimization process: Recode Open Source R to Revolution R Enterprise
Before
    trainit <- glm(as.formula(specs[[i]]), data = training.data, family = 'binomial', maxit = iters)
    fits <- predict(trainit, newdata = test.data, type = 'response')
After
    trainit <- rxGlm(as.formula(specs[[i]]), data = training.data, family = 'binomial', maxIterations = iters)
    # rxPredict takes the data to score through its 'data' argument
    fits <- rxPredict(trainit, data = test.data, type = 'response')
30 Revolution R Enterprise: How RRE ScaleR Actually Works
31 Revolution R Enterprise: RevoR - Performance Enhanced R
Customers report 3-50x performance improvements compared to Open Source R without changing any code

Computation (4-core laptop)        Open Source R   Revolution R   Speedup
Linear Algebra (1)
  Matrix Multiply                  176 sec         9.3 sec        18x
  Cholesky Factorization           25.5 sec        1.3 sec        19x
  Linear Discriminant Analysis     189 sec         74 sec         3x
General R Benchmarks (2)
  R Benchmarks (Matrix Functions)  22 sec          3.5 sec        5x
  R Benchmarks (Program Control)   5.6 sec         5.4 sec        Not appreciable
32 Scalable and Parallelized Across Cores and Nodes
33 Scalability and Portability of PEMAs (Parallel External Memory Algorithms)
Anatomy of a PEMA: 1) Initialize, 2) Process Chunk, 3) Aggregate, 4) Finalize
> Process a chunk of data at a time, giving linear scalability
> Process an unlimited number of rows of data in a fixed amount of RAM
> Independent of the compute context (number of cores, computers, distributed computing platform), giving portability across these dimensions
> Independent of where the data is coming from, giving portability with respect to data sources
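The four PEMA stages can be sketched with a streaming mean and variance. This is a minimal Python illustration of the pattern only, not ScaleR's implementation; the function names and the sample data are assumptions made for the example.

```python
def initialize():
    # 1) Initialize: an empty set of sufficient statistics.
    return {"n": 0, "sum": 0.0, "sumsq": 0.0}

def process_chunk(chunk):
    # 2) Process Chunk: summarize one chunk; only the chunk is held in memory.
    state = initialize()
    for x in chunk:
        state["n"] += 1
        state["sum"] += x
        state["sumsq"] += x * x
    return state

def aggregate(a, b):
    # 3) Aggregate: combine two partial results; chunk order does not matter.
    return {key: a[key] + b[key] for key in a}

def finalize(state):
    # 4) Finalize: turn the combined statistics into the answer.
    mean = state["sum"] / state["n"]
    variance = (state["sumsq"] - state["n"] * mean ** 2) / (state["n"] - 1)
    return mean, variance

def pema_mean_var(data, chunk_size):
    total = initialize()
    for start in range(0, len(data), chunk_size):
        total = aggregate(total, process_chunk(data[start:start + chunk_size]))
    return finalize(total)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
mean, variance = pema_mean_var(data, chunk_size=3)
```

Because the per-chunk state has a fixed size and `aggregate` is associative, the same four functions would work whether chunks are read one after another from disk or processed on separate cores or nodes. The sum-of-squares variance formula is used here for brevity; it is not the most numerically stable choice.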
34 ScaleR Performance
> Efficient computational algorithms
> Efficient memory management: minimize data copying and data conversion
> Heavy use of C++ templates; optimal code
> Efficient data file format; fast access by row and column
> Models are pre-analyzed to detect and remove duplicate computations and points of failure (singularities)
> Handle categorical variables efficiently
35 Speed and Scalability Comparison
> Unique PEMAs: Parallel, external-memory algorithms
> High-performance, scalable replacements for R/SAS analytic functions
> Parallel/distributed processing eliminates CPU bottleneck
> Data streaming eliminates memory size limitations
> Works with in-memory and disk-based architectures
36 In-Database Billion Row Logistic Regression
> 114 seconds on Teradata 2650 (6 nodes, 72 cores), including time to read data
> Scales linearly with number of rows
> Scales linearly with number of nodes: 3x faster than on 2 node Teradata system
37 Allstate compares SAS, Hadoop, and R for Big-Data Insurance Models
Generalized linear model, 150 million observations, 70 degrees of freedom

Approach                  Platform                         Time to fit
SAS                       16-core Sun Server               5 hours
rmr/mapreduce             10-node 80-core Hadoop Cluster   > 10 hours
R                         250 GB Server                    Impossible (> 3 days)
Revolution R Enterprise   In-Teradata on 6-node            minutes
38 Poll Question #3 (please select one answer)
At what stage are you in your in-database analytics deployment project?
> Still researching tools and methods
> Evaluating/Selecting data storage/management platform
> Evaluating/Selecting analytics programming tools
> Launched the project/working on it now
> We're done and looking for another one!
39 RRE End-User's Perspective: Sample code for R Logistic Regression
Revolution R Enterprise has a new data source, RxTeradata (ODBC and TPT)
    # Change the data source if necessary
    tdconn <- "DRIVER= ; IP= ; DATABASE= ; UID= ; PWD= "
    teradatads <- RxTeradata(table = " ", connectionString = tdconn, ...)
Revolution R Enterprise has a new compute context, RxInTeradata
    # Change the compute context
    tdcompute <- RxInTeradata(connectionString = ..., shareDir = ..., remoteShareDir = ...,
                              revoPath = ..., wait = ..., consoleOutput = ...)
    # Specify model formula and parameters
    rxLogit(ArrDelay > 15 ~ Origin + Year + Month + DayOfWeek + UniqueCarrier + F(CRSDepTime),
            data = teradatads)
40 Table Operators
Teradata Table User Defined Functions (UDFs) allow users to place a function in the FROM clause of a SELECT statement
Table Operators extend the existing table UDF capability:
> Table Operators are object oriented
  Inputs and outputs can be arbitrary, not fixed as Table UDFs require
> Table Operators have a simpler row iterator interface
  The interface simply produces output rows, providing a more natural application development interface than Table UDFs
> Table Operators operate on a stream of rows
  Rows are buffered for high performance, eliminating row-at-a-time processing
> Table Operators support PARTITION BY and ORDER BY
  Allows the development of MapReduce-style operators in-database
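The partition-and-order behaviour described above can be imitated outside the database. The following Python sketch is only an analogy for how a table operator consumes a row stream and emits rows of a different shape; `table_operator`, `first_and_total` and the sample rows are hypothetical names and data, not Teradata APIs.

```python
from itertools import groupby
from operator import itemgetter

def table_operator(rows, partition_by, order_by, fn):
    """Hypothetical stand-in for a table operator: group the row stream by
    `partition_by`, order each group by `order_by`, and yield whatever
    output rows `fn` produces for that group."""
    keyed = sorted(rows, key=itemgetter(partition_by, order_by))
    for key, group in groupby(keyed, key=itemgetter(partition_by)):
        yield from fn(key, list(group))

def first_and_total(store, group):
    # The output columns differ from the input columns, mirroring the
    # "arbitrary inputs and outputs" point on the slide.
    yield {
        "store": store,
        "first_day": group[0]["day"],
        "total_sales": sum(row["sales"] for row in group),
    }

rows = [
    {"store": "B", "day": 2, "sales": 5},
    {"store": "A", "day": 2, "sales": 7},
    {"store": "A", "day": 1, "sales": 3},
    {"store": "B", "day": 1, "sales": 4},
]
result = list(table_operator(rows, "store", "day", first_and_total))
```

Each partition is handled independently, which is what makes the map step parallel; the per-partition function plays the role of the reduce step.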
41 RRE Architecture in Teradata
    tdconnect <- RxTeradata(<data, connection string, ...>)
    tdcompute <- RxInTeradata(<data, server arguments, ...>)
[Diagram: Request and Response pass between the client and a Master Process in the Teradata PE Layer; a Message Passing Layer connects the Master Process to Worker Processes in the AMP Layer, each with its own Data Partition]
* All communication is done by binary BLOBs
** PUT-based Installer
1. RRE commands are sent to a Master Process, an External Stored Procedure (XSP) in the Parsing Engine that provides parallel coordination
2. RRE analytics are split into Worker Process tasks that run in a Table Operator (TO) on every AMP.
   a. HPA analytics iterate over the data, and intermediate results are analyzed and managed by the XSP.
   b. HPC analytics do not iterate, and final results from each AMP are returned to the XSP
3. Final combined results are assembled by the XSP and returned to the user
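The iterative (HPA) flow in step 2a can be pictured as a master loop that repeatedly collects partial results from workers. The Python sketch below is an assumption-laden illustration, not RRE's actual algorithm: it fits a logistic regression by plain gradient ascent, the partitions stand in for AMP data partitions, and all names, data and step sizes are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def worker_gradient(partition, beta):
    """Worker task: partial gradient of the logistic log-likelihood,
    computed from one data partition only."""
    grad = [0.0] * len(beta)
    for x, y in partition:
        p = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
        for j, xi in enumerate(x):
            grad[j] += (y - p) * xi
    return grad

def master_fit(partitions, n_features, iterations=200, step=0.1):
    """Master loop: send the current coefficients out, sum the partial
    gradients that come back, update, and repeat."""
    beta = [0.0] * n_features
    for _ in range(iterations):
        partials = [worker_gradient(part, beta) for part in partitions]
        total = [sum(g[j] for g in partials) for j in range(n_features)]
        beta = [b + step * g for b, g in zip(beta, total)]
    return beta

# Hypothetical rows: ([intercept, x], label), split across two partitions.
partitions = [
    [([1.0, 0.5], 0), ([1.0, 1.5], 1), ([1.0, 2.5], 1)],
    [([1.0, 1.0], 0), ([1.0, 2.0], 0), ([1.0, 3.0], 1)],
]
beta = master_fit(partitions, n_features=2)
```

Only the coefficient vector and the small partial gradients travel between master and workers on each pass; the rows themselves never leave their partition.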
42 Summary
> High-performance, scalable, portable, fully-featured algorithms
> Integration with R ecosystem
> Compatibility with Big Data ecosystem
43 Questions?
Resources for you (available on RevolutionAnalytics.com):
> White Paper: Teradata and Revolution Analytics: For the Big Data Era, An Analytics Revolution
> Webinar: Big Data Analytics with Teradata and Revolution Analytics
WE LOVE FEEDBACK
> Questions
> Rate this Session: PARTNERS Mobile App, InfoHub Kiosks, teradata-partners.com
44 Thank You!
SAS and Oracle: Big Data and Cloud Partnering Innovation Targets the Third Platform David Lawler, Oracle Senior Vice President, Product Management and Strategy Paul Kent, SAS Vice President, Big Data What
More informationNetezza and Business Analytics Synergy
Netezza Business Partner Update: November 17, 2011 Netezza and Business Analytics Synergy Shimon Nir, IBM Agenda Business Analytics / Netezza Synergy Overview Netezza overview Enabling the Business with
More information2015 Workshops for Professors
SAS Education Grow with us Offered by the SAS Global Academic Program Supporting teaching, learning and research in higher education 2015 Workshops for Professors 1 Workshops for Professors As the market
More informationExecutive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...
Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data
More informationADVANCED ANALYTICS AND FRAUD DETECTION THE RIGHT TECHNOLOGY FOR NOW AND THE FUTURE
ADVANCED ANALYTICS AND FRAUD DETECTION THE RIGHT TECHNOLOGY FOR NOW AND THE FUTURE Big Data Big Data What tax agencies are or will be seeing! Big Data Large and increased data volumes New and emerging
More informationLavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs
1.1 Introduction Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs For brevity, the Lavastorm Analytics Library (LAL) Predictive and Statistical Analytics Node Pack will be
More informationANALYTICS IN BIG DATA ERA
ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNT OF DATA MAURIZIO SALUSTI SAS Copyr i g ht 2012, SAS Ins titut
More informationBig Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect
Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate
More informationSAP Predictive Analytics: An Overview and Roadmap. Charles Gadalla, SAP @cgadalla SESSION CODE: 603
SAP Predictive Analytics: An Overview and Roadmap Charles Gadalla, SAP @cgadalla SESSION CODE: 603 Advanced Analytics SAP Vision Embed Smart Agile Analytics into Decision Processes to Deliver Business
More informationActian SQL in Hadoop Buyer s Guide
Actian SQL in Hadoop Buyer s Guide Contents Introduction: Big Data and Hadoop... 3 SQL on Hadoop Benefits... 4 Approaches to SQL on Hadoop... 4 The Top 10 SQL in Hadoop Capabilities... 5 SQL in Hadoop
More informationAn Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database
An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct
More informationIBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud
IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud February 25, 2014 1 Agenda v Mapping clients needs to cloud technologies v Addressing your pain
More informationBenchmarking Hadoop & HBase on Violin
Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages
More informationEMC Greenplum Driving the Future of Data Warehousing and Analytics. Tools and Technologies for Big Data
EMC Greenplum Driving the Future of Data Warehousing and Analytics Tools and Technologies for Big Data Steven Hillion V.P. Analytics EMC Data Computing Division 1 Big Data Size: The Volume Of Data Continues
More informationCost-Effective Business Intelligence with Red Hat and Open Source
Cost-Effective Business Intelligence with Red Hat and Open Source Sherman Wood Director, Business Intelligence, Jaspersoft September 3, 2009 1 Agenda Introductions Quick survey What is BI?: reporting,
More informationBIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata
BIG DATA: FROM HYPE TO REALITY Leandro Ruiz Presales Partner for C&LA Teradata Evolution in The Use of Information Action s ACTIVATING MAKE it happen! Insights OPERATIONALIZING WHAT IS happening now? PREDICTING
More informationBig Data Technologies Compared June 2014
Big Data Technologies Compared June 2014 Agenda What is Big Data Big Data Technology Comparison Summary Other Big Data Technologies Questions 2 What is Big Data by Example The SKA Telescope is a new development
More informationMark Bennett. Search and the Virtual Machine
Mark Bennett Search and the Virtual Machine Agenda Intro / Business Drivers What to do with Search + Virtual What Makes Search Fast (or Slow!) Virtual Platforms Test Results Trends / Wrap Up / Q & A Business
More informationIntegrating Apache Spark with an Enterprise Data Warehouse
Integrating Apache Spark with an Enterprise Warehouse Dr. Michael Wurst, IBM Corporation Architect Spark/R/Python base Integration, In-base Analytics Dr. Toni Bollinger, IBM Corporation Senior Software
More informationGeneralized Linear Models
Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the
More informationBig Data Too Big To Ignore
Big Data Too Big To Ignore Geert! Big Data Consultant and Manager! Currently finishing a 3 rd Big Data project! IBM & Cloudera Certified! IBM & Microsoft Big Data Partner 2 Agenda! Defining Big Data! Introduction
More informationCopyright 2012 EMC Corporation. All rights reserved.
1 Greenplum UAP Enabling Big Data Analytics Brendon Moran Data Scientist 2 Agenda Background On Greenplum And Big Data Analytics Greenplum UAP Greenplum: Not Just Infrastructure Pivotal Labs Customers
More informationUsing MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com
Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com Agenda The rise of Big Data & Hadoop MySQL in the Big Data Lifecycle MySQL Solutions for Big Data Q&A
More informationModern Data Architecture for Predictive Analytics
Modern Data Architecture for Predictive Analytics David Smith VP Marketing and Community - Revolution Analytics John Kreisa VP Strategic Marketing- Hortonworks Hortonworks Inc. 2013 Page 1 Your Presenters
More informationApache Spark : Fast and Easy Data Processing Sujee Maniyam Elephant Scale LLC sujee@elephantscale.com http://elephantscale.com
Apache Spark : Fast and Easy Data Processing Sujee Maniyam Elephant Scale LLC sujee@elephantscale.com http://elephantscale.com Spark Fast & Expressive Cluster computing engine Compatible with Hadoop Came
More informationPractical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
More informationAcademyR Course Catalog
AcademyR Course Catalog Table of Contents Our Philosophy...3 Courses Listed by Role Data Analyst...4 Data Scientist...6 R Programmer...9 Statistician.... 10 BI Developer... 11 System Administrator... 12
More informationHigh Performance Analytics with In-Database Processing
High Performance Analytics with In-Database Processing Stephen Brobst, Chief Technology Officer, Teradata Corporation, San Diego, CA Keith Collins, Senior Vice President & Chief Technology Officer, SAS
More informationDistributed Computing and Big Data: Hadoop and MapReduce
Distributed Computing and Big Data: Hadoop and MapReduce Bill Keenan, Director Terry Heinze, Architect Thomson Reuters Research & Development Agenda R&D Overview Hadoop and MapReduce Overview Use Case:
More informationWHAT S NEW IN SAS 9.4
WHAT S NEW IN SAS 9.4 PLATFORM, HPA & SAS GRID COMPUTING MICHAEL GODDARD CHIEF ARCHITECT SAS INSTITUTE, NEW ZEALAND SAS 9.4 WHAT S NEW IN THE PLATFORM Platform update SAS Grid Computing update Hadoop support
More informationAdvanced analytics at your hands
2.3 Advanced analytics at your hands Neural Designer is the most powerful predictive analytics software. It uses innovative neural networks techniques to provide data scientists with results in a way previously
More informationData Warehouse as a Service. Lot 2 - Platform as a Service. Version: 1.1, Issue Date: 05/02/2014. Classification: Open
Data Warehouse as a Service Version: 1.1, Issue Date: 05/02/2014 Classification: Open Classification: Open ii MDS Technologies Ltd 2014. Other than for the sole purpose of evaluating this Response, no
More informationHDP Hadoop From concept to deployment.
HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some
More informationAutomated Data Ingestion. Bernhard Disselhoff Enterprise Sales Engineer
Automated Data Ingestion Bernhard Disselhoff Enterprise Sales Engineer Agenda Pentaho Overview Templated dynamic ETL workflows Pentaho Data Integration (PDI) Use Cases Pentaho Overview Overview What we
More informationAzure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
More informationSTATISTICA Solutions for Financial Risk Management Management and Validated Compliance Solutions for the Banking Industry (Basel II)
STATISTICA Solutions for Financial Risk Management Management and Validated Compliance Solutions for the Banking Industry (Basel II) With the New Basel Capital Accord of 2001 (BASEL II) the banking industry
More information2015 Ironside Group, Inc. 2
2015 Ironside Group, Inc. 2 Introduction to Ironside What is Cloud, Really? Why Cloud for Data Warehousing? Intro to IBM PureData for Analytics (IPDA) IBM PureData for Analytics on Cloud Intro to IBM dashdb
More informationPredictive Analytics with TIBCO Spotfire and TIBCO Enterprise Runtime for R
Predictive Analytics with TIBCO Spotfire and TIBCO Enterprise Runtime for R PREDICTIVE ANALYTICS WITH TIBCO SPOTFIRE TIBCO Spotfire is the premier data discovery and analytics platform, which provides
More information