R and Hadoop: Architectural Options. Bill Jacobs VP Product Marketing & Field CTO, Revolution
Transcription
1 R and Hadoop: Architectural Options Bill Jacobs VP Product Marketing & Field CTO, Revolution
2 Polling Question #1: Who Are You? (choose one) Statistician or modeler who uses R Other R developer Hadoop Expert Application builder Data guru Business user Systems vendor or reseller Something else
3 Agenda Challenges Options Considerations How to Choose
4 Boundless Opportunities Marketing: Clickstream & Campaign Analyses Digital Media: Recommendation Engines Retail: Social Sentiment Analysis Insurance: Fraud Waste and Abuse Healthcare Delivery: Outcome Prediction Manufacturing: Quality Optimization P&C Insurance: Risk Analysis Consumer Products: Warranty Optimization Operations: Supply Chain Optimization Econometrics: Market Prediction Marketing: Mix and Price Optimization Life Sciences: Pharmacogenetics Transportation: Asset Utilization
5 Polling Question #2: What Industry Do You Represent? Financial Services Insurance Healthcare, Life Sciences or Pharma Manufacturing Energy Retail Logistics and Transportation Education Government Marketing & Advertising Technology Other
6 In A Perfect World Analytical Capability Security Compute Ease Data Scale Price Users
7 Hadoop Analytics - Many Alternatives: R-Based Alternatives; Legacy tools updated (SAS HPA, etc.); Big Data Databases; Other Languages (Scala, Java, Julia, various GUIs). Today's Topic: R-Based Alternatives: Beside Architectures; Inside Architectures; Open Source and Commercial.
8 Reality: Tradeoffs. Traditional Statistics vs. Machine Learning In-Memory vs. Shared Infrastructure CRAN vs. Parallelization Desktop vs. Remote Explicit vs. Automatic Distribution Real-Time vs. MapReduce Locality vs. Movement Memory Limits
9 No Magic Bullet.
10 Corporate Overview & Quick Facts. Revolution R Enterprise is the leading commercial analytics platform based on the open source R statistical computing language.
Founded: 2008 (as REvolution Computing)
Office Locations: Palo Alto (HQ), Seattle (Engineering), Singapore, London
CEO: David Rich
Number of customers: 200+
Investors: Northbridge Venture Partners, Intel Capital, Platform Vendor
Web site:
11 Revolution Analytics. Our Vision: R becomes the de facto standard for enterprise predictive analytics. Our Mission: Drive enterprise adoption of R by providing enhanced R products tailored to meet enterprise challenges.
12 Revolution Analytics Builds & Delivers: Software Products: Support & Services Stable Distributions Commercial Support Programs Broad Platform Support Training Programs Professional Services Big Data Analytics in R Application Integration Community Programs Deployment Platforms Academic Support Programs Agile Development Tooling Contributions to Open Source R Future Platform Support Open Source Extensions Sponsorship of R User Groups
13 Revolution Analytics Technical Innovations R Options from Open Source Production Deployment to Enterprise Support Parallelized Analytical Computation In-Database & In-Hadoop Analytics Big Data Scalability Multi-Platform Deployment Legacy Data Format Support Multiple IDE Options PMML Model Export Remote Execution
14 The Revolution R Product Suite. Revolution R Open: Free and open source R distribution, enhanced and distributed by Revolution Analytics. Revolution R Plus: Open-source distribution of R, packages, and other components, enhanced, supported and indemnified by Revolution Analytics. Revolution R Enterprise: Secure, scalable and supported distribution of R, with proprietary components created by Revolution Analytics.
15 Polling Question #3: State Play: In your company you are: Building Our Data Lake; Running R + Hadoop Data Today; Running R inside Hadoop using Open Source; Running RRE inside Hadoop; Deploying Business Apps Using Analytics from Hadoop Data; Looking at Next Steps, e.g. Spark, etc.
16 Revolution Analytics: Eight Alternatives for Integrating R & Hadoop. Open Source: 1. Open Source R; 2. Revolution R Open; 3. Open Source Parallelization on Workstations & Servers; 4. RHadoop: Open Source Parallelization with RHadoop. Commercial: 5. Revolution R Enterprise on Servers & Workstations; 6. Revolution R Enterprise on Edge Nodes; 7. Revolution R Enterprise Inside Hadoop; 8. Combined Edge Node & Inside Hadoop.
17 1. Open Source R Integrated With Hadoop. Traditional Open Source R Beside Architecture: CRAN Algorithms; RODBC, rhdfs, rhbase, rhive. Traditional Open Source. Memory-Limited. Data Moves.
18 2. Revolution R Open On Workstations & Servers. Replace Open Source R Beside Architecture with Revolution R Open: CRAN Algorithms; RODBC, rhdfs, rhbase, rhive. As with Open Source R: Still Free. Still Memory Based. Data Still Moves. Improvements: Accelerates Math with Intel MKL; Improves R-based packages. Limitations: No Effect for non-R Code.
19 Accelerate R Math with Intel Math Kernel Libraries. Source:
20 3. Write Parallel Algorithms: PC, Server or Clusters. Write R Code to Explicitly Parallelize; Deploy Across Several Systems. foreach & iterators; doParallel (PC, server); doMPI (cluster); RRE rxExec. Example Uses: Bootstrapping, Simulation, HPC. Can Include CRAN Algorithms Carefully. RODBC, rhdfs, rhbase, rhive. As with Previous: Still Free. Still Memory Based. Data Still Moves. Intel MKL with RRO. Improvements: Parallelized Execution. Limitations: Parallelization Difficulty; Data Movement; Platform Specific.
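The bootstrapping use case named on this slide could be sketched roughly as follows. This is an illustration in Python, not R, and it does not use any of the tools listed on the slide; the function names `bootstrap_mean` and `parallel_bootstrap` and the sample data are hypothetical. A thread pool stands in for the pool of workers; genuinely CPU-bound work would normally be spread over separate processes or machines.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def bootstrap_mean(data, seed):
    """Draw one resample with replacement and return its mean."""
    rng = random.Random(seed)  # independent, reproducible stream per task
    resample = [rng.choice(data) for _ in data]
    return statistics.fmean(resample)


def parallel_bootstrap(data, n_replicates, workers=4):
    """Explicitly split the work: one task per bootstrap replicate."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the seeds.
        return list(pool.map(lambda s: bootstrap_mean(data, s), range(n_replicates)))


if __name__ == "__main__":
    sample = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # hypothetical data
    means = parallel_bootstrap(sample, n_replicates=200)
    print(min(means), statistics.fmean(means), max(means))
```

The point of the sketch is that the author, not the library, decides how the work is divided and how results are gathered, which is the "parallelization difficulty" the slide lists as a limitation.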
21 4. RHadoop: Custom Parallel Execution for Hadoop. Execute R Code & CRAN Algorithms Inside Hadoop. Remote Desktop. Example Uses: Scoring, Transformation, Easily Parallelized Algorithms. R Code, rmapreduce, Hadoop Streaming. Can Include CRAN Algorithms. rhbase, rhdfs. As With Previous: Still Free. Optional Intel MKL in RRO. Improvements: Runs R in MapReduce; No Data Movement. Limitations: Manual Parallelization; Hadoop Specific.
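"Manual parallelization" on this slide means the analyst writes the map and reduce steps by hand. A minimal sketch of that shape, written in Python and run locally, is shown below. It is not the RHadoop or Hadoop Streaming API; the names `map_record`, `reduce_values` and `run_local`, and the sales records, are hypothetical, and `run_local` only simulates the grouping that Hadoop performs between the two phases.

```python
from collections import defaultdict


def map_record(record):
    """Map step: emit (key, (sum, count)) for one input record."""
    store, amount = record
    yield store, (amount, 1)


def reduce_values(key, values):
    """Reduce step: combine all partial (sum, count) pairs for one key."""
    total = sum(v[0] for v in values)
    count = sum(v[1] for v in values)
    return key, total / count


def run_local(records):
    """Simulate the shuffle that groups mapped values by key before reducing."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            groups[key].append(value)
    return dict(reduce_values(k, vs) for k, vs in groups.items())


if __name__ == "__main__":
    sales = [("north", 10.0), ("south", 4.0), ("north", 20.0), ("south", 6.0)]
    print(run_local(sales))
```

Per-key averages fit this pattern easily, which is why the slide singles out scoring, transformation and other easily parallelized algorithms as typical uses.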
22 5. Revolution R Enterprise (RRE) PEMAs inside Hadoop. Traditional Beside Architecture with Optimized Algorithms. Available for Windows, Linux. As With Previous: Includes Intel MKL in RRO. Revolution R Enterprise: ScaleR PEMA Algorithms plus All of CRAN (subject to memory limits). RODBC, rhdfs, rhbase, rhive. Advantages: Speed: PEMAs Parallelize Across Threads, Cores & Sockets. Scale: PEMAs Chunk - no Memory Limits. All of CRAN Available. Portability. Fully Supported. Limitations: Data Movement; Single Machine.
23 Revolution R Enterprise is the only big data big analytics platform based on open source R: High Performance, Scalable Analytics; Portable Across Enterprise Platforms; Easier to Build & Deploy Analytics.
24 ScaleR Refactor Algorithms for Dramatic Performance and Capacity Improvement
25 ScaleR High Performance Algorithms for the Most Common Uses
Data Step: Data import (Delimited, Fixed, SAS, SPSS, ODBC); Variable creation & transformation; Recode variables; Factor variables; Missing value handling; Sort, Merge, Split; Aggregate by category (means, sums)
Descriptive Statistics: Min / Max, Mean, Median (approx.); Quantiles (approx.); Standard Deviation; Variance; Correlation; Covariance; Sum of Squares (cross product matrix for set variables); Pairwise Cross tabs; Risk Ratio & Odds Ratio; Cross-Tabulation of Data (standard tables & long form); Marginal Summaries of Cross Tabulations
Statistical Tests: Chi Square Test; Kendall Rank Correlation; Fisher's Exact Test; Student's t-test
Sampling: Subsample (observations & variables); Random Sampling
Predictive Models: Sum of Squares (cross product matrix for set variables); Multiple Linear Regression; Generalized Linear Models (GLM) with exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie), standard link functions (cauchit, identity, log, logit, probit), and user defined distributions & link functions; Covariance & Correlation Matrices; Logistic Regression; Classification & Regression Trees; Predictions/scoring for models; Residuals for all models
Variable Selection: Stepwise Regression
Simulation: Simulation (e.g. Monte Carlo); Parallel Random Number Generation
Cluster Analysis: K-Means
Classification (includes items new in 7.3): Decision Trees; Decision Forests; Gradient Boosted Decision Trees
Combination: PEMA-R API; rxDataStep; rxExec
Revolution Analytics Confidential Under NDA
26 What's a PEMA? Parallel External Memory Algorithms. Script Calls ScaleR Algorithm; Scripts can call CRAN Open Source Algorithms. Master Algorithm Process: Start & Manage Processing; Combine Individual Results. ScaleR PEMA: Load Block At A Time; Analyze Each Block. Data Not Limited to Available Memory. Unlimited Data Scale. Ingests Data One Chunk At A Time. Adjustable Memory Footprint. Multi-Thread Execution Performance. Highly-Optimized Algorithms. Algorithm Math Fully Refactored for Parallelism. Delivered as ScaleR Library in Revolution R Enterprise.
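The load-a-block, analyze-each-block, combine-results pattern described on this slide can be illustrated with a small external-memory computation of mean and variance. The sketch below is in Python and is not ScaleR code; the names `Partial`, `read_blocks`, `analyze_block`, `combine` and `mean_and_variance` are hypothetical, and the combining step uses the standard pairwise update for merging means and sums of squared deviations, chosen here for illustration rather than taken from the presentation.

```python
from dataclasses import dataclass
from itertools import islice


@dataclass
class Partial:
    """Intermediate result for the data seen so far."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean


def read_blocks(source, block_size):
    """Yield the data one block at a time, so only one block is in memory."""
    it = iter(source)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block


def analyze_block(block):
    """Compute the partial result for a single block."""
    n = len(block)
    mean = sum(block) / n
    m2 = sum((x - mean) ** 2 for x in block)
    return Partial(n, mean, m2)


def combine(a, b):
    """Merge two partial results without revisiting the underlying data."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Partial(n, mean, m2)


def mean_and_variance(source, block_size=1000):
    """Master process: iterate over blocks and combine the individual results."""
    result = Partial()
    for block in read_blocks(source, block_size):
        result = combine(result, analyze_block(block))
    variance = result.m2 / (result.n - 1) if result.n > 1 else float("nan")
    return result.mean, variance


if __name__ == "__main__":
    values = (float(i) for i in range(1, 11))  # stands in for a file on disk
    print(mean_and_variance(values, block_size=3))
```

Because `combine` needs only the partial results, the blocks could equally be analyzed on different threads or nodes, and the block size sets the memory footprint.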
27 6. Run Revolution R Enterprise on Hadoop Edge Node(s). Fast Single-Server Alternative for Modest Data Scale. (opt.) Thin Client or Remote Desktop. ScaleR + CRAN Algorithms. Edge Node. RODBC, rhdfs, rhbase, rhive. Local File System. As With Previous: Single Machine Execution; PEMA Scale & Speed (Single Machine); Use ScaleR + CRAN; Accelerate R with Intel MKL. Improvements: Easily Shared via No Data Movement Develop on Desktop Run on Edge Node Limitations: Shorter Trip for Data
28 7. Fast, Transparent Parallel Computation Inside Hadoop YARN/MapReduce. Fast Parallelized Analytics on Large Data Sets In Hadoop. Desktop & Server Tools and Applications; Web Services; DeployR; Remote Execution; JobTracker; ScaleR Algorithms. As With Previous: Speed and Scale of ScaleR PEMA Algorithms; Use CRAN Where Appropriate; Accelerate R Math with MKL; Custom Parallelized Algos. Advantages: Parallel Computation; No Data Movement; ScaleR PEMA Parallelization; Can Parallelize CRAN Carefully; Portable Coding. Limitations: Hadoop Workload Profiles.
29 One Client's Experience with RRE on Hadoop
Test Cluster - 9 Nodes:
1 Edge Node: 128GB, 24 cores each
2 Admin Nodes: 128GB, 24 cores each
9 Task Nodes: 64GB, 24 cores each
Task: Processing Time
Importing and Filtering Datasets from HDFS:
14 Million Observations: 82 sec.
227 Million Observations: 310 sec.
Modeling and Estimation:
1.2 M Correlations: 2771 sec.
Simple Linear Regression, 227 M Observations: 61 sec.
Multiple Linear Regression, Three Variables, 227 M Observations: 58 sec.
Multiple Linear Regression, Four Variables, 227 M Observations: 58 sec.
Random Forest, 10 Predictor Variables, 227 M Observations, 10 Trees with Max Depth of 10 Splits: 2 hr. 3 min.
30 8. Combined Edge Node & In-Hadoop. Maximized Flexibility, Performance & Workload Handling. Thin Client Development; Remote Execution; ScaleR Algorithms. As With Previous: Speed and Scale of ScaleR PEMA Algorithms; Use CRAN Where Appropriate; Accelerate R Math with MKL; Custom Parallelized Algos. Desktop & Server Tools and Applications; Web Services; RStudio; DeployR. Advantages: Flexibility for Blended Workloads; Little or No Data Movement; Maximize CRAN Capabilities by Sharing Large RAM Edge Nodes.
31 Occasionally Conflicting Criteria Infrastructure Criteria: Big Data Platform Vendor Choice Data Ingest Data Security Data Governance Data Science Criteria: Performance Self Service Flexibility Collaboration Sharing Capability
32 Key Questions: Where are the bulk of your skills? SAS? R? Java? Python? SQL? Where do you build models today? Do you have the skills to parallelize algorithms? Can models be built on a big shared server? How will you run models? Do you have the budget to purchase commercial solutions? How will your needs change over time? What is your future architecture plan? How risk averse is your management team regarding new platforms and open source?
33 Key Questions (cont.) What Workloads Do You Anticipate? How Many Users? What Workloads? Workload Realities: Many small tasks do not run well in MapReduce. Large data movements / duplications are costly. What Use Cases Will You Encounter? Traditional statistical exploration, modeling? Behavior Prediction? Outlier Detection? Simulation and HPC? Massively wide data? Real-Time scoring? Internet of Things?
34 Eight Steps to Fast, Scalable R Analytics with Hadoop. Open Source Options: 1. Open Source R; 2. Revolution R Open; 3. Open Source Parallelization; 4. RHadoop. Commercial Options: 5. RRE on Servers & Workstations; 6. RRE on Edge Nodes; 7. RRE Inside Hadoop; 8. RRE on Edge Node & Inside Hadoop. No Clear Winner: Budget & use case determine optimal path. Compelling options in both open source & commercial source. RRE ScaleR uniquely provides automatic parallelization. Current Hadoop platforms are fast for large scale analytics. Combined in-server & in-Hadoop fits majority of cases.
35 2015 Challenges & Opportunities Evolving Hadoop Architectures In-Memory Analytics Spark, YARN Containers, Caching Additional Algorithm Parallelization Cluster Management Cloud and Hybrid Cloud Clusters SQL on Hadoop Battle-Royale Addressing the Resource Reality Integration, Deployment Both Drain on Expensive Resources Leverage other skills Design efficient collaboration Analytics for the Rest of Us New Consumption Targets Mobile New Participants in Design Business Users
37 Recommended Resources Revolution Analytics Products Whitepaper: Delivering Value from Big Data with Revolution R Enterprise and Hadoop Revolution Analytics on Social Media: on on Twitter
38 Thank you GET.REVO
More informationL3: Statistical Modeling with Hadoop
L3: Statistical Modeling with Hadoop Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Revision: December 10, 2014 Today we are going to learn...
More informationPARALLELS CLOUD STORAGE
PARALLELS CLOUD STORAGE Performance Benchmark Results 1 Table of Contents Executive Summary... Error! Bookmark not defined. Architecture Overview... 3 Key Features... 5 No Special Hardware Requirements...
More informationMonitis Project Proposals for AUA. September 2014, Yerevan, Armenia
Monitis Project Proposals for AUA September 2014, Yerevan, Armenia Distributed Log Collecting and Analysing Platform Project Specifications Category: Big Data and NoSQL Software Requirements: Apache Hadoop
More informationBig Data Analytics. Benchmarking SAS, R, and Mahout. Allison J. Ames, Ralph Abbey, Wayne Thompson. SAS Institute Inc., Cary, NC
Technical Paper (Last Revised On: May 6, 2013) Big Data Analytics Benchmarking SAS, R, and Mahout Allison J. Ames, Ralph Abbey, Wayne Thompson SAS Institute Inc., Cary, NC Accurate and Simple Analysis
More informationApigee Insights Increase marketing effectiveness and customer satisfaction with API-driven adaptive apps
White provides GRASP-powered big data predictive analytics that increases marketing effectiveness and customer satisfaction with API-driven adaptive apps that anticipate, learn, and adapt to deliver contextual,
More informationDRIVING INNOVATION THROUGH DATA ACCELERATING BIG DATA APPLICATION DEVELOPMENT WITH CASCADING
DRIVING INNOVATION THROUGH DATA ACCELERATING BIG DATA APPLICATION DEVELOPMENT WITH CASCADING Supreet Oberoi VP Field Engineering, Concurrent Inc GET TO KNOW CONCURRENT Leader in Application Infrastructure
More informationIntroducing Oracle Exalytics In-Memory Machine
Introducing Oracle Exalytics In-Memory Machine Jon Ainsworth Director of Business Development Oracle EMEA Business Analytics 1 Copyright 2011, Oracle and/or its affiliates. All rights Agenda Topics Oracle
More informationUsing DeployR to Solve the R Integration Problem
DEPLOYR WHITE PAPER Using DeployR to olve the R Integration Problem By the Revolution Analytics DeployR Team March 2015 Introduction Organizations use analytics to empower decision making, often in real
More informationComprehensive Analytics on the Hortonworks Data Platform
Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page
More informationWHITE PAPER. Harnessing the Power of Advanced Analytics How an appliance approach simplifies the use of advanced analytics
WHITE PAPER Harnessing the Power of Advanced How an appliance approach simplifies the use of advanced analytics Introduction The Netezza TwinFin i-class advanced analytics appliance pushes the limits of
More informationThe Future of Data Management
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class
More informationCapitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes
Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Highly competitive enterprises are increasingly finding ways to maximize and accelerate
More informationHadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?
Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time? Kai Wähner kwaehner@tibco.com @KaiWaehner www.kai-waehner.de Disclaimer! These opinions are my own and do not necessarily
More informationHadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
More informationMSCA 31000 Introduction to Statistical Concepts
MSCA 31000 Introduction to Statistical Concepts This course provides general exposure to basic statistical concepts that are necessary for students to understand the content presented in more advanced
More informationMaking big data simple with Databricks
Making big data simple with Databricks We are Databricks, the company behind Spark Founded by the creators of Apache Spark in 2013 Data 75% Share of Spark code contributed by Databricks in 2014 Value Created
More informationSpark in Action. Fast Big Data Analytics using Scala. Matei Zaharia. www.spark- project.org. University of California, Berkeley UC BERKELEY
Spark in Action Fast Big Data Analytics using Scala Matei Zaharia University of California, Berkeley www.spark- project.org UC BERKELEY My Background Grad student in the AMP Lab at UC Berkeley» 50- person
More informationHow To Write A Trusted Analytics Platform (Tap)
Trusted Analytics Platform (TAP) TAP Technical Brief October 2015 TAP Technical Brief Overview Trusted Analytics Platform (TAP) is open source software, optimized for performance and security, that accelerates
More informationActian Vector in Hadoop
Actian Vector in Hadoop Industrialized, High-Performance SQL in Hadoop A Technical Overview Contents Introduction...3 Actian Vector in Hadoop - Uniquely Fast...5 Exploiting the CPU...5 Exploiting Single
More informationCisco Data Preparation
Data Sheet Cisco Data Preparation Unleash your business analysts to develop the insights that drive better business outcomes, sooner, from all your data. As self-service business intelligence (BI) and
More informationConverged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities
Technology Insight Paper Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities By John Webster February 2015 Enabling you to make the best technology decisions Enabling
More informationPentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System
Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System By Jake Cornelius Senior Vice President of Products Pentaho June 1, 2012 Pentaho Delivers High-Performance
More informationSAP Solution Brief SAP HANA. Transform Your Future with Better Business Insight Using Predictive Analytics
SAP Brief SAP HANA Objectives Transform Your Future with Better Business Insight Using Predictive Analytics Dealing with the new reality Dealing with the new reality Organizations like yours can identify
More informationBig Data. Fast Forward. Putting data to productive use
Big Data Putting data to productive use Fast Forward What is big data, and why should you care? Get familiar with big data terminology, technologies, and techniques. Getting started with big data to realize
More informationBASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS
WHITEPAPER BASHO DATA PLATFORM BASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS INTRODUCTION Big Data applications and the Internet of Things (IoT) are changing and often improving our
More informationHurwitz ValuePoint: Predixion
Predixion VICTORY INDEX CHALLENGER Marcia Kaufman COO and Principal Analyst Daniel Kirsch Principal Analyst The Hurwitz Victory Index Report Predixion is one of 10 advanced analytics vendors included in
More informationBatter Up! Advanced Sports Analytics with R and Storm
Batter Up! Advanced Sports Analytics with R and Storm Meeting the Real-Time Analytics Opportunity Head-On December 11, 2014 Bill Jacobs VP Product Marketing Revolution Analytics @bill_jacobs Vineet Sharma
More informationBig Data Analytics in R
Big Data Analytics in R Big Opportunity, Big Challenge September, 2011 Revolution Confidential Most advanced statistical analysis software available Half the cost of commercial alternatives The professor
More informationOracle Advanced Analytics 12c & SQLDEV/Oracle Data Miner 4.0 New Features
Oracle Advanced Analytics 12c & SQLDEV/Oracle Data Miner 4.0 New Features Charlie Berger, MS Eng, MBA Sr. Director Product Management, Data Mining and Advanced Analytics charlie.berger@oracle.com www.twitter.com/charliedatamine
More information