Hadoop's Advantages for Machine Learning and Predictive Analytics. Webinar will begin shortly. Presented by Hortonworks & Zementis


Transcription

1 Webinar will begin shortly. Hadoop's Advantages for Machine Learning and Predictive Analytics. Presented by Hortonworks & Zementis, September 10, 2014. Copyright 2014 Zementis, Inc. All rights reserved.

2 Hadoop's Advantages for Machine Learning and Predictive Analytics. Moderator: Mark Rabkin, Director Business Development, Zementis. Presenters: Ofer Mendelevitch, Director of Data Science, Hortonworks; Michael Zeller, CEO, Zementis.

3 The Speakers. Ofer Mendelevitch, Director of Data Science, Hortonworks; Michael Zeller, CEO & Founder, Zementis.
Ofer Mendelevitch is Director of Data Science at Hortonworks, where he is responsible for professional services involving data science with Hadoop, including use cases like recommender systems, prediction, classification and search. Prior to joining Hortonworks, Ofer held a number of positions, including Entrepreneur in Residence at XSeed Capital, VP of Engineering at Nor1, and Director of Engineering at Yahoo, where he led multiple engineering and data science teams.
Michael Zeller is the CEO and Co-Founder of Zementis. His vision is to help companies deepen and accelerate insights from big data through the power of predictive analytics. Michael also serves on the Board of Directors of Software San Diego and as Secretary/Treasurer on the Executive Committee of ACM SIGKDD, the premier international organization for data mining researchers and practitioners from academia, industry, and government.

4 Hortonworks & Zementis.
Hortonworks: We Do Hadoop. Our mission is to power your Modern Data Architecture by delivering Enterprise Apache Hadoop. Our Commitment: Open Leadership (drive innovation in the open exclusively via the Apache community-driven open source process); Enterprise Rigor (engineer, test and certify Apache Hadoop with the enterprise in mind); Ecosystem Endorsement (focus on deep integration with existing data center technologies and skills).
Zementis provides software for operational deployment of predictive analytics. Reseller Partners: (logos not transcribed). Products & Capabilities: vendor-neutral architecture for data mining tools and for analytics and data warehouse platforms; supports the PMML industry standard and a wide range of predictive modeling techniques; rapidly deploys and executes predictive models; accelerates business insight.

5 A data architecture under pressure from new data. Applications: Business Analytics, Custom Applications, Packaged Applications. Data system: repositories (RDBMS, EDW, MPP). Sources: existing sources (CRM, ERP, clickstream, logs) and new data types, shown as OLTP, ERP, CRM systems; unstructured documents, emails; server logs; sentiment, web data; sensor and machine data; clickstream; geolocation. Data growth: 2.8 ZB in 2012; 40 ZB by 2020; 85% from new data types; 15x machine data by 2020. Source: IDC. Hortonworks Inc. All Rights Reserved.

6 Hadoop within an emerging Modern Data Architecture. Applications: Business Analytics, Custom Applications, Packaged Applications. Data system: repositories (RDBMS, EDW, MPP); Governance & Integration, Data Access, Data Management, Security, Operations. Dev & data tools: Build & Test. Operations tools: Provision, Manage & Monitor. Data Lake: an architectural shift in the data center that uses Hadoop to deliver deeper insight across a large, broad, diverse set of data at efficient scale. Sources: OLTP, ERP, CRM systems; documents, emails; web logs, click streams; social networks; machine-generated data; sensor data; geolocation data.

7 Hadoop unlocks a new approach: Iterative Analytics. Current reality (SQL): apply schema on write; dependent on IT; single query engine; repeatable linear process (determine list of questions, design solutions, collect structured data, ask questions from list, detect additional questions). Augment with Hadoop: apply schema on read; support a range of access patterns to data stored in HDFS (polymorphic access); multiple query engines; iterative process (explore, transform, analyze); batch, interactive, real-time, streaming.

8 A (partial) map of machine learning tasks. Discovery: clustering (detect natural groupings); outlier detection (detect anomalies); association rule mining (co-occurrence patterns). Prediction: classification (predict a category); regression (predict a value); recommendation (predict a preference).

9 Typical iterative flow in machine learning modeling. Stages in the cycle: Acquire Data; Clean Data; Visualize, Explore; Hypothesize, Model; Measure/Evaluate; Deploy & Monitor.

10 Why Apache Hadoop for Data Science? Hadoop's schema-on-read reduces cycle time. Hadoop is ideal for pre-processing of raw data, structured and unstructured. Larger datasets enable better models. Large-scale parallel scoring.

11 Hadoop's schema-on-read accelerates innovation. Timeline: start, 3 months, 6 months, 9 months. With schema-on-write: "I need new data", then a schema change project, then "Finally, we start collecting", then "Let me see... is it any good?" With Hadoop: "Let's just put it in a folder on HDFS", then "Let me see... is it any good?", then "My model is awesome".
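
The schema-on-read idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample records, field names, and the `read_with_schema` helper are hypothetical, and a plain list of JSON lines stands in for files kept in a folder on HDFS. The point it shows is that a new question changes the reader's schema, not the stored data.

```python
import json

# Raw events are stored untouched; no table schema is declared up front.
# (Hypothetical sample lines standing in for files in an HDFS folder.)
RAW_LINES = [
    '{"user": "a", "clicks": 3, "country": "US"}',
    '{"user": "b", "clicks": "7"}',
    '{"user": "c", "clicks": 1, "referrer": "ad"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: select fields, cast types, default missing ones."""
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, (cast, default) in schema.items():
            value = record.get(field, default)
            row[field] = cast(value) if value is not None else None
        yield row


# First question: only clicks per user are needed.
schema_v1 = {"user": (str, None), "clicks": (int, 0)}
# A later question needs another field; the stored lines are not rewritten.
schema_v2 = {"user": (str, None), "clicks": (int, 0), "referrer": (str, "none")}

rows_v1 = list(read_with_schema(RAW_LINES, schema_v1))
rows_v2 = list(read_with_schema(RAW_LINES, schema_v2))
```

Under schema-on-write, adding `referrer` would have meant changing the table definition before any data could be collected; here only `schema_v2` changes.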

12 Hadoop is ideal for large scale pre-processing. Raw data is turned into a feature matrix through steps such as sample, transform, aggregate, normalize, join, OCR, and NLP.
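
As a rough sketch of two of the steps named on this slide (aggregate and normalize), the following Python turns raw events into a small feature matrix. The event layout, the chosen features, and all function names are assumptions made for illustration. On a cluster the same logic would run as distributed jobs rather than in a single process.

```python
from collections import defaultdict

# Hypothetical raw events: (user_id, purchase_amount).
raw_events = [("u1", 10.0), ("u2", 5.0), ("u1", 30.0), ("u3", 20.0), ("u2", 15.0)]


def aggregate(events):
    """Aggregate step: per-user event count and total amount."""
    totals = defaultdict(lambda: [0, 0.0])
    for user, amount in events:
        totals[user][0] += 1
        totals[user][1] += amount
    return dict(totals)


def min_max_normalize(column):
    """Normalize step: rescale a column of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def build_feature_matrix(events):
    """Return sorted user ids and one normalized feature row per user."""
    agg = aggregate(events)
    users = sorted(agg)
    counts = min_max_normalize([agg[u][0] for u in users])
    amounts = min_max_normalize([agg[u][1] for u in users])
    return users, [list(row) for row in zip(counts, amounts)]


users, matrix = build_feature_matrix(raw_events)
```

Each row of `matrix` corresponds to the user at the same position in `users`, with columns for normalized event count and normalized total amount.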

13 Hadoop enables modeling with larger datasets. Larger datasets, better outcomes: more examples, more features (Banko & Brill, 2001).

14 Hadoop enables large-scale parallelized scoring. Training set, then learning, then a model (PMML or native). Test set, then scoring, then output. Scoring is embarrassingly parallel, using Hadoop as grid compute infrastructure.
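
Scoring is embarrassingly parallel because each record's score depends only on that record and the model. The sketch below shows this by splitting records into partitions and scoring each one independently. The logistic model and its coefficients are made up. A thread pool stands in for the cluster here, so it shows the structure of the computation rather than a real speed-up, and on Hadoop each partition would be a separate task.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical logistic-regression model; the coefficients are made up.
INTERCEPT = -1.0
WEIGHTS = [0.8, -0.5]


def score(record):
    """Score one record; needs nothing but the record and the model."""
    z = INTERCEPT + sum(w * x for w, x in zip(WEIGHTS, record))
    return 1.0 / (1.0 + math.exp(-z))


def score_partition(partition):
    """Score every record in one partition, independently of other partitions."""
    return [score(record) for record in partition]


def split(records, n_parts):
    """Split records into at most n_parts contiguous partitions."""
    size = max(1, math.ceil(len(records) / n_parts))
    return [records[i:i + size] for i in range(0, len(records), size)]


def parallel_score(records, n_parts=4):
    """Score partitions concurrently and concatenate results in input order."""
    partitions = split(records, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        # map() yields results in the order the partitions were submitted.
        results = list(pool.map(score_partition, partitions))
    return [s for part in results for s in part]


records = [[float(i), float(i % 3)] for i in range(10)]
scores = parallel_score(records)
```

Because no partition reads another partition's data, the number of workers can grow with the data volume without changing the scoring code.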

15 What is PMML? The Predictive Model Markup Language (PMML) industry standard reduces the complexity of operationalizing models. It is a mature standard developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to deploy models. It is an XML-based language used to define statistical and data mining models and to share these between compliant applications. It is supported by most leading data mining tools, commercial and open-source. Data handling and transformations (pre- and post-processing) are a core component of the PMML standard. It allows for the clear separation of tasks: model development vs. model deployment. It eliminates the need for custom code and proprietary model deployment solutions.
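
To make the "XML-based language" point concrete, here is a minimal sketch that reads a small hand-written regression model in the style of PMML and scores one input with Python's standard XML parser. The model, its field names, and its coefficients are hypothetical. The reader handles only an intercept and numeric predictors and ignores the rest of the standard, so it is not a substitute for a compliant scoring engine.

```python
import xml.etree.ElementTree as ET

# Hand-written, hypothetical model in the style of a PMML regression model.
PMML_DOC = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header description="hypothetical example"/>
  <DataDictionary numberOfFields="3">
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x1"/>
      <MiningField name="x2"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_2"}


def load_linear_model(pmml_text):
    """Extract the intercept and per-field coefficients from the document."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients


def predict(model, inputs):
    """Evaluate intercept + sum(coefficient * input value)."""
    intercept, coefficients = model
    return intercept + sum(c * inputs[name] for name, c in coefficients.items())


model = load_linear_model(PMML_DOC)
prediction = predict(model, {"x1": 2.0, "x2": 1.0})
```

The model travels as a document rather than as code, so the tool that trained it and the system that scores with it only need to agree on the file format.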

16 Predictive Analytics Workflow. PMML in action, covering a complete workflow from raw data input to decision output. Stages in the PMML file: raw inputs; model signature (data and operational types); input validation (outliers, missing values, invalid values); data pre-processing (normalize, discretize, bin, map, etc.), producing derived model inputs; predictive model, producing model outputs; data post-processing (scaling, business decisions, thresholds, etc.); prediction.
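
The stages of this workflow can be sketched as a chain of small functions. Everything specific below is an assumption made for illustration: the field names, valid ranges, replacement values, coefficients, and decision threshold are hypothetical. In a PMML deployment these steps would be declared in the model file rather than hand-coded.

```python
import math

# Model signature (hypothetical): name -> (low, high, replacement for missing/invalid).
SIGNATURE = {"age": (18.0, 100.0, 40.0), "balance": (0.0, 1e6, 0.0)}


def validate(raw):
    """Input validation: replace missing or invalid values, clamp outliers."""
    clean = {}
    for name, (low, high, fallback) in SIGNATURE.items():
        try:
            value = float(raw.get(name))
        except (TypeError, ValueError):
            value = fallback
        if math.isnan(value):
            value = fallback
        clean[name] = min(max(value, low), high)
    return clean


def preprocess(clean):
    """Pre-processing: min-max normalize each field into a derived model input."""
    return {
        name: (clean[name] - low) / (high - low)
        for name, (low, high, _) in SIGNATURE.items()
    }


def model(derived):
    """Predictive model: a logistic function with made-up coefficients."""
    z = -1.0 + 2.0 * derived["age"] + 3.0 * derived["balance"]
    return 1.0 / (1.0 + math.exp(-z))


def postprocess(probability, threshold=0.5):
    """Post-processing: scale the output and apply a decision threshold."""
    decision = "review" if probability >= threshold else "approve"
    return {"score": round(probability * 100), "decision": decision}


def run(raw):
    """Raw inputs in, decision out."""
    return postprocess(model(preprocess(validate(raw))))


result = run({"age": 18, "balance": 0})
```

Keeping validation and transformations next to the model means the scoring side sees the same derived inputs the model was built on.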

17 Path to Business Value. Predictive analytics helps organizations unlock the value of their big data. Big Data: applications, databases, cloud, log files, RSS feeds, other sources. Predictive Analytics: predictive models, machine learning techniques, data mining tools. Business Insights: more relevant, more accurate, more comprehensive, more nuanced. Decisions & Actions: faster, lower risk, greater positive impact. Business Value: accelerated time-to-market, more precise targeting, real-time responsiveness, enhanced operational agility, competitive advantage, higher revenue growth rates, greater profitability.

18 Traditional Deployment Cycle. ...but model deployment challenges can often erode much of the value that predictive analytics can deliver. Develop (Data Scientist), Operationalize (IT Engineer), Utilize (Business Professional), Business Decisions. Predictive model deployment becomes a rework cycle: extensive manual coding, cross-checking, fixing coding errors. The result: delayed insight, less accurate decisions, missed opportunities, loss of value.

19 Deployment with Zementis & PMML. Enter Zementis, whose solutions accelerate time-to-insight for predictive analytics. Model deployment cycle time (economic value vs. time-to-insight): about 6 months without Zementis; within 2 days with Zementis, and sometimes even within a few hours. Accelerated deployment timeline: reduced model deployment cycle time, reduced model deployment expense, increased model throughput, enhanced accuracy, minimal rework, if any. Rapid insight = rapid time-to-value from predictive analytics.

20 Universal PMML Plug-in (UPPI). Data mining tools: commercial vendors (e.g. IBM SPSS, SAS) and open source tools (R, KNIME, ...). Predictive algorithms: decision trees; neural networks; support vector machines; linear and logistic regression; naive Bayes classifiers; general and generalized linear models; Cox regression; rule set models; clustering; scorecards; association rules; multiple models (segmentation, chaining, composition and ensemble, including random forest models). PMML model deployment, integration and execution: Zementis UPPI for Hive/Hadoop. Simple deployment and execution: upload PMML file(s) in Hive; PMML turns into HiveQL functions; seamlessly score data on Hadoop.

21 Hive 0.13. Now faster than ever: up to 100x performance improvements and more to come.

22 UPPI for Hive 0.13 Performance. Chart: Scaling by Hadoop Cluster Size (time vs. number of nodes), showing performance executing a complex PMML model as a UDF (User-Defined Function) using Hive. Chart: Speeding Up Performance with Tez & ORC (time for Hive, Tez, and Tez & ORC), showing the performance improvement when executing the same model and data with Tez & ORC enabled. [Chart values not transcribed; only a 29% label survives.]

23 DEMO. Zementis Universal PMML Plug-in (UPPI) demo on Hortonworks Sandbox. Zementis UPPI for Hive: 1. PMML sample models > Hive UDFs. 2. Run customer churn example.

24 Broad Applicability. Hortonworks and Zementis products accelerate predictive model insights for multiple industries and business use cases. Fraud & Risk Scoring: financial institutions, scoring bureaus, fraud detection, advanced decision management. Sensor & Device Data Processing: rotating equipment, energy, biometrics, IP network security. Marketing & Sales: up-/cross-sell and next-best-offer, marketing campaign optimization, real-time recommendations.

25 Thank You. Questions?

Zementis. Universal Deployment of KNIME Models. Big Data and Real-time Scoring. KNIME User Group Meeting, Berlin February

Zementis. Universal Deployment of KNIME Models. Big Data and Real-time Scoring. KNIME User Group Meeting, Berlin February Zementis Universal Deployment of KNIME Models Big Data and Real-time Scoring KNIME User Group Meeting, Berlin February 2015 www.zementis.com Zementis Zementis Zementis provides software for operational

More information

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment

More information

Predictive Analytics: Seeing the Whole Picture

Predictive Analytics: Seeing the Whole Picture Webinar will begin shortly Predictive Analytics: Seeing the Whole Picture Presented by Caserta Concepts, Zementis, FICO June 18, 2015 Copyright 2015 Zementis, Inc. All rights reserved. 2 Predictive Analytics:

More information

ADAPA Product Data Sheet

ADAPA Product Data Sheet Product Data Sheet Predictive analytics helps organizations unlock the value of their big data, making business insights more relevant, more accurate, more comprehensive and more nuanced. With these enhanced

More information

Enable your Modern Data Architecture by delivering Enterprise Apache Hadoop

Enable your Modern Data Architecture by delivering Enterprise Apache Hadoop Modern Data Architecture with Enterprise Apache Hadoop Hortonworks. We do Hadoop. Jeff Markham Technical Director, APAC jmarkham@hortonworks.com Page 1 Our Mission: Enable your Modern Data Architecture

More information

TECHED USER CONFERENCE MAY 3-4, 2016

TECHED USER CONFERENCE MAY 3-4, 2016 TECHED USER CONFERENCE MAY 3-4, 2016 Jan Humble Solutions Architect Software AG Building Prediction Into Your Applications 2016 Software AG. All rights reserved. For internal use only DO YOU TRUST YOUR

More information

HDP Hadoop From concept to deployment.

HDP Hadoop From concept to deployment. HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some

More information

Easy Execution of Data Mining Models through PMML

Easy Execution of Data Mining Models through PMML Easy Execution of Data Mining Models through PMML Zementis, Inc. UseR! 2009 Zementis Development, Deployment, and Execution of Predictive Models Development R allows for reliable data manipulation and

More information

Model Deployment. Dr. Saed Sayad. University of Toronto 2010 saed.sayad@utoronto.ca. http://chem-eng.utoronto.ca/~datamining/

Model Deployment. Dr. Saed Sayad. University of Toronto 2010 saed.sayad@utoronto.ca. http://chem-eng.utoronto.ca/~datamining/ Model Deployment Dr. Saed Sayad University of Toronto 2010 saed.sayad@utoronto.ca http://chem-eng.utoronto.ca/~datamining/ 1 Model Deployment Creation of the model is generally not the end of the project.

More information

HDP Enabling the Modern Data Architecture

HDP Enabling the Modern Data Architecture HDP Enabling the Modern Data Architecture Herb Cunitz President, Hortonworks Page 1 Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform) Founded in 2011 Original 24 architects,

More information

EMC Greenplum Driving the Future of Data Warehousing and Analytics. Tools and Technologies for Big Data

EMC Greenplum Driving the Future of Data Warehousing and Analytics. Tools and Technologies for Big Data EMC Greenplum Driving the Future of Data Warehousing and Analytics Tools and Technologies for Big Data Steven Hillion V.P. Analytics EMC Data Computing Division 1 Big Data Size: The Volume Of Data Continues

More information

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata BIG DATA: FROM HYPE TO REALITY Leandro Ruiz Presales Partner for C&LA Teradata Evolution in The Use of Information Action s ACTIVATING MAKE it happen! Insights OPERATIONALIZING WHAT IS happening now? PREDICTING

More information

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum Big Data Analytics with EMC Greenplum and Hadoop Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All

More information

SEIZE THE DATA. 2015 SEIZE THE DATA. 2015

SEIZE THE DATA. 2015 SEIZE THE DATA. 2015 1 Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. BIG DATA CONFERENCE 2015 Boston August 10-13 Predicting and reducing deforestation

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

Universal PMML Plug-in for EMC Greenplum Database

Universal PMML Plug-in for EMC Greenplum Database Universal PMML Plug-in for EMC Greenplum Database Delivering Massively Parallel Predictions Zementis, Inc. info@zementis.com USA: 6125 Cornerstone Court East, Suite #250, San Diego, CA 92121 T +1(619)

More information

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,

More information

HADOOP IN ENTERPRISE FUTURE-PROOF YOUR BIG DATA INVESTMENTS WITH CASCADING. Supreet Oberoi Nov. 4-6, 2014 Big Data Expo Santa Clara

HADOOP IN ENTERPRISE FUTURE-PROOF YOUR BIG DATA INVESTMENTS WITH CASCADING. Supreet Oberoi Nov. 4-6, 2014 Big Data Expo Santa Clara DRIVING INNOVATION THROUGH DATA HADOOP IN ENTERPRISE FUTURE-PROOF YOUR BIG DATA INVESTMENTS WITH CASCADING Supreet Oberoi Nov. 4-6, 2014 Big Data Expo Santa Clara ABOUT ME I am a Data Engineer, not a Data

More information

Hadoop, the Data Lake, and a New World of Analytics

Hadoop, the Data Lake, and a New World of Analytics Hadoop, the Data Lake, and a New World of Analytics Hortonworks. We do Hadoop. Spring 2014 Version 1.0 Page 1 Hortonworks Inc. 2014 Traditional Data Architecture Pressured 2.8 ZB in 2012 85% from New Data

More information

High-Performance Analytics

High-Performance Analytics High-Performance Analytics David Pope January 2012 Principal Solutions Architect High Performance Analytics Practice Saturday, April 21, 2012 Agenda Who Is SAS / SAS Technology Evolution Current Trends

More information

Advanced In-Database Analytics

Advanced In-Database Analytics Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??

More information

Integrating a Big Data Platform into Government:

Integrating a Big Data Platform into Government: Integrating a Big Data Platform into Government: Drive Better Decisions for Policy and Program Outcomes John Haddad, Senior Director Product Marketing, Informatica Digital Government Institute s Government

More information

KNIME Big Data Workshop

KNIME Big Data Workshop KNIME Big Data Workshop Tobias Kötter and Björn Lohrmann KNIME 2016 KNIME.com AG. All Rights Reserved. Variety, Volume, Velocity Variety: integrating heterogeneous data.. and tools Volume: from small files......to

More information

Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics

Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Please note the following IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice

More information

KNIME UGM 2014 Partner Session

KNIME UGM 2014 Partner Session KNIME UGM 2014 Partner Session DYMATRIX Stefan Weingaertner DYMATRIX CONSULTING GROUP 1 Agenda 1 Company Introduction 2 DYMATRIX Customer Intelligence Offering 3 PMML2SQL / PMML2SAS Converter 4 Uplift

More information

Data Mining + Business Intelligence. Integration, Design and Implementation

Data Mining + Business Intelligence. Integration, Design and Implementation Data Mining + Business Intelligence Integration, Design and Implementation ABOUT ME Vijay Kotu Data, Business, Technology, Statistics BUSINESS INTELLIGENCE - Result Making data accessible Wider distribution

More information

Data Science with Hadoop Using Chorus to Operationalize Data Science in the Age of Big Data

Data Science with Hadoop Using Chorus to Operationalize Data Science in the Age of Big Data Data Science with Hadoop Using Chorus to Operationalize Data Science in the Age of Big Data THE RISE OF BIG DATA.... 2 A COMPLETE DATA SCIENCE ENVIRONMENT.... 4 CONCLUSION.... 7 SYSTEM REQUIREMENTS & SELECTED

More information

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD Predictive Analytics Techniques: What to Use For Your Big Data March 26, 2014 Fern Halper, PhD Presenter Proven Performance Since 1995 TDWI helps business and IT professionals gain insight about data warehousing,

More information

July 2015. Zementis for IBM z Systems

July 2015. Zementis for IBM z Systems July 2015 Zementis for IBM z Systems Page 1 Zementis for IBM z Systems An integrated predictive analytics deployment and scoring capability for organizations managing data and transactions with IBM z Systems

More information

The R pmmltransformations Package

The R pmmltransformations Package The R pmmltransformations Package Tridivesh Jena Alex Guazzelli Wen-Ching Lin Michael Zeller Zementis, Inc.* Zementis, Inc. Zementis, Inc. Zementis, Inc. Tridivesh.Jena@ Alex.Guazzelli@ Wenching.Lin@ Michael.Zeller@

More information

The Internet of Things and Big Data: Intro

The Internet of Things and Big Data: Intro The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific

More information

Extend your analytic capabilities with SAP Predictive Analysis

Extend your analytic capabilities with SAP Predictive Analysis September 9 11, 2013 Anaheim, California Extend your analytic capabilities with SAP Predictive Analysis Charles Gadalla Learning Points Advanced analytics strategy at SAP Simplifying predictive analytics

More information

The Bloor Group VENDOR PROFILE

The Bloor Group VENDOR PROFILE The Bloor Group SAS and The Hadoop Ecosystem VENDOR PROFILE Our research indicates that, at the current rate of adoption, Hadoop and its ecosystem will become dominant in the area of analytics and BI applications

More information

ANALYTICS CENTER LEARNING PROGRAM

ANALYTICS CENTER LEARNING PROGRAM Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals

More information

The Future of Data Management

The Future of Data Management The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

SAP and Hortonworks Reference Architecture

SAP and Hortonworks Reference Architecture SAP and Hortonworks Reference Architecture Hortonworks. We Do Hadoop. June Page 1 2014 Hortonworks Inc. 2011 2014. All Rights Reserved A Modern Data Architecture With SAP DATA SYSTEMS APPLICATIO NS Statistical

More information

Big Data and Hadoop for the Executive A Reference Guide

Big Data and Hadoop for the Executive A Reference Guide Big Data and Hadoop for the Executive A Reference Guide Overview The amount of information being collected by companies today is incredible. Wal- Mart has 460 terabytes of data, which, according to the

More information

Big Data Realities Hadoop in the Enterprise Architecture

Big Data Realities Hadoop in the Enterprise Architecture Big Data Realities Hadoop in the Enterprise Architecture Paul Phillips Director, EMEA, Hortonworks pphillips@hortonworks.com +44 (0)777 444 3857 Hortonworks Inc. 2012 Page 1 Agenda The Growth of Enterprise

More information

Ganzheitliches Datenmanagement

Ganzheitliches Datenmanagement Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist

More information

Bringing the Power of SAS to Hadoop. White Paper

Bringing the Power of SAS to Hadoop. White Paper White Paper Bringing the Power of SAS to Hadoop Combine SAS World-Class Analytic Strength with Hadoop s Low-Cost, Distributed Data Storage to Uncover Hidden Opportunities Contents Introduction... 1 What

More information

The Use of Open Source Is Growing. So Why Do Organizations Still Turn to SAS?

The Use of Open Source Is Growing. So Why Do Organizations Still Turn to SAS? Conclusions Paper The Use of Open Source Is Growing. So Why Do Organizations Still Turn to SAS? Insights from a presentation at the 2014 Hadoop Summit Featuring Brian Garrett, Principal Solutions Architect

More information

Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p.

Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p. Introduction p. xvii Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p. 9 State of the Practice in Analytics p. 11 BI Versus

More information

Modern Data Architecture for Retail with Apache Hadoop on Windows

Modern Data Architecture for Retail with Apache Hadoop on Windows 1 Modern Data Architecture for Retail with Apache Hadoop on Windows A Hortonworks and Microsoft White Paper JUNE 2014 2 Executive Summary Retailers have a long history of investing in data and analytics

More information

Modern Data Architecture for Predictive Analytics

Modern Data Architecture for Predictive Analytics Modern Data Architecture for Predictive Analytics David Smith VP Marketing and Community - Revolution Analytics John Kreisa VP Strategic Marketing- Hortonworks Hortonworks Inc. 2013 Page 1 Your Presenters

More information

Comprehensive Analytics on the Hortonworks Data Platform

Comprehensive Analytics on the Hortonworks Data Platform Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page

More information

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning

More information

In-Database Analytics

In-Database Analytics Embedding Analytics in Decision Management Systems In-database analytics offer a powerful tool for embedding advanced analytics in a critical component of IT infrastructure. James Taylor CEO CONTENTS Introducing

More information

The Future of Data Management with Hadoop and the Enterprise Data Hub

The Future of Data Management with Hadoop and the Enterprise Data Hub The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah Cofounder & CTO, Cloudera, Inc. Twitter: @awadallah 1 2 Cloudera Snapshot Founded 2008, by former employees of Employees

More information

Predictive Analytics Powered by SAP HANA. Cary Bourgeois Principal Solution Advisor Platform and Analytics

Predictive Analytics Powered by SAP HANA. Cary Bourgeois Principal Solution Advisor Platform and Analytics Predictive Analytics Powered by SAP HANA Cary Bourgeois Principal Solution Advisor Platform and Analytics Agenda Introduction to Predictive Analytics Key capabilities of SAP HANA for in-memory predictive

More information

Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015

Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015 Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015 We Do Hadoop Fall 2014 Page 1 HDP delivers a comprehensive data management platform GOVERNANCE Hortonworks Data Platform

More information

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Datenverwaltung im Wandel - Building an Enterprise Data Hub with Datenverwaltung im Wandel - Building an Enterprise Data Hub with Cloudera Bernard Doering Regional Director, Central EMEA, Cloudera Cloudera Your Hadoop Experts Founded 2008, by former employees of Employees

More information

THE JOURNEY TO A DATA LAKE

THE JOURNEY TO A DATA LAKE THE JOURNEY TO A DATA LAKE 1 THE JOURNEY TO A DATA LAKE 85% OF DATA GROWTH BY 2020 WILL COME FROM NEW TYPES OF DATA ACCORDING TO IDC, AS MUCH AS 85% OF DATA GROWTH BY 2020 WILL COME FROM NEW TYPES OF DATA,

More information

Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata

Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata Up Your R Game James Taylor, Decision Management Solutions Bill Franks, Teradata Today s Speakers James Taylor Bill Franks CEO Chief Analytics Officer Decision Management Solutions Teradata 7/28/14 3 Polling

More information

Open Source in Financial Services: Meet the challenges of new business models and disruption

Open Source in Financial Services: Meet the challenges of new business models and disruption Open Source in Financial Services: Meet the challenges of new business models and disruption Speakers Vamsi Chemitiganti, General Manager Financial Services, Hortonworks Josh West, Senior Solutions Architect,

More information

Mike Maxey. Senior Director Product Marketing Greenplum A Division of EMC. Copyright 2011 EMC Corporation. All rights reserved.

Mike Maxey. Senior Director Product Marketing Greenplum A Division of EMC. Copyright 2011 EMC Corporation. All rights reserved. Mike Maxey Senior Director Product Marketing Greenplum A Division of EMC 1 Greenplum Becomes the Foundation of EMC s Big Data Analytics (July 2010) E M C A C Q U I R E S G R E E N P L U M For three years,

More information

Modern Data Architecture for Financial Services with Apache Hadoop on Windows

Modern Data Architecture for Financial Services with Apache Hadoop on Windows 1 Modern Data Architecture for Financial Services with Apache Hadoop on Windows A Hortonworks and Microsoft White Paper JUNE 2014 2 Executive Summary Financial services firms have long been dependent on

More information

Advanced Big Data Analytics with R and Hadoop

Advanced Big Data Analytics with R and Hadoop REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional

More information

Real-Time Big Data Analytics + Internet of Things (IoT) = Value Creation

Real-Time Big Data Analytics + Internet of Things (IoT) = Value Creation Real-Time Big Data Analytics + Internet of Things (IoT) = Value Creation January 2015 Market Insights Report Executive Summary According to a recent customer survey by Vitria, executives across the consumer,

Predictive Analytics: Too Important to Ignore The six secrets to success with predictive analytics

Predictive Analytics: Too Important to Ignore The six secrets to success with predictive analytics Predictive Analytics: Too Important to Ignore The six secrets to success with predictive analytics Webinar December 18, 2013 Sponsored by: Tony Cosentino VP & Research Director, Business Analytics Ventana

Sunnie Chung. Cleveland State University

Sunnie Chung. Cleveland State University Sunnie Chung Cleveland State University Data Scientist Big Data Processing Data Mining 2 INTERSECT of Computer Scientists and Statisticians with Knowledge of Data Mining AND Big data Processing Skills:

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved. Collaborative Big Data Analytics 1 Big Data Is Less About Size, And More About Freedom TechCrunch Total data: bigger than big data 451 Group Findings: Big Data Is More Extreme Than Volume Gartner

KnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES

KnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES Translating data into business value requires the right data mining and modeling techniques which uncover important patterns within

The basic data mining algorithms introduced may be enhanced in a number of ways.

The basic data mining algorithms introduced may be enhanced in a number of ways. DATA MINING TECHNOLOGIES AND IMPLEMENTATIONS The basic data mining algorithms introduced may be enhanced in a number of ways. Data mining algorithms have traditionally assumed data is memory resident,

Ubuntu and Hadoop: the perfect match

Ubuntu and Hadoop: the perfect match WHITE PAPER Ubuntu and Hadoop: the perfect match February 2012 Copyright Canonical 2012 www.canonical.com Executive introduction In many fields of IT, there are always stand-out technologies. This is definitely

Talend Big Data. Delivering instant value from all your data. Talend 2014 1

Talend Big Data. Delivering instant value from all your data. Talend 2014 1 Talend Big Data Delivering instant value from all your data Talend 2014 1 I may say that this is the greatest factor: the way in which the expedition is equipped. Roald Amundsen race to the south pole,

Next Generation Data Mining. Data Mining Automation & Realtime-Scoring "on-the-cloud.

Next Generation Data Mining. Data Mining Automation & Realtime-Scoring on-the-cloud. Next Generation Data Mining. Data Mining Automation & Realtime-Scoring "on-the-cloud. Outline DYMATRIX & Zementis Overview Consulting & Product Expertise DynaMine & ADAPA Solution Framework Case Study:

Production ready hadoop. By Deepak Rao National Head Datawarehousing Bajaj Finserv

Production ready hadoop. By Deepak Rao National Head Datawarehousing Bajaj Finserv Production ready hadoop By Deepak Rao National Head Datawarehousing Bajaj Finserv Agenda: Data in today's BFSI world; Modern Data Lake; Use cases & prototyping; Big data impact in BFSI; Thank you! Definition

Tax Fraud is Increasing

Tax Fraud is Increasing Preventing Fraud with Through Analytics Satya Bhamidipati Data Scientist Business Analytics Product Group Copyright 2014 Oracle and/or its affiliates. All rights reserved. 2 Tax Fraud is Increasing 27%

Apache Hadoop Patterns of Use

Apache Hadoop Patterns of Use Community Driven Apache Hadoop Apache Hadoop Patterns of Use April 2013 2013 Hortonworks Inc. http://www.hortonworks.com Big Data: Apache Hadoop Use Distilled There certainly is no shortage of hype when

Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop

Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop 1 Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop 2 Pivotal's Full Approach It's More Than Just Hadoop Pivotal Data Labs 3 Why Pivotal Exists First Movers Solve the Big Data Utility Gap

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW AGENDA What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story Hadoop PDW Our BIG DATA Roadmap BIG DATA? Volume 59% growth in annual WW information 1.2M Zettabytes (10^21 bytes) this

Hadoop Job Oriented Training Agenda

Hadoop Job Oriented Training Agenda 1 Hadoop Job Oriented Training Agenda Kapil CK hdpguru@gmail.com Module 1 Understanding Hadoop This module covers an overview of big data, Hadoop, and the Hortonworks Data Platform. 1.1 Module

Some vendors have a big presence in a particular industry; some are geared toward data scientists, others toward business users.

Some vendors have a big presence in a particular industry; some are geared toward data scientists, others toward business users. Bonus Chapter Ten Major Predictive Analytics Vendors In This Chapter Angoss FICO IBM RapidMiner Revolution Analytics Salford Systems SAP SAS StatSoft, Inc. TIBCO This chapter highlights ten of the major

Harnessing Big Data with KNIME

Harnessing Big Data with KNIME Harnessing Big Data with KNIME Tobias Kötter KNIME.com Agenda The three V's of Big Data Big Data Extension and Databases Nodes Demo 2 Variety, Volume, Velocity Variety: integrating heterogeneous data (and

Copyright 2012 EMC Corporation. All rights reserved.

Copyright 2012 EMC Corporation. All rights reserved. 1 Greenplum UAP Enabling Big Data Analytics Brendon Moran Data Scientist 2 Agenda Background On Greenplum And Big Data Analytics Greenplum UAP Greenplum: Not Just Infrastructure Pivotal Labs Customers

Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard

Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard The Evolution of Analytics Mainframe EDW Proprietary MPP Unix SMP MPP Appliance Hadoop? Questions Is Hadoop

SAP Solution Brief SAP HANA. Transform Your Future with Better Business Insight Using Predictive Analytics

SAP Solution Brief SAP HANA. Transform Your Future with Better Business Insight Using Predictive Analytics SAP Solution Brief SAP HANA Objectives Transform Your Future with Better Business Insight Using Predictive Analytics Dealing with the new reality Organizations like yours can identify

MySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering

MySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering MySQL and Hadoop: Big Data Integration Shubhangi Garg & Neha Kumari MySQL Engineering Copyright 2013, Oracle and/or its affiliates. All rights reserved. Agenda Design rationale Implementation Installation

Data Governance in the Hadoop Data Lake. Michael Lang May 2015

Data Governance in the Hadoop Data Lake. Michael Lang May 2015 Data Governance in the Hadoop Data Lake Michael Lang May 2015 Introduction Product Manager for Teradata Loom Joined Teradata as part of acquisition of Revelytix, original developer of Loom VP of Sales

Make Better Decisions Through Predictive Intelligence

Make Better Decisions Through Predictive Intelligence IBM SPSS Modeler Professional Make Better Decisions Through Predictive Intelligence Highlights Easily access, prepare and model structured data with this intuitive, visual data mining workbench Rapidly

Data Science in Action

Data Science in Action Data Science in Action Peerapon Vateekul, Ph.D. Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University Outlines 2 Data Science & Data Scientist Data Mining Analytics with

Addressing Risk Data Aggregation and Risk Reporting Ben Sharma, CEO. Big Data Everywhere Conference, NYC November 2015

Addressing Risk Data Aggregation and Risk Reporting Ben Sharma, CEO. Big Data Everywhere Conference, NYC November 2015 Addressing Risk Data Aggregation and Risk Reporting Ben Sharma, CEO Big Data Everywhere Conference, NYC November 2015 Agenda 1. Challenges with Risk Data Aggregation and Risk Reporting (RDARR) 2. How a

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate

Data Governance in the Hadoop Data Lake. Kiran Kamreddy May 2015

Data Governance in the Hadoop Data Lake. Kiran Kamreddy May 2015 Data Governance in the Hadoop Data Lake Kiran Kamreddy May 2015 One Data Lake: Many Definitions A centralized repository of raw data into which many data-producing streams flow and from which downstream

Transformational Insurance Analytics for Sustained Competitive Advantage How a regional carrier leapfrogged into the future of analytics and data

Transformational Insurance Analytics for Sustained Competitive Advantage How a regional carrier leapfrogged into the future of analytics and data Transformational Insurance Analytics for Sustained Competitive Advantage How a regional carrier leapfrogged into the future of analytics and data 0 Today's Speakers Sanjeev Kumar, Saama As Saama's Head

Standards in Predictive Analytics

Standards in Predictive Analytics The role of R, Hadoop and PMML in the mainstreaming of predictive analytics. James Taylor CEO CONTENTS Predictive Analytics Today Broadening The Analytic Ecosystem With R Managing Big Data with Hadoop

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

Mr. Apichon Witayangkurn apichon@iis.u-tokyo.ac.jp Department of Civil Engineering The University of Tokyo

Mr. Apichon Witayangkurn apichon@iis.u-tokyo.ac.jp Department of Civil Engineering The University of Tokyo Sensor Network Messaging Service Hive/Hadoop Mr. Apichon Witayangkurn apichon@iis.u-tokyo.ac.jp Department of Civil Engineering The University of Tokyo Contents 1 Introduction 2 What & Why Sensor Network

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014 5 Keys to Unlocking the Big Data Analytics Puzzle Anurag Tandon Director, Product Marketing March 26, 2014 1 A Little About Us A global footprint. A proven innovator. A leader in enterprise analytics for

A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW

A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW Presented By: Orion Gebremedhin Director of Technology, Data & Analytics, Neudesic LLC. Data Platform VTSP, Microsoft Corp. @OrionGM BIG DATA PROCESSING A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW TOPICS

Big Data and Data Science: Behind the Buzz Words

Big Data and Data Science: Behind the Buzz Words Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Part I By Sam Poozhikala, Vice President Customer Solutions at StratApps Inc. 4/4/2014 You may contact Sam Poozhikala at spoozhikala@stratapps.com.

BIG DATA AND THE ENTERPRISE DATA WAREHOUSE WORKSHOP

BIG DATA AND THE ENTERPRISE DATA WAREHOUSE WORKSHOP BIG DATA AND THE ENTERPRISE DATA WAREHOUSE WORKSHOP Business Analytics for All Amsterdam - 2015 Value of Big Data is Being Recognized Executives beginning to see the path from data insights to revenue

Big Data and the Data Lake. February 2015

Big Data and the Data Lake. February 2015 Big Data and the Data Lake February 2015 My Vision: Our Mission Data Intelligence is a broad term that describes the real, meaningful insights that can be extracted from your data truths that you can act

HP Vertica. Echtzeit-Analyse extremer Datenmengen und Einbindung von Hadoop. Helmut Schmitt Sales Manager DACH

HP Vertica. Echtzeit-Analyse extremer Datenmengen und Einbindung von Hadoop. Helmut Schmitt Sales Manager DACH HP Vertica Echtzeit-Analyse extremer Datenmengen und Einbindung von Hadoop Helmut Schmitt Sales Manager DACH Big Data is a Massive Disruptor 2 A 100 fold multiplication in the amount of data is a 10,000

Big Data and Your Data Warehouse Philip Russom

Big Data and Your Data Warehouse Philip Russom Big Data and Your Data Warehouse Philip Russom TDWI Research Director for Data Management April 5, 2012 Sponsor Speakers Philip Russom Research Director, Data Management, TDWI Peter Jeffcock Director,

Making Sense of the Madness

Making Sense of the Madness Making Sense of the Madness Deploying Big Data techniques to deal with real world Bigish Data issues Copyright James Mitchell 2014 1 Introduction Warning! Parental Guidance Recommended Please read the

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

The Enterprise Data Hub and The Modern Information Architecture

The Enterprise Data Hub and The Modern Information Architecture The Enterprise Data Hub and The Modern Information Architecture Dr. Amr Awadallah CTO & Co-Founder, Cloudera Twitter: @awadallah 1 2013 Cloudera, Inc. All rights reserved. Cloudera Overview The Leader
