KNIME Big Data Workshop

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "KNIME Big Data Workshop"

Transcription

1 KNIME Big Data Workshop Tobias Kötter and Björn Lohrmann KNIME 2016 KNIME.com AG. All Rights Reserved.

2 Variety, Volume, Velocity Variety: integrating heterogeneous data.. and tools Volume: from small files......to distributed data repositories (Hadoop) bring the computation to the data Velocity: from distributing computationally heavy computations......to real time scoring of millions of records/sec KNIME.com AG. All Rights Reserved. 2 2

3 Variety 2016 KNIME.com AG. All Rights Reserved. 3

4 The KNIME Analytics Platform: Open for Every Data, Tool, and User Data Scientist Business Analyst External Data Connectors KNIME Analytics Platform Native Data Access, Analysis, Visualization, and Reporting External and Legacy Tools Distributed / Cloud Execution 2016 KNIME.com AG. All Rights Reserved. 4

5 Data Integration 2016 KNIME.com AG. All Rights Reserved. 5

6 Integrating R and Python 2016 KNIME.com AG. All Rights Reserved. 6

7 Modular Integrations 2016 KNIME.com AG. All Rights Reserved. 7

8 Other Programming/Scripting Integrations 2016 KNIME.com AG. All Rights Reserved. 8

9 Volume 2016 KNIME.com AG. All Rights Reserved. 9

10 Database Extension Visually assemble complex SQL statements Connect to almost all JDBC-compliant databases Harness the power of your database within KNIME 2016 KNIME.com AG. All Rights Reserved. 10

11 In-Database Processing Operations are performed within the database 2016 KNIME.com AG. All Rights Reserved. 11

12 Tip SQL statements are logged in KNIME log file 2016 KNIME.com AG. All Rights Reserved. 12

13 Database Port Types Database Connection Port (dark red) Connection information SQL statement Database JDBC Connection Port (light red) Connection information Database Connection Ports can be connected to Database JDBC Connection Ports but not vice versa 2016 KNIME.com AG. All Rights Reserved. 13

14 Database Connectors Nodes to connect to specific Databases Bundling necessary JDBC drivers Easy to use DB specific behavior/capability Hive and Impala connector part of the commercial KNIME Big Data Connectors extension General Database Connector Can connect to any JDBC source Register new JDBC driver via preferences page 2016 KNIME.com AG. All Rights Reserved. 14

15 Register JDBC Driver Register new JDBC drivers Increase connection timeout for long running database operations Open KNIME and go to File -> Preferences 2016 KNIME.com AG. All Rights Reserved. 15

16 Query Nodes Filter rows and columns Join tables/queries Extract samples Bin numeric columns Sort your data Write your own query Aggregate your data 2016 KNIME.com AG. All Rights Reserved. 16

17 Database GroupBy Manual Aggregation Returns number of rows per group 2016 KNIME.com AG. All Rights Reserved. 17

18 Database GroupBy Pattern Based Aggregation Tick this option if the search pattern is a regular expression otherwise it is treated as string with wildcards ('*' and '?') 2016 KNIME.com AG. All Rights Reserved. 18

19 Database GroupBy Type Based Aggregation Matches all columns Matches all numeric columns 2016 KNIME.com AG. All Rights Reserved. 19

20 Database GroupBy DB Specific Aggregation Methods SQLite 7 aggregation functions PostgreSQL 25 aggregation functions 2016 KNIME.com AG. All Rights Reserved. 20

21 Database GroupBy Custom Aggregation Function 2016 KNIME.com AG. All Rights Reserved. 21

22 Database Writing Nodes Create table as select Insert/append data Update values in table Delete rows from table 2016 KNIME.com AG. All Rights Reserved. 22

23 Performance Tip Increase batch size in database manipulation nodes Increase batch size for better performance 2016 KNIME.com AG. All Rights Reserved. 23

24 KNIME Big Data Connectors Package required drivers/libraries for HDFS, Hive, Impala access Preconfigured connectors Hive Cloudera Impala Extends the open source database integration 2016 KNIME.com AG. All Rights Reserved. 24

25 Hive/Impala Loader Batch upload a KNIME data table to Hive/Impala 2016 KNIME.com AG. All Rights Reserved. 25

26 HDFS File Handling New nodes HDFS Connection HDFS File Permission Utilize the existing remote file handling nodes Upload/download files Create/list directories Delete files 2016 KNIME.com AG. All Rights Reserved. 26

27 HDFS File Handling 2016 KNIME.com AG. All Rights Reserved. 27

28 KNIME Spark Executor Based on Spark MLlib Scalable machine learning library Runs on Hadoop Algorithms for Classification (decision tree, naïve bayes, ) Regression (logistic regression, linear regression, ) Clustering (k-means) Collaborative filtering (ALS) Dimensionality reduction (SVD, PCA) 2016 KNIME.com AG. All Rights Reserved. 28

29 Familiar Usage Model Usage model and dialogs similar to existing nodes No coding required 2016 KNIME.com AG. All Rights Reserved. 29

30 MLlib Integration MLlib model ports for model transfer Native MLlib model learning and prediction Spark nodes start and manage Spark jobs Including Spark job cancelation Native MLlib model 2016 KNIME.com AG. All Rights Reserved. 30

31 Data Stays Within Your Cluster Spark RDDs as input/output format Data stays within your cluster No unnecessary data movements Several input/output nodes e.g. Hive, hdfs files, 2016 KNIME.com AG. All Rights Reserved. 31

32 Machine Learning Unsupervised Learning Example 2016 KNIME.com AG. All Rights Reserved. 32

33 Machine Learning Supervised Learning Example 2016 KNIME.com AG. All Rights Reserved. 33

34 Mass Learning Fast Event Prediction Convert supported MLlib models to PMML 2016 KNIME.com AG. All Rights Reserved. 34

35 Sophisticated Learning - Mass Prediction Supports KNIME models and pre-processing steps 2016 KNIME.com AG. All Rights Reserved. 35

36 Closing the Loop Apply model on demand Learn model at scale PMML model MLlib model Sophisticated model learning Apply model at scale 2016 KNIME.com AG. All Rights Reserved. 36

37 Mix and Match Combine with existing KNIME nodes such as loops 2016 KNIME.com AG. All Rights Reserved. 37

38 Modularize and Execute Your Own Spark Code 2016 KNIME.com AG. All Rights Reserved. 38

39 Lazy Evaluation in Spark Transformations are lazy Actions trigger evaluation 2016 KNIME.com AG. All Rights Reserved. 39

40 Spark Node Overview 2016 KNIME.com AG. All Rights Reserved. 40

41 Spark Job Server * Hiveserver 2 Impala KNIME Big Data Architecture Scheduled execution and RESTful workflow submission Submit Impala queries via JDBC Hadoop Cluster Build Spark workflows graphically KNIME Server with extensions: KNIME Big Data Connectors KNIME Big Data Executor for Spark Workflow Upload via HTTP(S) Submit Hive queries via JDBC Submit Spark jobs via HTTP(S) KNIME Analytics Platform with extensions: KNIME Big Data Connectors KNIME Big Data Executor for Spark *Software provided by KNIME, based on https://github.com/spark-jobserver/spark-jobserver 2016 KNIME.com AG. All Rights Reserved. 41

42 Velocity 2016 KNIME.com AG. All Rights Reserved. 42

43 Velocity Computationally Heavy Analytics: Distributed Execution of one workflow branch Parallel Execution of workflow branches Parallel Execution of workflows 2016 KNIME.com AG. All Rights Reserved. 43

44 KNIME Cluster Executor: Distributed Data 2016 KNIME.com AG. All Rights Reserved. 44

45 KNIME Cluster Execution: Distributed Analytics 2016 KNIME.com AG. All Rights Reserved. 45

46 KNIME Batch Executors on Hadoop (via Oozie) Hadoop Master Node Hadoop Worker Node YARN Container REST Client Submit workflow parameters via REST Oozie Server Executes KNIME Workflow Designer Designs and uploads workflow YARN HDFS 2016 KNIME.com AG. All Rights Reserved. 46

47 KNIME Executors on Hadoop (via YARN) KNIME Workflow Users REST Client KNIME Workflow Designer Hadoop Worker Node YARN Container Hadoop Worker Node YARN Container Execute workflow Executes Executes KNIME Server YARN 2016 KNIME.com AG. All Rights Reserved. 47

48 Velocity Computationally Heavy Analytics: Distributed Execution of one workflow branch Parallel Execution of workflow branches Parallel Execution of workflows High Demand Scoring/Prediction: High Performance Scoring using generic Workflows High Performance Scoring of Predictive Models 2016 KNIME.com AG. All Rights Reserved. 48

49 High Performance Scoring via Workflows Record (or small batch) based processing Exposed as RESTful web service 2016 KNIME.com AG. All Rights Reserved. 49

50 High Performance Scoring using Models KNIME PMML Scoring via compiled PMML Deployed on KNIME Server Exposed as RESTful web service Partnership with Zementis ADAPA Real Time Scoring UPPI Big Data Scoring Engine 2016 KNIME.com AG. All Rights Reserved. 50

51 Velocity Computationally Heavy Analytics: Distributed Execution of one workflow branch Parallel Execution of workflow branches Parallel Execution of workflows High Demand Scoring/Prediction: High Performance Scoring using generic Workflows High Performance Scoring of Predictive Models Continuous Data Streams Streaming in KNIME Spark Streaming 2016 KNIME.com AG. All Rights Reserved. 51

52 Streaming in KNIME 2016 KNIME.com AG. All Rights Reserved. 52

53 Spark Streaming 2016 KNIME.com AG. All Rights Reserved. 53

54 Big Data, IoT, and the three V Variety: KNIME inherently well-suited: open platform broad data source/type support extensive tool integration Volume: Bring the computation to the data Big Data Extensions cover ETL and model learning Velocity: Distributed Execution of heavy workflows High Performance Scoring of predictive models Streaming execution 2016 KNIME.com AG. All Rights Reserved. 54

55 Demo 2016 KNIME.com AG. All Rights Reserved. 55

56 Virtual Machines Hortonworks: Cloudera: ds/quickstart_vms.html KNIME VM based on Hortonworks sandbox with preinstalled Spark Job Server is coming soon 2016 KNIME.com AG. All Rights Reserved. 56

57 Resources SQL Syntax and Examples (www.w3schools.com) Apache Spark MLlib (http://spark.apache.org/mllib/) The KNIME Website (www.knime.org) Database Documentation (https://tech.knime.org/databasedocumentation) Big Data Extensions (https://www.knime.org/knime-big-dataextensions) Forum (tech.knime.org/forum) LEARNING HUB under RESOURCES (www.knime.org/learning-hub) Blog for news, tips and tricks (www.knime.org/blog) KNIME TV channel on KNIME 2016 KNIME.com AG. All Rights Reserved. 57

58 The KNIME trademark and logo and OPEN FOR INNOVATION trademark are used by KNIME.com AG under license from KNIME GmbH, and are registered in the United States. KNIME is also registered in Germany KNIME.com AG. All Rights Reserved. 58

Harnessing Big Data with KNIME

Harnessing Big Data with KNIME Harnessing Big Data with KNIME Tobias Kötter KNIME.com Agenda The three V s of Big Data Big Data Extension and Databases Nodes Demo 2 Variety, Volume, Velocity Variety: integrating heterogeneous data (and

More information

What s Cooking in KNIME

What s Cooking in KNIME What s Cooking in KNIME Thomas Gabriel Copyright 2015 KNIME.com AG Agenda Querying NoSQL Databases Database Improvements & Big Data Copyright 2015 KNIME.com AG 2 Querying NoSQL Databases MongoDB & CouchDB

More information

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce

More information

KNIME opens the Doors to Big Data. A Practical example of Integrating any Big Data Platform into KNIME

KNIME opens the Doors to Big Data. A Practical example of Integrating any Big Data Platform into KNIME KNIME opens the Doors to Big Data A Practical example of Integrating any Big Data Platform into KNIME Tobias Koetter Rosaria Silipo Tobias.Koetter@knime.com Rosaria.Silipo@knime.com 1 Table of Contents

More information

Big Data Analytics with Spark and Oscar BAO. Tamas Jambor, Lead Data Scientist at Massive Analytic

Big Data Analytics with Spark and Oscar BAO. Tamas Jambor, Lead Data Scientist at Massive Analytic Big Data Analytics with Spark and Oscar BAO Tamas Jambor, Lead Data Scientist at Massive Analytic About me Building a scalable Machine Learning platform at MA Worked in Big Data and Data Science in the

More information

Data processing goes big

Data processing goes big Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,

More information

Interactive Visual Data Exploration with Spark in Databricks Cloud. Hossein

Interactive Visual Data Exploration with Spark in Databricks Cloud. Hossein Interactive Visual Data Exploration with Spark in Databricks Cloud Hossein Falaki @mhfalaki About Databricks Founded by creators of Apache Spark Offers Spark as a service in the cloud Dedicated to open

More information

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future

More information

SEIZE THE DATA. 2015 SEIZE THE DATA. 2015

SEIZE THE DATA. 2015 SEIZE THE DATA. 2015 1 Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. BIG DATA CONFERENCE 2015 Boston August 10-13 Predicting and reducing deforestation

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

Hadoop s Advantages for! Machine! Learning and. Predictive! Analytics. Webinar will begin shortly. Presented by Hortonworks & Zementis

Hadoop s Advantages for! Machine! Learning and. Predictive! Analytics. Webinar will begin shortly. Presented by Hortonworks & Zementis Webinar will begin shortly Hadoop s Advantages for Machine Learning and Predictive Analytics Presented by Hortonworks & Zementis September 10, 2014 Copyright 2014 Zementis, Inc. All rights reserved. 2

More information

Apache Spark : Fast and Easy Data Processing Sujee Maniyam Elephant Scale LLC sujee@elephantscale.com http://elephantscale.com

Apache Spark : Fast and Easy Data Processing Sujee Maniyam Elephant Scale LLC sujee@elephantscale.com http://elephantscale.com Apache Spark : Fast and Easy Data Processing Sujee Maniyam Elephant Scale LLC sujee@elephantscale.com http://elephantscale.com Spark Fast & Expressive Cluster computing engine Compatible with Hadoop Came

More information

Zementis. Universal Deployment of KNIME Models. Big Data and Real-time Scoring. KNIME User Group Meeting, Berlin February

Zementis. Universal Deployment of KNIME Models. Big Data and Real-time Scoring. KNIME User Group Meeting, Berlin February Zementis Universal Deployment of KNIME Models Big Data and Real-time Scoring KNIME User Group Meeting, Berlin February 2015 www.zementis.com Zementis Zementis Zementis provides software for operational

More information

The Open Analytics Platform

The Open Analytics Platform The Open Analytics Platform Bernd Wiswedel KNIME.com AG Agenda KNIME.com AG The KNIME Platform Recognition Small Sales Pitch KNIME and R the best of two worlds KNIME (Node) Development 2 A Brief History

More information

ANALYTICS CENTER LEARNING PROGRAM

ANALYTICS CENTER LEARNING PROGRAM Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals

More information

RapidMiner Radoop Documentation

RapidMiner Radoop Documentation RapidMiner Radoop Documentation Release 2.3.0 RapidMiner April 30, 2015 CONTENTS 1 Introduction 1 1.1 Preface.................................................. 1 1.2 Basic Architecture............................................

More information

Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics

Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Please note the following IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice

More information

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform

More information

The Internet of Things and Big Data: Intro

The Internet of Things and Big Data: Intro The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific

More information

In-memory data pipeline and warehouse at scale using Spark, Spark SQL, Tachyon and Parquet

In-memory data pipeline and warehouse at scale using Spark, Spark SQL, Tachyon and Parquet In-memory data pipeline and warehouse at scale using Spark, Spark SQL, Tachyon and Parquet Ema Iancuta iorhian@gmail.com Radu Chilom radu.chilom@gmail.com Buzzwords Berlin - 2015 Big data analytics / machine

More information

Data and Machine Architecture for the Data Science Lab Workflow Development, Testing, and Production for Model Training, Evaluation, and Deployment

Data and Machine Architecture for the Data Science Lab Workflow Development, Testing, and Production for Model Training, Evaluation, and Deployment Data and Machine Architecture for the Data Science Lab Workflow Development, Testing, and Production for Model Training, Evaluation, and Deployment Rosaria Silipo Marco A. Zimmer Rosaria.Silipo@knime.com

More information

Cloudera Manager Training: Hands-On Exercises

Cloudera Manager Training: Hands-On Exercises 201408 Cloudera Manager Training: Hands-On Exercises General Notes... 2 In- Class Preparation: Accessing Your Cluster... 3 Self- Study Preparation: Creating Your Cluster... 4 Hands- On Exercise: Working

More information

Big Data Visualization. Apache Spark and Zeppelin

Big Data Visualization. Apache Spark and Zeppelin Big Data Visualization using Apache Spark and Zeppelin Prajod Vettiyattil, Software Architect, Wipro Agenda Big Data and Ecosystem tools Apache Spark Apache Zeppelin Data Visualization Combining Spark

More information

Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata

Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata Up Your R Game James Taylor, Decision Management Solutions Bill Franks, Teradata Today s Speakers James Taylor Bill Franks CEO Chief Analytics Officer Decision Management Solutions Teradata 7/28/14 3 Polling

More information

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct

More information

Ensembles and PMML in KNIME

Ensembles and PMML in KNIME Ensembles and PMML in KNIME Alexander Fillbrunn 1, Iris Adä 1, Thomas R. Gabriel 2 and Michael R. Berthold 1,2 1 Department of Computer and Information Science Universität Konstanz Konstanz, Germany First.Last@Uni-Konstanz.De

More information

Data Science in Action

Data Science in Action + Data Science in Action Peerapon Vateekul, Ph.D. Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University + Outlines 2 Data Science & Data Scientist Data Mining Analytics with

More information

Moving From Hadoop to Spark

Moving From Hadoop to Spark + Moving From Hadoop to Spark Sujee Maniyam Founder / Principal @ www.elephantscale.com sujee@elephantscale.com Bay Area ACM meetup (2015-02-23) + HI, Featured in Hadoop Weekly #109 + About Me : Sujee

More information

http://glennengstrand.info/analytics/fp

http://glennengstrand.info/analytics/fp Functional Programming and Big Data by Glenn Engstrand (September 2014) http://glennengstrand.info/analytics/fp What is Functional Programming? It is a style of programming that emphasizes immutable state,

More information

HADOOP. Revised 10/19/2015

HADOOP. Revised 10/19/2015 HADOOP Revised 10/19/2015 This Page Intentionally Left Blank Table of Contents Hortonworks HDP Developer: Java... 1 Hortonworks HDP Developer: Apache Pig and Hive... 2 Hortonworks HDP Developer: Windows...

More information

Hadoop vs Apache Spark

Hadoop vs Apache Spark Innovate, Integrate, Transform Hadoop vs Apache Spark www.altencalsoftlabs.com Introduction Any sufficiently advanced technology is indistinguishable from magic. said Arthur C. Clark. Big data technologies

More information

Predictive Analytics Powered by SAP HANA. Cary Bourgeois Principal Solution Advisor Platform and Analytics

Predictive Analytics Powered by SAP HANA. Cary Bourgeois Principal Solution Advisor Platform and Analytics Predictive Analytics Powered by SAP HANA Cary Bourgeois Principal Solution Advisor Platform and Analytics Agenda Introduction to Predictive Analytics Key capabilities of SAP HANA for in-memory predictive

More information

Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p.

Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p. Introduction p. xvii Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p. 9 State of the Practice in Analytics p. 11 BI Versus

More information

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

PivotalR: A Package for Machine Learning on Big Data

PivotalR: A Package for Machine Learning on Big Data PivotalR: A Package for Machine Learning on Big Data Hai Qian Predictive Analytics Team, Pivotal Inc. madlib@gopivotal.com Copyright 2013 Pivotal. All rights reserved. What Can Small Data Scientists Bring

More information

Unified Batch & Stream Processing Platform

Unified Batch & Stream Processing Platform Unified Batch & Stream Processing Platform Himanshu Bari Director Product Management Most Big Data Use Cases Are About Improving/Re-write EXISTING solutions To KNOWN problems Current Solutions Were Built

More information

Constructing a Data Lake: Hadoop and Oracle Database United!

Constructing a Data Lake: Hadoop and Oracle Database United! Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.

More information

Oracle Advanced Analytics 12c & SQLDEV/Oracle Data Miner 4.0 New Features

Oracle Advanced Analytics 12c & SQLDEV/Oracle Data Miner 4.0 New Features Oracle Advanced Analytics 12c & SQLDEV/Oracle Data Miner 4.0 New Features Charlie Berger, MS Eng, MBA Sr. Director Product Management, Data Mining and Advanced Analytics charlie.berger@oracle.com www.twitter.com/charliedatamine

More information

Hadoop Job Oriented Training Agenda

Hadoop Job Oriented Training Agenda 1 Hadoop Job Oriented Training Agenda Kapil CK hdpguru@gmail.com Module 1 M o d u l e 1 Understanding Hadoop This module covers an overview of big data, Hadoop, and the Hortonworks Data Platform. 1.1 Module

More information

Automated Data Ingestion. Bernhard Disselhoff Enterprise Sales Engineer

Automated Data Ingestion. Bernhard Disselhoff Enterprise Sales Engineer Automated Data Ingestion Bernhard Disselhoff Enterprise Sales Engineer Agenda Pentaho Overview Templated dynamic ETL workflows Pentaho Data Integration (PDI) Use Cases Pentaho Overview Overview What we

More information

AtScale Intelligence Platform

AtScale Intelligence Platform AtScale Intelligence Platform PUT THE POWER OF HADOOP IN THE HANDS OF BUSINESS USERS. Connect your BI tools directly to Hadoop without compromising scale, performance, or control. TURN HADOOP INTO A HIGH-PERFORMANCE

More information

Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy

Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy Native Connectivity to Big Data Sources in MicroStrategy 10 Presented by: Raja Ganapathy Agenda MicroStrategy supports several data sources, including Hadoop Why Hadoop? How does MicroStrategy Analytics

More information

Ganzheitliches Datenmanagement

Ganzheitliches Datenmanagement Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist

More information

Big Data Approaches. Making Sense of Big Data. Ian Crosland. Jan 2016

Big Data Approaches. Making Sense of Big Data. Ian Crosland. Jan 2016 Big Data Approaches Making Sense of Big Data Ian Crosland Jan 2016 Accelerate Big Data ROI Even firms that are investing in Big Data are still struggling to get the most from it. Make Big Data Accessible

More information

MySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering

MySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering MySQL and Hadoop: Big Data Integration Shubhangi Garg & Neha Kumari MySQL Engineering 1Copyright 2013, Oracle and/or its affiliates. All rights reserved. Agenda Design rationale Implementation Installation

More information

Best Practices for Hadoop Data Analysis with Tableau

Best Practices for Hadoop Data Analysis with Tableau Best Practices for Hadoop Data Analysis with Tableau September 2013 2013 Hortonworks Inc. http:// Tableau 6.1.4 introduced the ability to visualize large, complex data stored in Apache Hadoop with Hortonworks

More information

Set Up Hortonworks Hadoop with SQL Anywhere

Set Up Hortonworks Hadoop with SQL Anywhere Set Up Hortonworks Hadoop with SQL Anywhere TABLE OF CONTENTS 1 INTRODUCTION... 3 2 INSTALL HADOOP ENVIRONMENT... 3 3 SET UP WINDOWS ENVIRONMENT... 5 3.1 Install Hortonworks ODBC Driver... 5 3.2 ODBC Driver

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

locuz.com Big Data Services

locuz.com Big Data Services locuz.com Big Data Services Big Data At Locuz, we help the enterprise move from being a data-limited to a data-driven one, thereby enabling smarter, faster decisions that result in better business outcome.

More information

Hadoop MapReduce and Spark. Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015

Hadoop MapReduce and Spark. Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015 Hadoop MapReduce and Spark Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015 Outline Hadoop Hadoop Import data on Hadoop Spark Spark features Scala MLlib MLlib

More information

Creating Big Data Applications with Spring XD

Creating Big Data Applications with Spring XD Creating Big Data Applications with Spring XD Thomas Darimont @thomasdarimont THE FASTEST PATH TO NEW BUSINESS VALUE Journey Introduction Concepts Applications Outlook 3 Unless otherwise indicated, these

More information

What's New in SAS Data Management

What's New in SAS Data Management Paper SAS034-2014 What's New in SAS Data Management Nancy Rausch, SAS Institute Inc., Cary, NC; Mike Frost, SAS Institute Inc., Cary, NC, Mike Ames, SAS Institute Inc., Cary ABSTRACT The latest releases

More information

Contents. Pentaho Corporation. Version 5.1. Copyright Page. New Features in Pentaho Data Integration 5.1. PDI Version 5.1 Minor Functionality Changes

Contents. Pentaho Corporation. Version 5.1. Copyright Page. New Features in Pentaho Data Integration 5.1. PDI Version 5.1 Minor Functionality Changes Contents Pentaho Corporation Version 5.1 Copyright Page New Features in Pentaho Data Integration 5.1 PDI Version 5.1 Minor Functionality Changes Legal Notices https://help.pentaho.com/template:pentaho/controls/pdftocfooter

More information

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Part I By Sam Poozhikala, Vice President Customer Solutions at StratApps Inc. 4/4/2014 You may contact Sam Poozhikala at spoozhikala@stratapps.com.

More information

THE VALUE OF DATA SCIENCE STANDARDS IN MANUFACTURING ANALYTICS SOUNDAR SRINIVASAN BOSCH DATA MINING SOLUTIONS AND SERVICES

THE VALUE OF DATA SCIENCE STANDARDS IN MANUFACTURING ANALYTICS SOUNDAR SRINIVASAN BOSCH DATA MINING SOLUTIONS AND SERVICES THE VALUE OF DATA SCIENCE STANDARDS IN MANUFACTURING ANALYTICS SOUNDAR SRINIVASAN BOSCH DATA MINING SOLUTIONS AND SERVICES Outline! Bosch s dual role in advanced manufacturing/industry 4.0! The need for

More information

April 2016 JPoint Moscow, Russia. How to Apply Big Data Analytics and Machine Learning to Real Time Processing. Kai Wähner. kwaehner@tibco.

April 2016 JPoint Moscow, Russia. How to Apply Big Data Analytics and Machine Learning to Real Time Processing. Kai Wähner. kwaehner@tibco. April 2016 JPoint Moscow, Russia How to Apply Big Data Analytics and Machine Learning to Real Time Processing Kai Wähner kwaehner@tibco.com @KaiWaehner www.kai-waehner.de LinkedIn / Xing Please connect!

More information

Comprehensive Analytics on the Hortonworks Data Platform

Comprehensive Analytics on the Hortonworks Data Platform Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page

More information

Unlocking the True Value of Hadoop with Open Data Science

Unlocking the True Value of Hadoop with Open Data Science Unlocking the True Value of Hadoop with Open Data Science Kristopher Overholt Solution Architect Big Data Tech 2016 MinneAnalytics June 7, 2016 Overview Overview of Open Data Science Python and the Big

More information

Introduction to MapReduce, Hadoop, & Spark. Jonathan Carroll-Nellenback Center for Integrated Research Computing

Introduction to MapReduce, Hadoop, & Spark. Jonathan Carroll-Nellenback Center for Integrated Research Computing Introduction to MapReduce, Hadoop, & Spark Jonathan Carroll-Nellenback Center for Integrated Research Computing Big Data Outline Analytics Map Reduce Programming Model Hadoop Ecosystem HDFS, Pig, Hive,

More information

Big Data and Data Science: Behind the Buzz Words

Big Data and Data Science: Behind the Buzz Words Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing

More information

Native Connectivity to Big Data Sources in MSTR 10

Native Connectivity to Big Data Sources in MSTR 10 Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single

More information

Unified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia

Unified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model to support more types of processing

More information

Survey of the Benchmark Systems and Testing Frameworks For Tachyon-Perf

Survey of the Benchmark Systems and Testing Frameworks For Tachyon-Perf Survey of the Benchmark Systems and Testing Frameworks For Tachyon-Perf Rong Gu,Qianhao Dong 2014/09/05 0. Introduction As we want to have a performance framework for Tachyon, we need to consider two aspects

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

OpenText Actuate Big Data Analytics 5.2

OpenText Actuate Big Data Analytics 5.2 OpenText Actuate Big Data Analytics 5.2 OpenText Actuate Big Data Analytics 5.2 introduces several improvements that make the product more useful, powerful and flexible for end users. A new data loading

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia Monitis Project Proposals for AUA September 2014, Yerevan, Armenia Distributed Log Collecting and Analysing Platform Project Specifications Category: Big Data and NoSQL Software Requirements: Apache Hadoop

More information

Spectrum Technology Platform. Version 9.0. Administration Guide

Spectrum Technology Platform. Version 9.0. Administration Guide Spectrum Technology Platform Version 9.0 Administration Guide Contents Chapter 1: Getting Started...7 Starting and Stopping the Server...8 Installing the Client Tools...8 Starting the Client Tools...9

More information

Making Big Data Processing Simple with Spark. Matei Zaharia

Making Big Data Processing Simple with Spark. Matei Zaharia Making Big Data Processing Simple with Spark Matei Zaharia December 17, 2015 What is Apache Spark? Fast and general cluster computing engine that generalizes the MapReduce model Makes it easy and fast

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

Welkom! Copyright 2014 Oracle and/or its affiliates. All rights reserved.

Welkom! Copyright 2014 Oracle and/or its affiliates. All rights reserved. Welkom! WIE? Bestuurslid OGh met BI / WA ervaring Bepalen activiteiten van de vereniging Deelname in organisatie commite van 1 of meerdere events Faciliteren van de SIG s Redactie van OGh-Visie Onderhouden

More information

Integrating Apache Spark with an Enterprise Data Warehouse

Integrating Apache Spark with an Enterprise Data Warehouse Integrating Apache Spark with an Enterprise Warehouse Dr. Michael Wurst, IBM Corporation Architect Spark/R/Python base Integration, In-base Analytics Dr. Toni Bollinger, IBM Corporation Senior Software

More information

Lofan Abrams Data Services for Big Data Session # 2987

Lofan Abrams Data Services for Big Data Session # 2987 Lofan Abrams Data Services for Big Data Session # 2987 Big Data Are you ready for blast-off? Big Data, for better or worse: 90% of world s data generated over last two years. ScienceDaily, ScienceDaily

More information

Big Data Use Case: Business Analytics

Big Data Use Case: Business Analytics Big Data Use Case: Business Analytics Starting point A telecommunications company wants to allude to the topic of Big Data. The established Big Data working group has access to the data stock of the enterprise

More information

1. The orange button 2. Audio Type 3. Close apps 4. Enlarge my screen 5. Headphones 6. Questions Pane. SparkR 2

1. The orange button 2. Audio Type 3. Close apps 4. Enlarge my screen 5. Headphones 6. Questions Pane. SparkR 2 SparkR 1. The orange button 2. Audio Type 3. Close apps 4. Enlarge my screen 5. Headphones 6. Questions Pane SparkR 2 Lecture slides and/or video will be made available within one week Live Demonstration

More information

What is Hadoop? The webinar will begin at 3pm

What is Hadoop? The webinar will begin at 3pm What is Hadoop? The webinar will begin at 3pm You now have a menu in the top right corner of your screen. The red button with a white arrow allows you to expand and contract the webinar menu, in which

More information

HADOOP IN ENTERPRISE FUTURE-PROOF YOUR BIG DATA INVESTMENTS WITH CASCADING. Supreet Oberoi Nov. 4-6, 2014 Big Data Expo Santa Clara

HADOOP IN ENTERPRISE FUTURE-PROOF YOUR BIG DATA INVESTMENTS WITH CASCADING. Supreet Oberoi Nov. 4-6, 2014 Big Data Expo Santa Clara DRIVING INNOVATION THROUGH DATA HADOOP IN ENTERPRISE FUTURE-PROOF YOUR BIG DATA INVESTMENTS WITH CASCADING Supreet Oberoi Nov. 4-6, 2014 Big Data Expo Santa Clara ABOUT ME I am a Data Engineer, not a Data

More information

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,

More information

Hadoop & SAS Data Loader for Hadoop

Hadoop & SAS Data Loader for Hadoop Turning Data into Value Hadoop & SAS Data Loader for Hadoop Sebastiaan Schaap Frederik Vandenberghe Agenda What s Hadoop SAS Data management: Traditional In-Database In-Memory The Hadoop analytics lifecycle

More information

Shark Installation Guide Week 3 Report. Ankush Arora

Shark Installation Guide Week 3 Report. Ankush Arora Shark Installation Guide Week 3 Report Ankush Arora Last Updated: May 31,2014 CONTENTS Contents 1 Introduction 1 1.1 Shark..................................... 1 1.2 Apache Spark.................................

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2016 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

Information Builders Mission & Value Proposition

Information Builders Mission & Value Proposition Value 10/06/2015 2015 MapR Technologies 2015 MapR Technologies 1 Information Builders Mission & Value Proposition Economies of Scale & Increasing Returns (Note: Not to be confused with diminishing returns

More information

Journée Thématique Big Data 13/03/2015

Journée Thématique Big Data 13/03/2015 Journée Thématique Big Data 13/03/2015 1 Agenda About Flaminem What Do We Want To Predict? What Is The Machine Learning Theory Behind It? How Does It Work In Practice? What Is Happening When Data Gets

More information

RUN BETTER. 2013 SAP AG. All rights reserved. 1

RUN BETTER. 2013 SAP AG. All rights reserved. 1 RUN BETTER 2013 SAP AG. All rights reserved. 1 Project SEEED Processing of Encrypted Data in SAP HANA Internal Outsourcing Data to the Cloud What do you think are the problems? 2013 SAP AG. All rights

More information

Architectures for massive data management

Architectures for massive data management Architectures for massive data management Apache Spark Albert Bifet albert.bifet@telecom-paristech.fr October 20, 2015 Spark Motivation Apache Spark Figure: IBM and Apache Spark What is Apache Spark Apache

More information

Bringing the Power of SAS to Hadoop. White Paper

Bringing the Power of SAS to Hadoop. White Paper White Paper Bringing the Power of SAS to Hadoop Combine SAS World-Class Analytic Strength with Hadoop s Low-Cost, Distributed Data Storage to Uncover Hidden Opportunities Contents Introduction... 1 What

More information

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look IBM BigInsights Has Potential If It Lives Up To Its Promise By Prakash Sukumar, Principal Consultant at iolap, Inc. IBM released Hadoop-based InfoSphere BigInsights in May 2013. There are already Hadoop-based

More information

APPROACHABLE ANALYTICS MAKING SENSE OF DATA

APPROACHABLE ANALYTICS MAKING SENSE OF DATA APPROACHABLE ANALYTICS MAKING SENSE OF DATA AGENDA SAS DELIVERS PROVEN SOLUTIONS THAT DRIVE INNOVATION AND IMPROVE PERFORMANCE. About SAS SAS Business Analytics Framework Approachable Analytics SAS for

More information

The Inside Scoop on Hadoop

The Inside Scoop on Hadoop The Inside Scoop on Hadoop Orion Gebremedhin National Solutions Director BI & Big Data, Neudesic LLC. VTSP Microsoft Corp. Orion.Gebremedhin@Neudesic.COM B-orgebr@Microsoft.com @OrionGM The Inside Scoop

More information

An In-Depth Look at In-Memory Predictive Analytics for Developers

An In-Depth Look at In-Memory Predictive Analytics for Developers September 9 11, 2013 Anaheim, California An In-Depth Look at In-Memory Predictive Analytics for Developers Philip Mugglestone SAP Learning Points Understand the SAP HANA Predictive Analysis library (PAL)

More information

QUEST meeting Big Data Analytics

QUEST meeting Big Data Analytics QUEST meeting Big Data Analytics Peter Hughes Business Solutions Consultant SAS Australia/New Zealand Copyright 2015, SAS Institute Inc. All rights reserved. Big Data Analytics WHERE WE ARE NOW 2005 2007

More information

Databricks. A Primer

Databricks. A Primer Databricks A Primer Who is Databricks? Databricks vision is to empower anyone to easily build and deploy advanced analytics solutions. The company was founded by the team who created Apache Spark, a powerful

More information

From Spark to Ignition:

From Spark to Ignition: From Spark to Ignition: Fueling Your Business on Real-Time Analytics Eric Frenkiel, MemSQL CEO June 29, 2015 San Francisco, CA What s in Store For This Presentation? 1. MemSQL: A real-time database for

More information

Oracle Big Data Fundamentals Ed 1

Oracle Big Data Fundamentals Ed 1 Oracle University Contact Us: 001-855-844-3881 Oracle Big Data Fundamentals Ed 1 Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, learn to use Oracle's Integrated Big Data

More information

Actian SQL in Hadoop Buyer s Guide

Actian SQL in Hadoop Buyer s Guide Actian SQL in Hadoop Buyer s Guide Contents Introduction: Big Data and Hadoop... 3 SQL on Hadoop Benefits... 4 Approaches to SQL on Hadoop... 4 The Top 10 SQL in Hadoop Capabilities... 5 SQL in Hadoop

More information

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University

More information

An Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov

An Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov An Industrial Perspective on the Hadoop Ecosystem Eldar Khalilov Pavel Valov agenda 03.12.2015 2 agenda Introduction 03.12.2015 2 agenda Introduction Research goals 03.12.2015 2 agenda Introduction Research

More information

#TalendSandbox for Big Data

#TalendSandbox for Big Data Evalua&on von Apache Hadoop mit der #TalendSandbox for Big Data Julien Clarysse @whatdoesdatado @talend 2015 Talend Inc. 1 Connecting the Data-Driven Enterprise 2 Talend Overview Founded in 2006 BRAND

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information