KNIME Big Data Workshop
|
|
|
- Henry Willis
- 9 years ago
- Views:
Transcription
1 KNIME Big Data Workshop Tobias Kötter and Björn Lohrmann KNIME 2016 KNIME.com AG. All Rights Reserved.
2 Variety, Volume, Velocity Variety: integrating heterogeneous data.. and tools Volume: from small files......to distributed data repositories (Hadoop) bring the computation to the data Velocity: from distributing computationally heavy computations......to real time scoring of millions of records/sec KNIME.com AG. All Rights Reserved. 2 2
3 Variety 2016 KNIME.com AG. All Rights Reserved. 3
4 The KNIME Analytics Platform: Open for Every Data, Tool, and User Data Scientist Business Analyst External Data Connectors KNIME Analytics Platform Native Data Access, Analysis, Visualization, and Reporting External and Legacy Tools Distributed / Cloud Execution 2016 KNIME.com AG. All Rights Reserved. 4
5 Data Integration 2016 KNIME.com AG. All Rights Reserved. 5
6 Integrating R and Python 2016 KNIME.com AG. All Rights Reserved. 6
7 Modular Integrations 2016 KNIME.com AG. All Rights Reserved. 7
8 Other Programming/Scripting Integrations 2016 KNIME.com AG. All Rights Reserved. 8
9 Volume 2016 KNIME.com AG. All Rights Reserved. 9
10 Database Extension Visually assemble complex SQL statements Connect to almost all JDBC-compliant databases Harness the power of your database within KNIME 2016 KNIME.com AG. All Rights Reserved. 10
11 In-Database Processing Operations are performed within the database 2016 KNIME.com AG. All Rights Reserved. 11
12 Tip SQL statements are logged in KNIME log file 2016 KNIME.com AG. All Rights Reserved. 12
13 Database Port Types Database Connection Port (dark red) Connection information SQL statement Database JDBC Connection Port (light red) Connection information Database Connection Ports can be connected to Database JDBC Connection Ports but not vice versa 2016 KNIME.com AG. All Rights Reserved. 13
14 Database Connectors Nodes to connect to specific Databases Bundling necessary JDBC drivers Easy to use DB specific behavior/capability Hive and Impala connector part of the commercial KNIME Big Data Connectors extension General Database Connector Can connect to any JDBC source Register new JDBC driver via preferences page 2016 KNIME.com AG. All Rights Reserved. 14
15 Register JDBC Driver Register new JDBC drivers Increase connection timeout for long running database operations Open KNIME and go to File -> Preferences 2016 KNIME.com AG. All Rights Reserved. 15
16 Query Nodes Filter rows and columns Join tables/queries Extract samples Bin numeric columns Sort your data Write your own query Aggregate your data 2016 KNIME.com AG. All Rights Reserved. 16
17 Database GroupBy Manual Aggregation Returns number of rows per group 2016 KNIME.com AG. All Rights Reserved. 17
18 Database GroupBy Pattern Based Aggregation Tick this option if the search pattern is a regular expression otherwise it is treated as string with wildcards ('*' and '?') 2016 KNIME.com AG. All Rights Reserved. 18
19 Database GroupBy Type Based Aggregation Matches all columns Matches all numeric columns 2016 KNIME.com AG. All Rights Reserved. 19
20 Database GroupBy DB Specific Aggregation Methods SQLite 7 aggregation functions PostgreSQL 25 aggregation functions 2016 KNIME.com AG. All Rights Reserved. 20
21 Database GroupBy Custom Aggregation Function 2016 KNIME.com AG. All Rights Reserved. 21
22 Database Writing Nodes Create table as select Insert/append data Update values in table Delete rows from table 2016 KNIME.com AG. All Rights Reserved. 22
23 Performance Tip Increase batch size in database manipulation nodes Increase batch size for better performance 2016 KNIME.com AG. All Rights Reserved. 23
24 KNIME Big Data Connectors Package required drivers/libraries for HDFS, Hive, Impala access Preconfigured connectors Hive Cloudera Impala Extends the open source database integration 2016 KNIME.com AG. All Rights Reserved. 24
25 Hive/Impala Loader Batch upload a KNIME data table to Hive/Impala 2016 KNIME.com AG. All Rights Reserved. 25
26 HDFS File Handling New nodes HDFS Connection HDFS File Permission Utilize the existing remote file handling nodes Upload/download files Create/list directories Delete files 2016 KNIME.com AG. All Rights Reserved. 26
27 HDFS File Handling 2016 KNIME.com AG. All Rights Reserved. 27
28 KNIME Spark Executor Based on Spark MLlib Scalable machine learning library Runs on Hadoop Algorithms for Classification (decision tree, naïve bayes, ) Regression (logistic regression, linear regression, ) Clustering (k-means) Collaborative filtering (ALS) Dimensionality reduction (SVD, PCA) 2016 KNIME.com AG. All Rights Reserved. 28
29 Familiar Usage Model Usage model and dialogs similar to existing nodes No coding required 2016 KNIME.com AG. All Rights Reserved. 29
30 MLlib Integration MLlib model ports for model transfer Native MLlib model learning and prediction Spark nodes start and manage Spark jobs Including Spark job cancelation Native MLlib model 2016 KNIME.com AG. All Rights Reserved. 30
31 Data Stays Within Your Cluster Spark RDDs as input/output format Data stays within your cluster No unnecessary data movements Several input/output nodes e.g. Hive, hdfs files, 2016 KNIME.com AG. All Rights Reserved. 31
32 Machine Learning Unsupervised Learning Example 2016 KNIME.com AG. All Rights Reserved. 32
33 Machine Learning Supervised Learning Example 2016 KNIME.com AG. All Rights Reserved. 33
34 Mass Learning Fast Event Prediction Convert supported MLlib models to PMML 2016 KNIME.com AG. All Rights Reserved. 34
35 Sophisticated Learning - Mass Prediction Supports KNIME models and pre-processing steps 2016 KNIME.com AG. All Rights Reserved. 35
36 Closing the Loop Apply model on demand Learn model at scale PMML model MLlib model Sophisticated model learning Apply model at scale 2016 KNIME.com AG. All Rights Reserved. 36
37 Mix and Match Combine with existing KNIME nodes such as loops 2016 KNIME.com AG. All Rights Reserved. 37
38 Modularize and Execute Your Own Spark Code 2016 KNIME.com AG. All Rights Reserved. 38
39 Lazy Evaluation in Spark Transformations are lazy Actions trigger evaluation 2016 KNIME.com AG. All Rights Reserved. 39
40 Spark Node Overview 2016 KNIME.com AG. All Rights Reserved. 40
41 Spark Job Server * Hiveserver 2 Impala KNIME Big Data Architecture Scheduled execution and RESTful workflow submission Submit Impala queries via JDBC Hadoop Cluster Build Spark workflows graphically KNIME Server with extensions: KNIME Big Data Connectors KNIME Big Data Executor for Spark Workflow Upload via HTTP(S) Submit Hive queries via JDBC Submit Spark jobs via HTTP(S) KNIME Analytics Platform with extensions: KNIME Big Data Connectors KNIME Big Data Executor for Spark *Software provided by KNIME, based on KNIME.com AG. All Rights Reserved. 41
42 Velocity 2016 KNIME.com AG. All Rights Reserved. 42
43 Velocity Computationally Heavy Analytics: Distributed Execution of one workflow branch Parallel Execution of workflow branches Parallel Execution of workflows 2016 KNIME.com AG. All Rights Reserved. 43
44 KNIME Cluster Executor: Distributed Data 2016 KNIME.com AG. All Rights Reserved. 44
45 KNIME Cluster Execution: Distributed Analytics 2016 KNIME.com AG. All Rights Reserved. 45
46 KNIME Batch Executors on Hadoop (via Oozie) Hadoop Master Node Hadoop Worker Node YARN Container REST Client Submit workflow parameters via REST Oozie Server Executes KNIME Workflow Designer Designs and uploads workflow YARN HDFS 2016 KNIME.com AG. All Rights Reserved. 46
47 KNIME Executors on Hadoop (via YARN) KNIME Workflow Users REST Client KNIME Workflow Designer Hadoop Worker Node YARN Container Hadoop Worker Node YARN Container Execute workflow Executes Executes KNIME Server YARN 2016 KNIME.com AG. All Rights Reserved. 47
48 Velocity Computationally Heavy Analytics: Distributed Execution of one workflow branch Parallel Execution of workflow branches Parallel Execution of workflows High Demand Scoring/Prediction: High Performance Scoring using generic Workflows High Performance Scoring of Predictive Models 2016 KNIME.com AG. All Rights Reserved. 48
49 High Performance Scoring via Workflows Record (or small batch) based processing Exposed as RESTful web service 2016 KNIME.com AG. All Rights Reserved. 49
50 High Performance Scoring using Models KNIME PMML Scoring via compiled PMML Deployed on KNIME Server Exposed as RESTful web service Partnership with Zementis ADAPA Real Time Scoring UPPI Big Data Scoring Engine 2016 KNIME.com AG. All Rights Reserved. 50
51 Velocity Computationally Heavy Analytics: Distributed Execution of one workflow branch Parallel Execution of workflow branches Parallel Execution of workflows High Demand Scoring/Prediction: High Performance Scoring using generic Workflows High Performance Scoring of Predictive Models Continuous Data Streams Streaming in KNIME Spark Streaming 2016 KNIME.com AG. All Rights Reserved. 51
52 Streaming in KNIME 2016 KNIME.com AG. All Rights Reserved. 52
53 Spark Streaming 2016 KNIME.com AG. All Rights Reserved. 53
54 Big Data, IoT, and the three V Variety: KNIME inherently well-suited: open platform broad data source/type support extensive tool integration Volume: Bring the computation to the data Big Data Extensions cover ETL and model learning Velocity: Distributed Execution of heavy workflows High Performance Scoring of predictive models Streaming execution 2016 KNIME.com AG. All Rights Reserved. 54
55 Demo 2016 KNIME.com AG. All Rights Reserved. 55
56 Virtual Machines Hortonworks: Cloudera: ds/quickstart_vms.html KNIME VM based on Hortonworks sandbox with preinstalled Spark Job Server is coming soon 2016 KNIME.com AG. All Rights Reserved. 56
57 Resources SQL Syntax and Examples ( Apache Spark MLlib ( The KNIME Website ( Database Documentation ( Big Data Extensions ( Forum (tech.knime.org/forum) LEARNING HUB under RESOURCES ( Blog for news, tips and tricks ( KNIME TV channel on KNIME 2016 KNIME.com AG. All Rights Reserved. 57
58 The KNIME trademark and logo and OPEN FOR INNOVATION trademark are used by KNIME.com AG under license from KNIME GmbH, and are registered in the United States. KNIME is also registered in Germany KNIME.com AG. All Rights Reserved. 58
Harnessing Big Data with KNIME
Harnessing Big Data with KNIME Tobias Kötter KNIME.com Agenda The three V s of Big Data Big Data Extension and Databases Nodes Demo 2 Variety, Volume, Velocity Variety: integrating heterogeneous data (and
What s Cooking in KNIME
What s Cooking in KNIME Thomas Gabriel Copyright 2015 KNIME.com AG Agenda Querying NoSQL Databases Database Improvements & Big Data Copyright 2015 KNIME.com AG 2 Querying NoSQL Databases MongoDB & CouchDB
KNIME opens the Doors to Big Data. A Practical example of Integrating any Big Data Platform into KNIME
KNIME opens the Doors to Big Data A Practical example of Integrating any Big Data Platform into KNIME Tobias Koetter Rosaria Silipo [email protected] [email protected] 1 Table of Contents
Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview
Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce
Big Data Analytics with Spark and Oscar BAO. Tamas Jambor, Lead Data Scientist at Massive Analytic
Big Data Analytics with Spark and Oscar BAO Tamas Jambor, Lead Data Scientist at Massive Analytic About me Building a scalable Machine Learning platform at MA Worked in Big Data and Data Science in the
Data processing goes big
Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,
ANALYTICS CENTER LEARNING PROGRAM
Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals
SEIZE THE DATA. 2015 SEIZE THE DATA. 2015
1 Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. BIG DATA CONFERENCE 2015 Boston August 10-13 Predicting and reducing deforestation
Hadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics
Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Please note the following IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice
Hadoop s Advantages for! Machine! Learning and. Predictive! Analytics. Webinar will begin shortly. Presented by Hortonworks & Zementis
Webinar will begin shortly Hadoop s Advantages for Machine Learning and Predictive Analytics Presented by Hortonworks & Zementis September 10, 2014 Copyright 2014 Zementis, Inc. All rights reserved. 2
Apache Spark : Fast and Easy Data Processing Sujee Maniyam Elephant Scale LLC [email protected] http://elephantscale.com
Apache Spark : Fast and Easy Data Processing Sujee Maniyam Elephant Scale LLC [email protected] http://elephantscale.com Spark Fast & Expressive Cluster computing engine Compatible with Hadoop Came
Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase
Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform
How To Create A Data Visualization With Apache Spark And Zeppelin 2.5.3.5
Big Data Visualization using Apache Spark and Zeppelin Prajod Vettiyattil, Software Architect, Wipro Agenda Big Data and Ecosystem tools Apache Spark Apache Zeppelin Data Visualization Combining Spark
Ensembles and PMML in KNIME
Ensembles and PMML in KNIME Alexander Fillbrunn 1, Iris Adä 1, Thomas R. Gabriel 2 and Michael R. Berthold 1,2 1 Department of Computer and Information Science Universität Konstanz Konstanz, Germany [email protected]
The Internet of Things and Big Data: Intro
The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific
In-memory data pipeline and warehouse at scale using Spark, Spark SQL, Tachyon and Parquet
In-memory data pipeline and warehouse at scale using Spark, Spark SQL, Tachyon and Parquet Ema Iancuta [email protected] Radu Chilom [email protected] Buzzwords Berlin - 2015 Big data analytics / machine
What's New in SAS Data Management
Paper SAS034-2014 What's New in SAS Data Management Nancy Rausch, SAS Institute Inc., Cary, NC; Mike Frost, SAS Institute Inc., Cary, NC, Mike Ames, SAS Institute Inc., Cary ABSTRACT The latest releases
Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata
Up Your R Game James Taylor, Decision Management Solutions Bill Franks, Teradata Today s Speakers James Taylor Bill Franks CEO Chief Analytics Officer Decision Management Solutions Teradata 7/28/14 3 Polling
RapidMiner Radoop Documentation
RapidMiner Radoop Documentation Release 2.3.0 RapidMiner April 30, 2015 CONTENTS 1 Introduction 1 1.1 Preface.................................................. 1 1.2 Basic Architecture............................................
Data and Machine Architecture for the Data Science Lab Workflow Development, Testing, and Production for Model Training, Evaluation, and Deployment
Data and Machine Architecture for the Data Science Lab Workflow Development, Testing, and Production for Model Training, Evaluation, and Deployment Rosaria Silipo Marco A. Zimmer [email protected]
Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p.
Introduction p. xvii Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p. 9 State of the Practice in Analytics p. 11 BI Versus
Predictive Analytics Powered by SAP HANA. Cary Bourgeois Principal Solution Advisor Platform and Analytics
Predictive Analytics Powered by SAP HANA Cary Bourgeois Principal Solution Advisor Platform and Analytics Agenda Introduction to Predictive Analytics Key capabilities of SAP HANA for in-memory predictive
Comprehensive Analytics on the Hortonworks Data Platform
Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page
Lofan Abrams Data Services for Big Data Session # 2987
Lofan Abrams Data Services for Big Data Session # 2987 Big Data Are you ready for blast-off? Big Data, for better or worse: 90% of world s data generated over last two years. ScienceDaily, ScienceDaily
Ganzheitliches Datenmanagement
Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist
Big Data and Data Science: Behind the Buzz Words
Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing
How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
Moving From Hadoop to Spark
+ Moving From Hadoop to Spark Sujee Maniyam Founder / Principal @ www.elephantscale.com [email protected] Bay Area ACM meetup (2015-02-23) + HI, Featured in Hadoop Weekly #109 + About Me : Sujee
PivotalR: A Package for Machine Learning on Big Data
PivotalR: A Package for Machine Learning on Big Data Hai Qian Predictive Analytics Team, Pivotal Inc. [email protected] Copyright 2013 Pivotal. All rights reserved. What Can Small Data Scientists Bring
MySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering
MySQL and Hadoop: Big Data Integration Shubhangi Garg & Neha Kumari MySQL Engineering 1Copyright 2013, Oracle and/or its affiliates. All rights reserved. Agenda Design rationale Implementation Installation
Unlocking the True Value of Hadoop with Open Data Science
Unlocking the True Value of Hadoop with Open Data Science Kristopher Overholt Solution Architect Big Data Tech 2016 MinneAnalytics June 7, 2016 Overview Overview of Open Data Science Python and the Big
HADOOP. Revised 10/19/2015
HADOOP Revised 10/19/2015 This Page Intentionally Left Blank Table of Contents Hortonworks HDP Developer: Java... 1 Hortonworks HDP Developer: Apache Pig and Hive... 2 Hortonworks HDP Developer: Windows...
AtScale Intelligence Platform
AtScale Intelligence Platform PUT THE POWER OF HADOOP IN THE HANDS OF BUSINESS USERS. Connect your BI tools directly to Hadoop without compromising scale, performance, or control. TURN HADOOP INTO A HIGH-PERFORMANCE
Unified Batch & Stream Processing Platform
Unified Batch & Stream Processing Platform Himanshu Bari Director Product Management Most Big Data Use Cases Are About Improving/Re-write EXISTING solutions To KNOWN problems Current Solutions Were Built
Best Practices for Hadoop Data Analysis with Tableau
Best Practices for Hadoop Data Analysis with Tableau September 2013 2013 Hortonworks Inc. http:// Tableau 6.1.4 introduced the ability to visualize large, complex data stored in Apache Hadoop with Hortonworks
QUEST meeting Big Data Analytics
QUEST meeting Big Data Analytics Peter Hughes Business Solutions Consultant SAS Australia/New Zealand Copyright 2015, SAS Institute Inc. All rights reserved. Big Data Analytics WHERE WE ARE NOW 2005 2007
Cloudera Manager Training: Hands-On Exercises
201408 Cloudera Manager Training: Hands-On Exercises General Notes... 2 In- Class Preparation: Accessing Your Cluster... 3 Self- Study Preparation: Creating Your Cluster... 4 Hands- On Exercise: Working
OpenText Actuate Big Data Analytics 5.2
OpenText Actuate Big Data Analytics 5.2 OpenText Actuate Big Data Analytics 5.2 introduces several improvements that make the product more useful, powerful and flexible for end users. A new data loading
Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy
Native Connectivity to Big Data Sources in MicroStrategy 10 Presented by: Raja Ganapathy Agenda MicroStrategy supports several data sources, including Hadoop Why Hadoop? How does MicroStrategy Analytics
Hadoop MapReduce and Spark. Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015
Hadoop MapReduce and Spark Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015 Outline Hadoop Hadoop Import data on Hadoop Spark Spark features Scala MLlib MLlib
Big Data Open Source Stack vs. Traditional Stack for BI and Analytics
Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Part I By Sam Poozhikala, Vice President Customer Solutions at StratApps Inc. 4/4/2014 You may contact Sam Poozhikala at [email protected].
Oracle Advanced Analytics 12c & SQLDEV/Oracle Data Miner 4.0 New Features
Oracle Advanced Analytics 12c & SQLDEV/Oracle Data Miner 4.0 New Features Charlie Berger, MS Eng, MBA Sr. Director Product Management, Data Mining and Advanced Analytics [email protected] www.twitter.com/charliedatamine
Native Connectivity to Big Data Sources in MSTR 10
Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single
Constructing a Data Lake: Hadoop and Oracle Database United!
Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.
An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database
An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct
Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia
Monitis Project Proposals for AUA September 2014, Yerevan, Armenia Distributed Log Collecting and Analysing Platform Project Specifications Category: Big Data and NoSQL Software Requirements: Apache Hadoop
Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
http://glennengstrand.info/analytics/fp
Functional Programming and Big Data by Glenn Engstrand (September 2014) http://glennengstrand.info/analytics/fp What is Functional Programming? It is a style of programming that emphasizes immutable state,
Automated Data Ingestion. Bernhard Disselhoff Enterprise Sales Engineer
Automated Data Ingestion Bernhard Disselhoff Enterprise Sales Engineer Agenda Pentaho Overview Templated dynamic ETL workflows Pentaho Data Integration (PDI) Use Cases Pentaho Overview Overview What we
Set Up Hortonworks Hadoop with SQL Anywhere
Set Up Hortonworks Hadoop with SQL Anywhere TABLE OF CONTENTS 1 INTRODUCTION... 3 2 INSTALL HADOOP ENVIRONMENT... 3 3 SET UP WINDOWS ENVIRONMENT... 5 3.1 Install Hortonworks ODBC Driver... 5 3.2 ODBC Driver
Hadoop & SAS Data Loader for Hadoop
Turning Data into Value Hadoop & SAS Data Loader for Hadoop Sebastiaan Schaap Frederik Vandenberghe Agenda What s Hadoop SAS Data management: Traditional In-Database In-Memory The Hadoop analytics lifecycle
An In-Depth Look at In-Memory Predictive Analytics for Developers
September 9 11, 2013 Anaheim, California An In-Depth Look at In-Memory Predictive Analytics for Developers Philip Mugglestone SAP Learning Points Understand the SAP HANA Predictive Analysis library (PAL)
Welkom! Copyright 2014 Oracle and/or its affiliates. All rights reserved.
Welkom! WIE? Bestuurslid OGh met BI / WA ervaring Bepalen activiteiten van de vereniging Deelname in organisatie commite van 1 of meerdere events Faciliteren van de SIG s Redactie van OGh-Visie Onderhouden
Creating Big Data Applications with Spring XD
Creating Big Data Applications with Spring XD Thomas Darimont @thomasdarimont THE FASTEST PATH TO NEW BUSINESS VALUE Journey Introduction Concepts Applications Outlook 3 Unless otherwise indicated, these
Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data
Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give
1. The orange button 2. Audio Type 3. Close apps 4. Enlarge my screen 5. Headphones 6. Questions Pane. SparkR 2
SparkR 1. The orange button 2. Audio Type 3. Close apps 4. Enlarge my screen 5. Headphones 6. Questions Pane SparkR 2 Lecture slides and/or video will be made available within one week Live Demonstration
locuz.com Big Data Services
locuz.com Big Data Services Big Data At Locuz, we help the enterprise move from being a data-limited to a data-driven one, thereby enabling smarter, faster decisions that result in better business outcome.
Integrating Apache Spark with an Enterprise Data Warehouse
Integrating Apache Spark with an Enterprise Warehouse Dr. Michael Wurst, IBM Corporation Architect Spark/R/Python base Integration, In-base Analytics Dr. Toni Bollinger, IBM Corporation Senior Software
Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics
In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning
BIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
Safe Harbor Statement
Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment
Bringing the Power of SAS to Hadoop. White Paper
White Paper Bringing the Power of SAS to Hadoop Combine SAS World-Class Analytic Strength with Hadoop s Low-Cost, Distributed Data Storage to Uncover Hidden Opportunities Contents Introduction... 1 What
Unified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia
Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model to support more types of processing
An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics
An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,
Hadoop Job Oriented Training Agenda
1 Hadoop Job Oriented Training Agenda Kapil CK [email protected] Module 1 M o d u l e 1 Understanding Hadoop This module covers an overview of big data, Hadoop, and the Hortonworks Data Platform. 1.1 Module
Databricks. A Primer
Databricks A Primer Who is Databricks? Databricks vision is to empower anyone to easily build and deploy advanced analytics solutions. The company was founded by the team who created Apache Spark, a powerful
Journée Thématique Big Data 13/03/2015
Journée Thématique Big Data 13/03/2015 1 Agenda About Flaminem What Do We Want To Predict? What Is The Machine Learning Theory Behind It? How Does It Work In Practice? What Is Happening When Data Gets
Shark Installation Guide Week 3 Report. Ankush Arora
Shark Installation Guide Week 3 Report Ankush Arora Last Updated: May 31,2014 CONTENTS Contents 1 Introduction 1 1.1 Shark..................................... 1 1.2 Apache Spark.................................
Contents. Pentaho Corporation. Version 5.1. Copyright Page. New Features in Pentaho Data Integration 5.1. PDI Version 5.1 Minor Functionality Changes
Contents Pentaho Corporation Version 5.1 Copyright Page New Features in Pentaho Data Integration 5.1 PDI Version 5.1 Minor Functionality Changes Legal Notices https://help.pentaho.com/template:pentaho/controls/pdftocfooter
#TalendSandbox for Big Data
Evalua&on von Apache Hadoop mit der #TalendSandbox for Big Data Julien Clarysse @whatdoesdatado @talend 2015 Talend Inc. 1 Connecting the Data-Driven Enterprise 2 Talend Overview Founded in 2006 BRAND
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2016 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and
Big Data Approaches. Making Sense of Big Data. Ian Crosland. Jan 2016
Big Data Approaches Making Sense of Big Data Ian Crosland Jan 2016 Accelerate Big Data ROI Even firms that are investing in Big Data are still struggling to get the most from it. Make Big Data Accessible
Spectrum Technology Platform. Version 9.0. Administration Guide
Spectrum Technology Platform Version 9.0 Administration Guide Contents Chapter 1: Getting Started...7 Starting and Stopping the Server...8 Installing the Client Tools...8 Starting the Client Tools...9
Cloudera Certified Developer for Apache Hadoop
Cloudera CCD-333 Cloudera Certified Developer for Apache Hadoop Version: 5.6 QUESTION NO: 1 Cloudera CCD-333 Exam What is a SequenceFile? A. A SequenceFile contains a binary encoding of an arbitrary number
Actian SQL in Hadoop Buyer s Guide
Actian SQL in Hadoop Buyer s Guide Contents Introduction: Big Data and Hadoop... 3 SQL on Hadoop Benefits... 4 Approaches to SQL on Hadoop... 4 The Top 10 SQL in Hadoop Capabilities... 5 SQL in Hadoop
April 2016 JPoint Moscow, Russia. How to Apply Big Data Analytics and Machine Learning to Real Time Processing. Kai Wähner. kwaehner@tibco.
April 2016 JPoint Moscow, Russia How to Apply Big Data Analytics and Machine Learning to Real Time Processing Kai Wähner [email protected] @KaiWaehner www.kai-waehner.de LinkedIn / Xing Please connect!
and Hadoop Technology
SAS and Hadoop Technology Overview SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2015. SAS and Hadoop Technology: Overview. Cary, NC: SAS Institute
Interactive data analytics drive insights
Big data Interactive data analytics drive insights Daniel Davis/Invodo/S&P. Screen images courtesy of Landmark Software and Services By Armando Acosta and Joey Jablonski The Apache Hadoop Big data has
How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns
How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns Table of Contents Abstract... 3 Introduction... 3 Definition... 3 The Expanding Digitization
Information Builders Mission & Value Proposition
Value 10/06/2015 2015 MapR Technologies 2015 MapR Technologies 1 Information Builders Mission & Value Proposition Economies of Scale & Increasing Returns (Note: Not to be confused with diminishing returns
Survey of the Benchmark Systems and Testing Frameworks For Tachyon-Perf
Survey of the Benchmark Systems and Testing Frameworks For Tachyon-Perf Rong Gu,Qianhao Dong 2014/09/05 0. Introduction As we want to have a performance framework for Tachyon, we need to consider two aspects
Operationalise Predictive Analytics
Operationalise Predictive Analytics Publish SPSS, Excel and R reports online Predict online using SPSS and R models Access models and reports via Android app Organise people and content into projects Monitor
An Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov
An Industrial Perspective on the Hadoop Ecosystem Eldar Khalilov Pavel Valov agenda 03.12.2015 2 agenda Introduction 03.12.2015 2 agenda Introduction Research goals 03.12.2015 2 agenda Introduction Research
Data Domain Profiling and Data Masking for Hadoop
Data Domain Profiling and Data Masking for Hadoop 1993-2015 Informatica LLC. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or
Some vendors have a big presence in a particular industry; some are geared toward data scientists, others toward business users.
Bonus Chapter Ten Major Predictive Analytics Vendors In This Chapter Angoss FICO IBM RapidMiner Revolution Analytics Salford Systems SAP SAS StatSoft, Inc. TIBCO This chapter highlights ten of the major
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and
Advanced Big Data Analytics with R and Hadoop
REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional
APPROACHABLE ANALYTICS MAKING SENSE OF DATA
APPROACHABLE ANALYTICS MAKING SENSE OF DATA AGENDA SAS DELIVERS PROVEN SOLUTIONS THAT DRIVE INNOVATION AND IMPROVE PERFORMANCE. About SAS SAS Business Analytics Framework Approachable Analytics SAS for
Architectures for massive data management
Architectures for massive data management Apache Spark Albert Bifet [email protected] October 20, 2015 Spark Motivation Apache Spark Figure: IBM and Apache Spark What is Apache Spark Apache
Actian Analytics Platform Express Hadoop SQL Edition 2.0
Actian Analytics Platform Express Hadoop SQL Edition 2.0 Tutorial AH-2-TU-05 This Documentation is for the end user's informational purposes only and may be subject to change or withdrawal by Actian Corporation
IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look
IBM BigInsights Has Potential If It Lives Up To Its Promise By Prakash Sukumar, Principal Consultant at iolap, Inc. IBM released Hadoop-based InfoSphere BigInsights in May 2013. There are already Hadoop-based
SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera
SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce
From Spark to Ignition:
From Spark to Ignition: Fueling Your Business on Real-Time Analytics Eric Frenkiel, MemSQL CEO June 29, 2015 San Francisco, CA What s in Store For This Presentation? 1. MemSQL: A real-time database for
Big Data and Market Surveillance. April 28, 2014
Big Data and Market Surveillance April 28, 2014 Copyright 2014 Scila AB. All rights reserved. Scila AB reserves the right to make changes to the information contained herein without prior notice. No part
Reference Architecture, Requirements, Gaps, Roles
Reference Architecture, Requirements, Gaps, Roles The contents of this document are an excerpt from the brainstorming document M0014. The purpose is to show how a detailed Big Data Reference Architecture
Advanced In-Database Analytics
Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??
