H2O in KNIME: Integrating High Performance Machine Learning
|
|
- Gillian Ross
- 6 years ago
- Views:
Transcription
1 H2O in KNIME: Integrating High Performance Machine Learning Jo-Fai Chow (H2O.ai), Marten Pfannenschmidt (KNIME), Christian Dietz (KNIME) 2018 KNIME AG. All Rights Reserved.
2 H2O for High Performance Machine Learning 2018 KNIME AG. All Rights Reserved. 2
3 3
4 Gartner 2018 Magic Quadrant for Data Science and Machine Learning Platforms 4
5 High Level Architecture HDFS H 2 O Compute Engine S3 NFS Load Data Distributed In-Memory Loss-less Compression Exploratory & Descriptive Analysis Feature Engineering & Selection Supervised & Unsupervised ing Evaluation & Selection Predict Data & Storage Local SQL Data Prep Export: Plain Old Java Object Export: Plain Old Java Object Production Scoring Environment Your Imagination 5
6 High Level Architecture Multiple Interfaces HDFS H 2 O Compute Engine S3 NFS Load Data Distributed In-Memory Loss-less Compression Exploratory & Descriptive Analysis Feature Engineering & Selection Supervised & Unsupervised ing Evaluation & Selection Predict Data & Storage Local SQL Data Prep Export: Plain Old Java Object Export: Plain Old Java Object Production Scoring Environment Your Imagination 6
7 Algorithms Overview Supervised Learning Unsupervised Learning Statistical Analysis Generalized Linear s: Binomial, Gaussian, Gamma, Poisson and Tweedie Naïve Bayes Clustering K-means: Partitions observations into k clusters/groups of the same spatial size. Automatically detect optimal k Ensembles Distributed Random Forest: Classification or regression models Gradient Boosting Machine: Produces an ensemble of decision trees with increasing refined approximations Dimensionality Reduction Principal Component Analysis: Linearly transforms correlated variables to independent components Generalized Low Rank s: extend the idea of PCA to handle arbitrary data consisting of numerical, Boolean, categorical, and missing data Deep Neural Networks Deep learning: Create multi-layer feed forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations Anomaly Detection Autoencoders: Find outliers using a nonlinear dimensionality reduction using deep learning 7
8 Scientific Advisory Council 8
9 H 2 O on a Single Machine Building CPU H2O
10 H 2 O on a Multi-Node Cluster Firewall or Cloud Building SQL NFS H2O Distributed In-Memory S3 CPU CPU CPU YARN
11 Foundation for Distributed Algorithms Distributed Algorithms Parallel Parse into Distributed Rows Fine Grain Map Reduce Illustration: Scalable Distributed Histogram Calculation for GBM Advantageous Foundation Foundation for In-Memory Distributed Algorithm Calculation - Distributed Data Frames and columnar compression All algorithms are distributed in H 2 O: GBM, GLM, DRF, Deep Learning and more. Fine-grained map-reduce iterations. Only enterprise-grade, open-source distributed algorithms in the market User Benefits Out-of-box functionalities for all algorithms (NO MORE SCRIPTING) and uniform interface across all languages: R, Python, Java Designed for all sizes of data sets, especially large data Highly optimized Java code for model exports In-house expertise for all algorithms 11
12 Performance Benchmark (GBM) by Szilard Pafka ( Time (s) H 2 O Gradient Boosting Machine (GBM) is the fastest algorithm in the test of 10M Samples. Area Under Curve (AUC) Comparable with xgboost Better than R/Spark MLLib AUC Benchmark results for other algorithms (Generalized Linear, Random Forest etc.) are available in Szilard s GitHub repository. X = million of samples 12
13 H2O in KNIME 2018 KNIME AG. All Rights Reserved. 13
14 H2O in KNIME Offer our users high-performance machine learning algorithms from H2O in KNIME Allow to mix & match with other KNIME functionality Data wrangling KNIME Analytics Platform functionality KNIME Big-Data Connectors Text Mining, Image Processing, Cheminformatics, and more! 2018 KNIME AG. All Rights Reserved. 14
15 H2O in KNIME Live Demo 2018 KNIME AG. All Rights Reserved. 15
16 Customer prediction with H2O in KNIME 2018 KNIME AG. All Rights Reserved. 16
17 Kaggle.com Get the data Build a model Make a submission 2018 KNIME AG. All Rights Reserved. 17
18 The competition Date Store ID Visitors Provided data: Number of visitors Reservations Store information Calendar date info ba937bf13d40fb f7c39a8410e7c 216 Date Store ID Visitors e8ed9335d0c38333? f13ef0f5e8c64dd? 2018 KNIME AG. All Rights Reserved. 18
19 Solving a Kaggle competition with KNIME and H2O Data preparation training optimization evaluation Deployment 2018 KNIME AG. All Rights Reserved. 19
20 Solving a Kaggle competition with KNIME and H2O Data preparation training optimization evaluation Deployment 2018 KNIME AG. All Rights Reserved. 20
21 Data preparation with KNIME Nodes 2018 KNIME AG. All Rights Reserved. 21
22 The blended dataset Data used for model training Date Store ID Visitors Mean visitors Area Reservations Holiday Genre ba937bf13d40fb Area1 15 False Bar f7c39a8410e7c Area2 22 True Karaoke Data used for Kaggle prediction Date Store ID Visitors Mean visitors Area Reservations Holiday Genre e8ed9335d0c38333? 12.5 Area1 10 False Italian f13ef0f5e8c64dd? 54.1 Area7 20 False Izakaya 2018 KNIME AG. All Rights Reserved. 22
23 Solving a Kaggle competition with KNIME and H2O Data preparation training optimization evaluation Deployment 2018 KNIME AG. All Rights Reserved. 23
24 Solving a Kaggle competition with KNIME and H2O Data preparation training optimization evaluation Deployment 2018 KNIME AG. All Rights Reserved. 24
25 ing with the new H2O nodes 2018 KNIME AG. All Rights Reserved. 25
26 ing with the new H2O nodes 2018 KNIME AG. All Rights Reserved. 26
27 ing with the new H2O nodes 2018 KNIME AG. All Rights Reserved. 27
28 ing with the new H2O nodes 2018 KNIME AG. All Rights Reserved. 28
29 Solving a Kaggle competition with KNIME and H2O Data preparation training optimization evaluation Deployment 2018 KNIME AG. All Rights Reserved. 29
30 Solving a Kaggle competition with KNIME and H2O Data preparation training optimization evaluation Deployment KNIME AG. All Rights Reserved. 30
31 What s Cooking? 2018 KNIME AG. All Rights Reserved. 31
32 What s Cooking: Scoring with H2O MOJOs on Spark Scoring using MOJOs 2018 KNIME AG. All Rights Reserved. 32
33 What s Cooking: Scoring with H2O MOJOs on Spark New: Scoring using MOJOs on Spark 2018 KNIME AG. All Rights Reserved. 33
34 What s Cooking: H2O Sparkling Water in KNIME 2018 KNIME AG. All Rights Reserved. 34
35 What s Cooking: H2O Sparkling Water in KNIME 2018 KNIME AG. All Rights Reserved. 35
36 What s Cooking: H2O Sparkling Water in KNIME 2018 KNIME AG. All Rights Reserved. 36
37 What s Cooking: H2O Sparkling Water in KNIME 2018 KNIME AG. All Rights Reserved. 37
38 What s Cooking: H2O Sparkling Water in KNIME 2018 KNIME AG. All Rights Reserved. 38
39 2018 KNIME AG. All Rights Reserved. The KNIME trademark and logo and OPEN FOR INNOVATION trademark are used by KNIME AG under license from KNIME GmbH, and are registered in the United States. KNIME is also registered in Germany. 39
Fast Analytics on Big Data with H20
Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,
More informationKNIME Big Data Workshop
KNIME Big Data Workshop Tobias Kötter and Björn Lohrmann KNIME 2016 KNIME.com AG. All Rights Reserved. Variety, Volume, Velocity Variety: integrating heterogeneous data.. and tools Volume: from small files......to
More informationWhat s Cooking in KNIME
What s Cooking in KNIME Thomas Gabriel Copyright 2015 KNIME.com AG Agenda Querying NoSQL Databases Database Improvements & Big Data Copyright 2015 KNIME.com AG 2 Querying NoSQL Databases MongoDB & CouchDB
More informationHadoop MapReduce and Spark. Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015
Hadoop MapReduce and Spark Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015 Outline Hadoop Hadoop Import data on Hadoop Spark Spark features Scala MLlib MLlib
More informationBig Data Analytics with Spark and Oscar BAO. Tamas Jambor, Lead Data Scientist at Massive Analytic
Big Data Analytics with Spark and Oscar BAO Tamas Jambor, Lead Data Scientist at Massive Analytic About me Building a scalable Machine Learning platform at MA Worked in Big Data and Data Science in the
More informationData Science in Action
+ Data Science in Action Peerapon Vateekul, Ph.D. Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University + Outlines 2 Data Science & Data Scientist Data Mining Analytics with
More informationMachine Learning with MATLAB David Willingham Application Engineer
Machine Learning with MATLAB David Willingham Application Engineer 2014 The MathWorks, Inc. 1 Goals Overview of machine learning Machine learning models & techniques available in MATLAB Streamlining the
More informationData Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
More informationAzure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
More informationPredictive Data modeling for health care: Comparative performance study of different prediction models
Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath hiremat.nitie@gmail.com National Institute of Industrial Engineering (NITIE) Vihar
More informationOracle Advanced Analytics 12c & SQLDEV/Oracle Data Miner 4.0 New Features
Oracle Advanced Analytics 12c & SQLDEV/Oracle Data Miner 4.0 New Features Charlie Berger, MS Eng, MBA Sr. Director Product Management, Data Mining and Advanced Analytics charlie.berger@oracle.com www.twitter.com/charliedatamine
More informationPredictive Modeling Techniques in Insurance
Predictive Modeling Techniques in Insurance Tuesday May 5, 2015 JF. Breton Application Engineer 2014 The MathWorks, Inc. 1 Opening Presenter: JF. Breton: 13 years of experience in predictive analytics
More informationSpark and the Big Data Library
Spark and the Big Data Library Reza Zadeh Thanks to Matei Zaharia Problem Data growing faster than processing speeds Only solution is to parallelize on large clusters» Wide use in both enterprises and
More informationDATA ANALYTICS USING R
DATA ANALYTICS USING R Duration: 90 Hours Intended audience and scope: The course is targeted at fresh engineers, practicing engineers and scientists who are interested in learning and understanding data
More informationSEIZE THE DATA. 2015 SEIZE THE DATA. 2015
1 Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. BIG DATA CONFERENCE 2015 Boston August 10-13 Predicting and reducing deforestation
More informationPractical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
More informationBenefits of analytics using Microsoft Azure Machine Learning (ML) Tomaž
Benefits of analytics using Microsoft Azure Machine Learning (ML) Tomaž Kaštrun @tomaz_tsql Gold sponsors Platinum sponsor Special Thanks Special Thanks to SQL Saturday Bratislava Organizers! Making SQL
More informationHigh Performance Predictive Analytics in R and Hadoop:
High Performance Predictive Analytics in R and Hadoop: Achieving Big Data Big Analytics Presented by: Mario E. Inchiosa, Ph.D. US Chief Scientist August 27, 2013 1 Polling Questions 1 & 2 2 Agenda Revolution
More informationScalable Data Science and Deep Learning with H2O. Arno Candel, PhD Chief Architect, H2O.ai
Scalable Data Science and Deep Learning with H2O Arno Candel, PhD Chief Architect, H2O.ai Who Am I? Arno Candel Chief Architect, Physicist & Hacker at H2O.ai PhD Physics, ETH Zurich 2005 10+ yrs Supercomputing
More informationEXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d.
EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER ANALYTICS LIFECYCLE Evaluate & Monitor Model Formulate Problem Data Preparation Deploy Model Data Exploration Validate Models
More informationAdvanced analytics at your hands
2.3 Advanced analytics at your hands Neural Designer is the most powerful predictive analytics software. It uses innovative neural networks techniques to provide data scientists with results in a way previously
More informationBayesian networks - Time-series models - Apache Spark & Scala
Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly
More informationBIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376
Course Director: Dr. Kayvan Najarian (DCM&B, kayvan@umich.edu) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.
More informationScalable Data Science with Hadoop, Spark and R. Mario Inchiosa, PhD Principal Software Engineer Microsoft Data Group DSC 2016 July 2, 2016
Scalable Data Science with Hadoop, Spark and R Mario Inchiosa, PhD Principal Software Engineer Microsoft Data Group DSC 2016 July 2, 2016 Microsoft R Server Cloud Hadoop & Spark R Server portfolio R Server
More informationPredictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD
Predictive Analytics Techniques: What to Use For Your Big Data March 26, 2014 Fern Halper, PhD Presenter Proven Performance Since 1995 TDWI helps business and IT professionals gain insight about data warehousing,
More informationThe Internet of Things and Big Data: Intro
The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific
More informationApril 2016 JPoint Moscow, Russia. How to Apply Big Data Analytics and Machine Learning to Real Time Processing. Kai Wähner. kwaehner@tibco.
April 2016 JPoint Moscow, Russia How to Apply Big Data Analytics and Machine Learning to Real Time Processing Kai Wähner kwaehner@tibco.com @KaiWaehner www.kai-waehner.de LinkedIn / Xing Please connect!
More informationR and Hadoop: Architectural Options. Bill Jacobs VP Product Marketing & Field CTO, Revolution Analytics @bill_jacobs
R and Hadoop: Architectural Options Bill Jacobs VP Product Marketing & Field CTO, Revolution Analytics @bill_jacobs Polling Question #1: Who Are You? (choose one) Statistician or modeler who uses R Other
More informationWhat s new in SAS Enterprise Miner 13.1 and Beyond SAS Talks February 20, 2014
What s new in SAS Enterprise Miner 13.1 and Beyond SAS Talks February 20, 2014 WHAT S NEW IN SAS ENTERPRISE MINER 13.1 AND BEYOND JONATHAN WEXLER SPEAKERS Jonathan Wexler Principal Product Manager SAS
More informationHow to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
More informationWesley Pasfield Northwestern University William.pasifield2016@u.northwestern.edu
March Madness Prediction Using Big Data Technology Wesley Pasfield Northwestern University William.pasifield2016@u.northwestern.edu Abstract- - This paper explores leveraging big data technologies to create
More informationLocation matters. 3 techniques to incorporate geo-spatial effects in one's predictive model
Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Xavier Conort xavier.conort@gear-analytics.com Motivation Location matters! Observed value at one location is
More informationWebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat
Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise
More informationOracle Data Miner (Extension of SQL Developer 4.0)
An Oracle White Paper September 2013 Oracle Data Miner (Extension of SQL Developer 4.0) Integrate Oracle R Enterprise Mining Algorithms into a workflow using the SQL Query node Denny Wong Oracle Data Mining
More informationAdvanced Data Science on Spark
Advanced Data Science on Spark Reza Zadeh @Reza_Zadeh http://reza-zadeh.com Data Science Problem Data growing faster than processing speeds Only solution is to parallelize on large clusters» Wide use in
More informationDetection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup
Network Anomaly Detection A Machine Learning Perspective Dhruba Kumar Bhattacharyya Jugal Kumar KaKta»C) CRC Press J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor
More informationCar Insurance. Prvák, Tomi, Havri
Car Insurance Prvák, Tomi, Havri Sumo report - expectations Sumo report - reality Bc. Jan Tomášek Deeper look into data set Column approach Reminder What the hell is this competition about??? Attributes
More informationHadoop s Advantages for! Machine! Learning and. Predictive! Analytics. Webinar will begin shortly. Presented by Hortonworks & Zementis
Webinar will begin shortly Hadoop s Advantages for Machine Learning and Predictive Analytics Presented by Hortonworks & Zementis September 10, 2014 Copyright 2014 Zementis, Inc. All rights reserved. 2
More informationHow To Make A Credit Risk Model For A Bank Account
TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző csaba.fozo@lloydsbanking.com 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions
More informationUnderstanding the Benefits of IBM SPSS Statistics Server
IBM SPSS Statistics Server Understanding the Benefits of IBM SPSS Statistics Server Contents: 1 Introduction 2 Performance 101: Understanding the drivers of better performance 3 Why performance is faster
More informationAnalytics on Big Data
Analytics on Big Data Riccardo Torlone Università Roma Tre Credits: Mohamed Eltabakh (WPI) Analytics The discovery and communication of meaningful patterns in data (Wikipedia) It relies on data analysis
More informationThe Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
More informationKnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES
HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES Translating data into business value requires the right data mining and modeling techniques which uncover important patterns within
More informationLearning outcomes. Knowledge and understanding. Competence and skills
Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges
More informationMicrosoft Azure Machine learning Algorithms
Microsoft Azure Machine learning Algorithms Tomaž KAŠTRUN @tomaz_tsql Tomaz.kastrun@gmail.com http://tomaztsql.wordpress.com Our Sponsors Speaker info https://tomaztsql.wordpress.com Agenda Focus on explanation
More informationE6895 Advanced Big Data Analytics Lecture 3:! Spark and Data Analytics
E6895 Advanced Big Data Analytics Lecture 3:! Spark and Data Analytics Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science Mgr., Dept. of Network Science and Big
More informationBig Data at Spotify. Anders Arpteg, Ph D Analytics Machine Learning, Spotify
Big Data at Spotify Anders Arpteg, Ph D Analytics Machine Learning, Spotify Quickly about me Quickly about Spotify What is all the data used for? Quickly about Spark Hadoop MR vs Spark Need for (distributed)
More informationMACHINE LEARNING AN INTRODUCTION
AN INTRODUCTION JOSEFIN ROSÉN, SENIOR ANALYTICAL EXPERT, SAS INSTITUTE JOSEFIN.ROSEN@SAS.COM TWITTER: @ROSENJOSEFIN AGENDA What is machine learning? When, where and how is machine learning used? Exemple
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
More informationBOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING
BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING Xavier Conort xavier.conort@gear-analytics.com Session Number: TBR14 Insurance has always been a data business The industry has successfully
More informationIs a Data Scientist the New Quant? Stuart Kozola MathWorks
Is a Data Scientist the New Quant? Stuart Kozola MathWorks 2015 The MathWorks, Inc. 1 Facts or information used usually to calculate, analyze, or plan something Information that is produced or stored by
More informationDecision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010
Decision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010 Ernst van Waning Senior Sales Engineer May 28, 2010 Agenda SPSS, an IBM Company SPSS Statistics User-driven product
More informationScalable Developments for Big Data Analytics in Remote Sensing
Scalable Developments for Big Data Analytics in Remote Sensing Federated Systems and Data Division Research Group High Productivity Data Processing Dr.-Ing. Morris Riedel et al. Research Group Leader,
More informationAdvanced In-Database Analytics
Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??
More informationData Mining: Overview. What is Data Mining?
Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,
More informationIntroduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing
Introduction to Data Mining and Machine Learning Techniques Iza Moise, Evangelos Pournaras, Dirk Helbing Iza Moise, Evangelos Pournaras, Dirk Helbing 1 Overview Main principles of data mining Definition
More informationLavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs
1.1 Introduction Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs For brevity, the Lavastorm Analytics Library (LAL) Predictive and Statistical Analytics Node Pack will be
More informationUp Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata
Up Your R Game James Taylor, Decision Management Solutions Bill Franks, Teradata Today s Speakers James Taylor Bill Franks CEO Chief Analytics Officer Decision Management Solutions Teradata 7/28/14 3 Polling
More informationMHI3000 Big Data Analytics for Health Care Final Project Report
MHI3000 Big Data Analytics for Health Care Final Project Report Zhongtian Fred Qiu (1002274530) http://gallery.azureml.net/details/81ddb2ab137046d4925584b5095ec7aa 1. Data pre-processing The data given
More informationHT2015: SC4 Statistical Data Mining and Machine Learning
HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric
More informationANALYTICS CENTER LEARNING PROGRAM
Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals
More informationClassification of Bad Accounts in Credit Card Industry
Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition
More informationDefending Networks with Incomplete Information: A Machine Learning Approach. Alexandre Pinto alexcp@mlsecproject.org @alexcpsec @MLSecProject
Defending Networks with Incomplete Information: A Machine Learning Approach Alexandre Pinto alexcp@mlsecproject.org @alexcpsec @MLSecProject Agenda Security Monitoring: We are doing it wrong Machine Learning
More informationIST 557 Final Project
George Slota DataMaster 5000 IST 557 Final Project Abstract As part of a competition hosted by the website Kaggle, a statistical model was developed for prediction of United States Census 2010 mailing
More informationLearning outcomes. Knowledge and understanding. Ability and Competences. Evaluation capability and scientific approach
Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges
More informationBayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com
Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian
More informationOracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc.
Oracle9i Data Warehouse Review Robert F. Edwards Dulcian, Inc. Agenda Oracle9i Server OLAP Server Analytical SQL Data Mining ETL Warehouse Builder 3i Oracle 9i Server Overview 9i Server = Data Warehouse
More informationDistributed Computing" with Open-Source Software
Distributed Computing" with Open-Source Software Reza Zadeh Presented at Infosys OSSmosis Problem Data growing faster than processing speeds Only solution is to parallelize on large clusters» Wide use
More informationMACHINE LEARNING BRETT WUJEK, SAS INSTITUTE INC.
MACHINE LEARNING BRETT WUJEK, SAS INSTITUTE INC. AGENDA MACHINE LEARNING Background Use cases in healthcare, insurance, retail and banking Eamples: Unsupervised Learning Principle Component Analysis Supervised
More informationDATA SCIENCE CURRICULUM WEEK 1 ONLINE PRE-WORK INSTALLING PACKAGES COMMAND LINE CODE EDITOR PYTHON STATISTICS PROJECT O5 PROJECT O3 PROJECT O2
DATA SCIENCE CURRICULUM Before class even begins, students start an at-home pre-work phase. When they convene in class, students spend the first eight weeks doing iterative, project-centered skill acquisition.
More informationJournée Thématique Big Data 13/03/2015
Journée Thématique Big Data 13/03/2015 1 Agenda About Flaminem What Do We Want To Predict? What Is The Machine Learning Theory Behind It? How Does It Work In Practice? What Is Happening When Data Gets
More informationData Mining Applications in Higher Education
Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2
More informationUnified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia
Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model to support more types of processing
More informationCI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.
CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes
More informationHUAWEI Advanced Data Science with Spark Streaming. Albert Bifet (@abifet)
HUAWEI Advanced Data Science with Spark Streaming Albert Bifet (@abifet) Huawei Noah s Ark Lab Focus Intelligent Mobile Devices Data Mining & Artificial Intelligence Intelligent Telecommunication Networks
More informationThe Use of Open Source Is Growing. So Why Do Organizations Still Turn to SAS?
Conclusions Paper The Use of Open Source Is Growing. So Why Do Organizations Still Turn to SAS? Insights from a presentation at the 2014 Hadoop Summit Featuring Brian Garrett, Principal Solutions Architect
More informationSEIZE THE DATA. 2015 SEIZE THE DATA. 2015
1 Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Deep dive into Haven Predictive Analytics Powered by HP Distributed R and
More informationextreme Datamining mit Oracle R Enterprise
extreme Datamining mit Oracle R Enterprise Oliver Bracht Managing Director eoda Matthias Fuchs Senior Consultant ISE Information Systems Engineering GmbH extreme Datamining with Oracle R Enterprise About
More informationData Mining is the process of knowledge discovery involving finding
using analytic services data mining framework for classification predicting the enrollment of students at a university a case study Data Mining is the process of knowledge discovery involving finding hidden
More informationRole Description. Position of a Data Scientist Machine Learning at Fractal Analytics
Opportunity to work with leading analytics firm that creates Insights, Impact and Innovation. Role Description Position of a Data Scientist Machine Learning at Fractal Analytics March 2014 About the Company
More informationBig Data and Data Science: Behind the Buzz Words
Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing
More informationBig Data and Marketing
Big Data and Marketing Professor Venky Shankar Coleman Chair in Marketing Director, Center for Retailing Studies Mays Business School Texas A&M University http://www.venkyshankar.com venky@venkyshankar.com
More informationIn this presentation, you will be introduced to data mining and the relationship with meaningful use.
In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine
More informationBIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
More informationProcessing of Big Data. Nelson L. S. da Fonseca IEEE ComSoc Summer Scool Trento, July 9 th, 2015
Processing of Big Data Nelson L. S. da Fonseca IEEE ComSoc Summer Scool Trento, July 9 th, 2015 Acknowledgement Some slides in this set of slides were provided by EMC Corporation and Sandra Avila, University
More informationTable of Contents. June 2010
June 2010 From: StatSoft Analytics White Papers To: Internal release Re: Performance comparison of STATISTICA Version 9 on multi-core 64-bit machines with current 64-bit releases of SAS (Version 9.2) and
More informationNeural Networks. CAP5610 Machine Learning Instructor: Guo-Jun Qi
Neural Networks CAP5610 Machine Learning Instructor: Guo-Jun Qi Recap: linear classifier Logistic regression Maximizing the posterior distribution of class Y conditional on the input vector X Support vector
More informationMachine learning for algo trading
Machine learning for algo trading An introduction for nonmathematicians Dr. Aly Kassam Overview High level introduction to machine learning A machine learning bestiary What has all this got to do with
More informationData Mining mit der JMSL Numerical Library for Java Applications
Data Mining mit der JMSL Numerical Library for Java Applications Stefan Sineux 8. Java Forum Stuttgart 07.07.2005 Agenda Visual Numerics JMSL TM Numerical Library Neuronale Netze (Hintergrund) Demos Neuronale
More informationMonitis Project Proposals for AUA. September 2014, Yerevan, Armenia
Monitis Project Proposals for AUA September 2014, Yerevan, Armenia Distributed Log Collecting and Analysing Platform Project Specifications Category: Big Data and NoSQL Software Requirements: Apache Hadoop
More informationApplied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification
More informationModel Deployment. Dr. Saed Sayad. University of Toronto 2010 saed.sayad@utoronto.ca. http://chem-eng.utoronto.ca/~datamining/
Model Deployment Dr. Saed Sayad University of Toronto 2010 saed.sayad@utoronto.ca http://chem-eng.utoronto.ca/~datamining/ 1 Model Deployment Creation of the model is generally not the end of the project.
More informationBig Data Analytics and Optimization
Big Data Analytics and Optimization C e r t i f i c a t e P r o g r a m i n E n g i n e e r i n g E x c e l l e n c e C e r t i f i c a t e P r o g r a m s i n A c c e l e r a t e d E n g i n e e r i n
More informationChallenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2
More informationQlik Sense scalability
Qlik Sense scalability Visual analytics platform Qlik Sense is a visual analytics platform powered by an associative, in-memory data indexing engine. Based on users selections, calculations are computed
More informationPredict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons
More informationReal Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA
Real Time Fraud Detection With Sequence Mining on Big Data Platform Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA Open Source Big Data Eco System Query (NOSQL) : Cassandra,
More informationUsing multiple models: Bagging, Boosting, Ensembles, Forests
Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or
More informationWhat is Data Science? Data, Databases, and the Extraction of Knowledge Renée T., @becomingdatasci, November 2014
What is Data Science? { Data, Databases, and the Extraction of Knowledge Renée T., @becomingdatasci, November 2014 Let s start with: What is Data? http://upload.wikimedia.org/wikipedia/commons/f/f0/darpa
More information