PivotalR: A Package for Machine Learning on Big Data
|
|
- Kenneth Hood
- 8 years ago
- Views:
Transcription
1 PivotalR: A Package for Machine Learning on Big Data Hai Qian Predictive Analytics Team, Pivotal Inc. madlib@gopivotal.com Copyright 2013 Pivotal. All rights reserved.
2 What Can Small Data Scientists Bring on Their Big Data Journey? Small Data Ra nd p use ython r Databases In-me m Flat files ory m buildin odel g Command-line tools Big Data nd Ra n o pyth r use MapRe duce Many tools and approaches are being adapted to big data technologies S HDF Cloud computing P MP ses aba t a d puting m o c uted b i r t s i D Command-line tools Copyright 2014 Pivotal. All rights reserved.
3 Tools for Data Scientists COMMERCIAL OPEN SOURCE (OR FREE) PL/R, PL/Python PL/Java Copyright 2014 Pivotal. All rights reserved.
4 PivotalR Design Overview Database/Hadoop w/ MADlib RPostgreSQL PivotalR 2. SQL to execute 1. R SQL 3. Computation results No data here Copyright 2014 Pivotal. All rights reserved. Data lives here
5 Data Operators crossprod, scale, sample Arith, Logical, Cast Extraction: x[,-2], x[,1:3], x[,c( rings, sex )], x$arr[, 1] Replacement: x[x$sex == I,] <- NA Support array columns Easy to construct complicated SQL queries. For example, filter NULL values for (i in 1:ncol(w)) w <- w[!is.na(w[i]),] Copyright 2013 Pivotal. All rights reserved.
6 Machine learning Current MADlib wrapper functions: madlib.lm, madlib.glm, madlib.summary, madlib.arima, madlib.elnet Related functions: generic.cv, generic.bagging, margins, predict Support for formula y Support for categorical variables as.factor, predict Copyright 2013 Pivotal. All rights reserved. ~. - x[1:2] - z + factor(w) relevel,
7 MADlib In-database Machine Learning Library Pivotal Confidential Internal Use Only
8 How Pivotal Data Scientists Select Which Pivotal Tool to Use Copyright 2014 Pivotal. All rights reserved.
9 MADlib: Toolkit for Advanced Big Data Analytics Better Parallelism Algorithms designed to leverage MPP or Hadoop architecture Better Scalability Algorithms scale as your data set scales No data movement Better Predictive Accuracy Using all data, not a sample, may improve accuracy Open Source Available for customization and optimization by user Copyright 2014 Pivotal. All rights reserved.
10 Which platforms does it run on? (Partly ported) Impala HAWQ HDFS Pivotal Confidential Internal Use Only GPDB PostgreSQL
11 Shared-Nothing Database Architecture MPP (Massively Parallel Processing) Master Servers... SQL MapReduce... Query planning & dispatch Network Interconnect Segment Servers Query processing & data storage External Sources Loading, streaming, etc. Pivotal Confidential Internal Use Only......
12 Pivotal Confidential Internal Use Only
13 Example usage Train a model Predict for new data Pivotal Confidential Internal Use Only
14 But not all Data Scientists speak SQL Accessing Scalability through R Pivotal Confidential Internal Use Only
15 PivotalR: A familiar R interface Current version Pivotal R d <- db.data.frame( houses") houses_linregr <- madlib.lm( price ~ tax + bath + size, data=d) Copyright 2014 Pivotal. All rights reserved. SQL Code SELECT madlib.linregr_train( 'houses, 'houses_linregr, 'price, 'ARRAY[1, tax, bath, size] );
16 Machine learning Something that MADlib cannot do generic.cv margins Not easy to do in MADlib on the server side as.factor and relevel step Copyright 2013 Pivotal. All rights reserved.
17 Quick Prototype Examples (see the script): Linear regression PCA Poisson regression Left inverse of a matrix AdaBoost Portable Same code on all supported platforms Copyright 2013 Pivotal. All rights reserved.
18 Sending R code into Databases Write R scripts that be sent into the database No translation to SQL Any R functions Copyright 2013 Pivotal. All rights reserved.
19 Testing Framework R CMD INSTALL --install-tests PivotalR_ tar.gz Copyright 2013 Pivotal. All rights reserved. Based on testthat
20 Future Work Better support of PL/R Better graphics support Support more platforms Copyright 2013 Pivotal. All rights reserved.
21 Additional References MADlib PivotalR Video Demo Copyright 2014 Pivotal. All rights reserved.
Enabling R for Big Data with PL/R and PivotalR Real World Examples on Hadoop & MPP Databases
Enabling R for Big Data with PL/R and PivotalR Real World Examples on Hadoop & MPP Databases Woo J. Jung Principal Data Scientist Pivotal Labs 1 All In On Open Source Still can t believe we did this. Truly
More informationAdvanced In-Database Analytics
Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??
More informationBig Data and Data Science: Behind the Buzz Words
Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing
More informationAccelerating GeoSpatial Data Analytics With Pivotal Greenplum Database
Copyright 2014 Pivotal Software, Inc. All rights reserved. 1 Accelerating GeoSpatial Data Analytics With Pivotal Greenplum Database Kuien Liu Pivotal, Inc. FOSS4G Seoul 2015 Warm-up: GeoSpatial on Hadoop
More informationGreenplum Database. Getting Started with Big Data Analytics. Ofir Manor Pre Sales Technical Architect, EMC Greenplum
Greenplum Database Getting Started with Big Data Analytics Ofir Manor Pre Sales Technical Architect, EMC Greenplum 1 Agenda Introduction to Greenplum Greenplum Database Architecture Flexible Database Configuration
More informationHadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics
In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning
More informationI/O Considerations in Big Data Analytics
Library of Congress I/O Considerations in Big Data Analytics 26 September 2011 Marshall Presser Federal Field CTO EMC, Data Computing Division 1 Paradigms in Big Data Structured (relational) data Very
More informationCollaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.
Collaborative Big Data Analytics 1 Big Data Is Less About Size, And More About Freedom TechCrunch!!!!!!!!! Total data: bigger than big data 451 Group Findings: Big Data Is More Extreme Than Volume Gartner!!!!!!!!!!!!!!!
More informationHadoop & SAS Data Loader for Hadoop
Turning Data into Value Hadoop & SAS Data Loader for Hadoop Sebastiaan Schaap Frederik Vandenberghe Agenda What s Hadoop SAS Data management: Traditional In-Database In-Memory The Hadoop analytics lifecycle
More informationEMC GREENPLUM DATABASE
EMC GREENPLUM DATABASE Driving the future of data warehousing and analytics Essentials A shared-nothing, massively parallel processing (MPP) architecture supports extreme performance on commodity infrastructure
More informationA Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel
A Next-Generation Analytics Ecosystem for Big Data Colin White, BI Research September 2012 Sponsored by ParAccel BIG DATA IS BIG NEWS The value of big data lies in the business analytics that can be generated
More informationUp Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata
Up Your R Game James Taylor, Decision Management Solutions Bill Franks, Teradata Today s Speakers James Taylor Bill Franks CEO Chief Analytics Officer Decision Management Solutions Teradata 7/28/14 3 Polling
More informationG-Cloud Big Data Suite Powered by Pivotal. December 2014. G-Cloud. service definitions
G-Cloud Big Data Suite Powered by Pivotal December 2014 G-Cloud service definitions TABLE OF CONTENTS Service Overview... 3 Business Need... 6 Our Approach... 7 Service Management... 7 Vendor Accreditations/Awards...
More informationSEIZE THE DATA. 2015 SEIZE THE DATA. 2015
1 Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. BIG DATA CONFERENCE 2015 Boston August 10-13 Predicting and reducing deforestation
More informationEMC Greenplum Driving the Future of Data Warehousing and Analytics. Tools and Technologies for Big Data
EMC Greenplum Driving the Future of Data Warehousing and Analytics Tools and Technologies for Big Data Steven Hillion V.P. Analytics EMC Data Computing Division 1 Big Data Size: The Volume Of Data Continues
More informationArchitectures for Big Data Analytics A database perspective
Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum
More informationHadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN
Hadoop MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Understanding Hadoop Understanding Hadoop What's Hadoop about? Apache Hadoop project (started 2008) downloadable open-source software library (current
More informationHAWQ Architecture. Alexey Grishchenko
HAWQ Architecture Alexey Grishchenko Who I am Enterprise Architect @ Pivotal 7 years in data processing 5 years of experience with MPP 4 years with Hadoop Using HAWQ since the first internal Beta Responsible
More informationHortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved
Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment
More informationFederated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA. by Christian Tzolov @christzolov
Federated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA by Christian Tzolov @christzolov Whoami Christian Tzolov Technical Architect at Pivotal, BigData, Hadoop, SpringXD,
More informationBig Data and Big Analytics
Big Data and Big Analytics Introducing SciDB Open source, massively parallel DBMS and analytic platform Array data model (rather than SQL, Unstructured, XML, or triple-store) Extensible micro-kernel architecture
More informationData Lake In Action: Real-time, Closed Looped Analytics On Hadoop
1 Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop 2 Pivotal s Full Approach It s More Than Just Hadoop Pivotal Data Labs 3 Why Pivotal Exists First Movers Solve the Big Data Utility Gap
More informationMassive scale analytics with Stratosphere using R
Massive scale analytics with Stratosphere using R Jose Luis Lopez Pino jllopezpino@gmail.com Database Systems and Information Management Technische Universität Berlin Supervised by Volker Markl Advised
More informationIn-Database Analytics
Embedding Analytics in Decision Management Systems In-database analytics offer a powerful tool for embedding advanced analytics in a critical component of IT infrastructure. James Taylor CEO CONTENTS Introducing
More informationName: Srinivasan Govindaraj Title: Big Data Predictive Analytics
Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Please note the following IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice
More informationBringing Big Data to People
Bringing Big Data to People Microsoft s modern data platform SQL Server 2014 Analytics Platform System Microsoft Azure HDInsight Data Platform Everyone should have access to the data they need. Process
More informationInformation Architecture
The Bloor Group Actian and The Big Data Information Architecture WHITE PAPER The Actian Big Data Information Architecture Actian and The Big Data Information Architecture Originally founded in 2005 to
More informationTax Fraud in Increasing
Preventing Fraud with Through Analytics Satya Bhamidipati Data Scientist Business Analytics Product Group Copyright 2014 Oracle and/or its affiliates. All rights reserved. 2 Tax Fraud in Increasing 27%
More informationUniversal PMML Plug-in for EMC Greenplum Database
Universal PMML Plug-in for EMC Greenplum Database Delivering Massively Parallel Predictions Zementis, Inc. info@zementis.com USA: 6125 Cornerstone Court East, Suite #250, San Diego, CA 92121 T +1(619)
More informationComprehensive Analytics on the Hortonworks Data Platform
Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page
More informationPivotal HAWQ 1.2.1 Release Notes
Pivotal HAWQ 1.2.1 Release Notes Rev: A03 Published: September 15, 2014 Updated: November 12, 2014 Contents About the Pivotal HAWQ Components What's New in the Release Supported Platforms Installation
More informationTackling Big Data with MATLAB Adam Filion Application Engineer MathWorks, Inc.
Tackling Big Data with MATLAB Adam Filion Application Engineer MathWorks, Inc. 2015 The MathWorks, Inc. 1 Challenges of Big Data Any collection of data sets so large and complex that it becomes difficult
More informationFundamentals Curriculum HAWQ
Fundamentals Curriculum Pivotal Hadoop 2.1 HAWQ Education Services zdata Inc. 660 4th St. Ste. 176 San Francisco, CA 94107 t. 415.890.5764 zdatainc.com Pivotal Hadoop & HAWQ Fundamentals Course Description
More informationBIG DATA-AS-A-SERVICE
White Paper BIG DATA-AS-A-SERVICE What Big Data is about What service providers can do with Big Data What EMC can do to help EMC Solutions Group Abstract This white paper looks at what service providers
More informationBig Data and the Data Lake. February 2015
Big Data and the Data Lake February 2015 My Vision: Our Mission Data Intelligence is a broad term that describes the real, meaningful insights that can be extracted from your data truths that you can act
More informationMike Maxey. Senior Director Product Marketing Greenplum A Division of EMC. Copyright 2011 EMC Corporation. All rights reserved.
Mike Maxey Senior Director Product Marketing Greenplum A Division of EMC 1 Greenplum Becomes the Foundation of EMC s Big Data Analytics (July 2010) E M C A C Q U I R E S G R E E N P L U M For three years,
More informationData processing goes big
Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,
More informationFast Analytics on Big Data with H20
Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,
More informationMassive Cloud Auditing using Data Mining on Hadoop
Massive Cloud Auditing using Data Mining on Hadoop Prof. Sachin Shetty CyberBAT Team, AFRL/RIGD AFRL VFRP Tennessee State University Outline Massive Cloud Auditing Traffic Characterization Distributed
More informationBig Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014
Big Data Analytics An Introduction Oliver Fuchsberger University of Paderborn 2014 Table of Contents I. Introduction & Motivation What is Big Data Analytics? Why is it so important? II. Techniques & Solutions
More informationBig Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum
Big Data Analytics with EMC Greenplum and Hadoop Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All
More informationParallel Time Series Modeling - A Case Study of In-Database Big Data Analytics
Parallel Time Series Modeling - A Case Study of In-Database Big Data Analytics Hai Qian 1, Shengwen Yang 1, Rahul Iyer 1, Xixuan Feng 1, Mark Wellons 2, and Caleb Welton 1 1 Predictive Analytics Team,
More informationCustomized Report- Big Data
GINeVRA Digital Research Hub Customized Report- Big Data 1 2014. All Rights Reserved. Agenda Context Challenges and opportunities Solutions Market Case studies Recommendations 2 2014. All Rights Reserved.
More information6 Steps to Faster Data Blending Using Your Data Warehouse
6 Steps to Faster Data Blending Using Your Data Warehouse Self-Service Data Blending and Analytics Dynamic market conditions require companies to be agile and decision making to be quick meaning the days
More information2015 The MathWorks, Inc. 1
25 The MathWorks, Inc. 빅 데이터 및 다양한 데이터 처리 위한 MATLAB의 인터페이스 환경 및 새로운 기능 엄준상 대리 Application Engineer MathWorks 25 The MathWorks, Inc. 2 Challenges of Data Any collection of data sets so large and complex
More informationManaging Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
More informationEMC Federation Big Data Solutions. Copyright 2015 EMC Corporation. All rights reserved.
EMC Federation Big Data Solutions 1 Introduction to data analytics Federation offering 2 Traditional Analytics! Traditional type of data analysis, sometimes called Business Intelligence! Type of analytics
More informationThe Inside Scoop on Hadoop
The Inside Scoop on Hadoop Orion Gebremedhin National Solutions Director BI & Big Data, Neudesic LLC. VTSP Microsoft Corp. Orion.Gebremedhin@Neudesic.COM B-orgebr@Microsoft.com @OrionGM The Inside Scoop
More informationAdvanced Big Data Analytics with R and Hadoop
REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional
More informationWhat s Cooking in KNIME
What s Cooking in KNIME Thomas Gabriel Copyright 2015 KNIME.com AG Agenda Querying NoSQL Databases Database Improvements & Big Data Copyright 2015 KNIME.com AG 2 Querying NoSQL Databases MongoDB & CouchDB
More informationSEIZE THE DATA. 2015 SEIZE THE DATA. 2015
1 Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Deep dive into Haven Predictive Analytics Powered by HP Distributed R and
More informationAn Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics
An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,
More informationwww.objectivity.com Choosing The Right Big Data Tools For The Job A Polyglot Approach
www.objectivity.com Choosing The Right Big Data Tools For The Job A Polyglot Approach Nic Caine NoSQL Matters, April 2013 Overview The Problem Current Big Data Analytics Relationship Analytics Leveraging
More informationBIG DATA: FIVE TACTICS TO MODERNIZE YOUR DATA WAREHOUSE
BIG DATA: FIVE TACTICS TO MODERNIZE YOUR DATA WAREHOUSE Current technology for Big Data allows organizations to dramatically improve return on investment (ROI) from their existing data warehouse environment.
More informationBIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
More informationSearch and Real-Time Analytics on Big Data
Search and Real-Time Analytics on Big Data Sewook Wee, Ryan Tabora, Jason Rutherglen Accenture & Think Big Analytics Strata New York October, 2012 Big Data: data becomes your core asset. It realizes its
More informationInternet of Things. Opportunity Challenges Solutions
Internet of Things Opportunity Challenges Solutions Copyright 2014 Boeing. All rights reserved. GPDIS_2015.ppt 1 ANALYZING INTERNET OF THINGS USING BIG DATA ECOSYSTEM Internet of Things matter for... Industrial
More informationBringing Big Data Modelling into the Hands of Domain Experts
Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks david.willingham@mathworks.com.au 2015 The MathWorks, Inc. 1 Data is the sword of the
More informationANALYTICS CENTER LEARNING PROGRAM
Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals
More informationOracle Big Data Discovery Unlock Potential in Big Data Reservoir
Oracle Big Data Discovery Unlock Potential in Big Data Reservoir Gokula Mishra Premjith Balakrishnan Business Analytics Product Group September 29, 2014 Copyright 2014, Oracle and/or its affiliates. All
More informationSTATISTICA Solutions for Financial Risk Management Management and Validated Compliance Solutions for the Banking Industry (Basel II)
STATISTICA Solutions for Financial Risk Management Management and Validated Compliance Solutions for the Banking Industry (Basel II) With the New Basel Capital Accord of 2001 (BASEL II) the banking industry
More informationBig Data and Market Surveillance. April 28, 2014
Big Data and Market Surveillance April 28, 2014 Copyright 2014 Scila AB. All rights reserved. Scila AB reserves the right to make changes to the information contained herein without prior notice. No part
More informationHadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
More informationPlay with Big Data on the Shoulders of Open Source
OW2 Open Source Corporate Network Meeting Play with Big Data on the Shoulders of Open Source Liu Jie Technology Center of Software Engineering Institute of Software, Chinese Academy of Sciences 2012-10-19
More informationSQL Server 2012 PDW. Ryan Simpson Technical Solution Professional PDW Microsoft. Microsoft SQL Server 2012 Parallel Data Warehouse
SQL Server 2012 PDW Ryan Simpson Technical Solution Professional PDW Microsoft Microsoft SQL Server 2012 Parallel Data Warehouse Massively Parallel Processing Platform Delivers Big Data HDFS Delivers Scale
More informationBig Data on Microsoft Platform
Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4
More informationWhat s New in MATLAB and Simulink
What s New in MATLAB and Simulink Kevin Cohan Product Marketing, MATLAB Michael Carone Product Marketing, Simulink 2015 The MathWorks, Inc. 1 What was new for Simulink in R2012b? 2 What Was New for MATLAB
More informationHarnessing the power of advanced analytics with IBM Netezza
IBM Software Information Management White Paper Harnessing the power of advanced analytics with IBM Netezza How an appliance approach simplifies the use of advanced analytics Harnessing the power of advanced
More informationPrerequisites. Course Outline
MS-55040: Data Mining, Predictive Analytics with Microsoft Analysis Services and Excel PowerPivot Description This three-day instructor-led course will introduce the students to the concepts of data mining,
More informationFile S1: Supplementary Information of CloudDOE
File S1: Supplementary Information of CloudDOE Table of Contents 1. Prerequisites of CloudDOE... 2 2. An In-depth Discussion of Deploying a Hadoop Cloud... 2 Prerequisites of deployment... 2 Table S1.
More informationHigh Performance Predictive Analytics in R and Hadoop:
High Performance Predictive Analytics in R and Hadoop: Achieving Big Data Big Analytics Presented by: Mario E. Inchiosa, Ph.D. US Chief Scientist August 27, 2013 1 Polling Questions 1 & 2 2 Agenda Revolution
More informationModern Data Architecture for Predictive Analytics
Modern Data Architecture for Predictive Analytics David Smith VP Marketing and Community - Revolution Analytics John Kreisa VP Strategic Marketing- Hortonworks Hortonworks Inc. 2013 Page 1 Your Presenters
More informationBig Data and Data Science. The globally recognised training program
Big Data and Data Science The globally recognised training program Certificate in Big Data Analytics Duration 5 days Big Data and Data Science enables value creation from data, through the use of calculative
More informationBig Data Technologies Compared June 2014
Big Data Technologies Compared June 2014 Agenda What is Big Data Big Data Technology Comparison Summary Other Big Data Technologies Questions 2 What is Big Data by Example The SKA Telescope is a new development
More informationLuncheon Webinar Series May 13, 2013
Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration
More informationSafe Harbor Statement
Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment
More informationIBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look
IBM BigInsights Has Potential If It Lives Up To Its Promise By Prakash Sukumar, Principal Consultant at iolap, Inc. IBM released Hadoop-based InfoSphere BigInsights in May 2013. There are already Hadoop-based
More informationA REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information
More informationWhite Paper. Unified Data Integration Across Big Data Platforms
White Paper Unified Data Integration Across Big Data Platforms Contents Business Problem... 2 Unified Big Data Integration... 3 Diyotta Solution Overview... 4 Data Warehouse Project Implementation using
More informationOracle Advanced Analytics 12c & SQLDEV/Oracle Data Miner 4.0 New Features
Oracle Advanced Analytics 12c & SQLDEV/Oracle Data Miner 4.0 New Features Charlie Berger, MS Eng, MBA Sr. Director Product Management, Data Mining and Advanced Analytics charlie.berger@oracle.com www.twitter.com/charliedatamine
More informationUnified Data Integration Across Big Data Platforms
Unified Data Integration Across Big Data Platforms Contents Business Problem... 2 Unified Big Data Integration... 3 Diyotta Solution Overview... 4 Data Warehouse Project Implementation using ELT... 6 Diyotta
More information6.0, 6.5 and Beyond. The Future of Spotfire. Tobias Lehtipalo Sr. Director of Product Management
6.0, 6.5 and Beyond The Future of Spotfire Tobias Lehtipalo Sr. Director of Product Management Key peformance indicators Hundreds of Records Visual Data Discovery Millions of Records Data Mining or Data
More informationA Novel Cloud Based Elastic Framework for Big Data Preprocessing
School of Systems Engineering A Novel Cloud Based Elastic Framework for Big Data Preprocessing Omer Dawelbeit and Rachel McCrindle October 21, 2014 University of Reading 2008 www.reading.ac.uk Overview
More informationHigh-Performance Business Analytics: SAS and IBM Netezza Data Warehouse Appliances
High-Performance Business Analytics: SAS and IBM Netezza Data Warehouse Appliances Highlights IBM Netezza and SAS together provide appliances and analytic software solutions that help organizations improve
More informationPredictive Analytics
Predictive Analytics How many of you used predictive today? 2015 SAP SE. All rights reserved. 2 2015 SAP SE. All rights reserved. 3 How can you apply predictive to your business? Predictive Analytics is
More informationDiscovering Business Insights in Big Data Using SQL-MapReduce
Discovering Business Insights in Big Data Using SQL-MapReduce A Technical Whitepaper Rick F. van der Lans Independent Business Intelligence Analyst R20/Consultancy July 2013 Sponsored by Copyright 2013
More informationData Science And Big Data Analytics Course
Data Science And Big Data Analytics Course Copyright 2014 EMC Corpora3on. All Rights Reserved. Introduc3on and Course Agenda 1 Introduc3on and Course Agenda 2 Introduc3on and Course Agenda The following
More informationNavigating Big Data business analytics
mwd a d v i s o r s Navigating Big Data business analytics Helena Schwenk A special report prepared for Actuate May 2013 This report is the third in a series and focuses principally on explaining what
More informationCapitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes
Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Highly competitive enterprises are increasingly finding ways to maximize and accelerate
More informationOracle Big Data Strategy Simplified Infrastrcuture
Big Data Oracle Big Data Strategy Simplified Infrastrcuture Selim Burduroğlu Global Innovation Evangelist & Architect Education & Research Industry Business Unit Oracle Confidential Internal/Restricted/Highly
More informationBig Data Are You Ready? Jorge Plascencia Solution Architect Manager
Big Data Are You Ready? Jorge Plascencia Solution Architect Manager Big Data: The Datafication Of Everything Thoughts Devices Processes Thoughts Things Processes Run the Business Organize data to do something
More informationHow To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
More informationHow To Use Hp Vertica Ondemand
Data sheet HP Vertica OnDemand Enterprise-class Big Data analytics in the cloud Enterprise-class Big Data analytics for any size organization Vertica OnDemand Organizations today are experiencing a greater
More informationParallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel
Parallel Databases Increase performance by performing operations in parallel Parallel Architectures Shared memory Shared disk Shared nothing closely coupled loosely coupled Parallelism Terminology Speedup:
More informationBig Data and Analytics 21 A Technical Perspective Abhishek Bhattacharya, Aditya Gandhi and Pankaj Jain November 2012
Big Data and Analytics 21 A Technical Perspective Abhishek Bhattacharya, Aditya Gandhi and Pankaj Jain November 2012 Between the dawn of civilization and 2003, the human race created 5 exabytes of data
More informationNative Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy
Native Connectivity to Big Data Sources in MicroStrategy 10 Presented by: Raja Ganapathy Agenda MicroStrategy supports several data sources, including Hadoop Why Hadoop? How does MicroStrategy Analytics
More informationArchitecture & Experience
Architecture & Experience Data Mining - Combination from SAP HANA, R & Hadoop Markus Severin, Solution Principal Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein
More informationA Multilevel Secure MapReduce Framework for Cross-Domain Information Sharing in the Cloud
A Multilevel Secure MapReduce Framework for Cross-Domain Information Sharing in the Cloud Thuy D. Nguyen, Cynthia E. Irvine, Jean Khosalim Department of Computer Science Ground System Architectures Workshop
More informationBringing Intergalactic Data Speak (a.k.a.: SQL) to Hadoop Martin Willcox [@willcoxmnk], Director Big Data Centre of Excellence (Teradata
Bringing Intergalactic Data Speak (a.k.a.: SQL) to Hadoop Martin Willcox [@willcoxmnk], Director Big Data Centre of Excellence (Teradata International) 4 th June 2015 Agenda A (very!) short history of
More informationGanzheitliches Datenmanagement
Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist
More information