Universal PMML Plug-in for EMC Greenplum Database

Delivering Massively Parallel Predictions

Zementis, Inc.
USA: 6125 Cornerstone Court East, Suite #250, San Diego, CA T +1(619)
Asia: 19/F, Unit A, Ho Lee Commercial Bldg, D'Aguilar Street, Central, Hong Kong T

Delivering Massively Parallel Predictions

As advanced analytics becomes pervasive across the enterprise to drive better business decisions, efficient execution of predictive models becomes paramount. Zementis and Greenplum have joined forces to help companies bring predictive models into their database and score huge amounts of data in place and in parallel. This joint product combines the Zementis Universal PMML Plug-in for execution of predictive models with the power and scale of the EMC Greenplum Database. The result is an end-to-end solution that enhances Greenplum's large-scale analytics processing capabilities with scoring of standards-based predictive models on a massively parallel architecture. By embedding predictive analytics directly in the database, this solution minimizes the movement of data and enables the efficient in-place processing of very large data sets.

In this whitepaper, we demonstrate how to deploy and execute predictive models from several statistical tools, including IBM SPSS and the open source R program.

Predictive Model Markup Language (PMML)

As the de facto standard for data mining models, PMML provides tremendous benefits for business, IT, and the data mining industry in general. Developed by the Data Mining Group (DMG), an independent, vendor-led consortium, PMML increases business agility by eliminating the need for proprietary solutions or custom code development. Today, it is supported by all the major data mining tools, commercial and open source. As an open standard, it enables project stakeholders to standardize on one common representation for data mining models. It practically eliminates the barriers and gaps between development and production deployment of predictive analytics. In effect, it minimizes the complexity, cost, and time needed to turn predictive models into operable IT and business assets.

Because PMML serves as the lingua franca for predictive analytics, data mining models can be easily exchanged between PMML-compliant applications. In this way, a model may be built in one statistical tool and easily moved to another for production deployment or visualization. PMML also serves as a bridge between all the teams involved in the data mining process inside a company, since it can be used to disseminate knowledge and best practices. In a world in which sensors and data gathering are becoming more and more pervasive, predictive analytics and standards such as PMML make it possible for organizations to benefit from smart solutions that will truly revolutionize their business.

Zementis Universal PMML Plug-in

The Universal PMML Plug-in (Figure 1) builds on the heritage of Zementis's flagship product, the ADAPA Decision Engine, a web services-based framework for the execution of predictive analytics and rules, available onsite or as a cloud computing platform. The Universal PMML Plug-in is a highly optimized, in-database scoring engine for predictive models that fully supports the PMML standard. With PMML, the Plug-in delivers a wide range of predictive analytics for high-performance scoring. It shortens time to market for predictive models and empowers users through instant deployment of predictive models.

Figure 1: The Universal PMML Plug-in. Data in, predictions out.

In the context of in-database scoring, it allows us to execute predictive models from all major commercial and open source data mining tools within the database, minimizing data movement and maximizing processing efficiency. Very large data sets can be easily scored against a variety of predictive models, including neural network models, regression models, support vector machines, and decision trees (as well as a host of other advanced analytic techniques).

Besides the models themselves, the Universal PMML Plug-in also supports data pre- and post-processing. This is possible because the latest version of the PMML standard includes built-in functions for arithmetic calculations, string manipulations, and logic operations. An entire predictive solution, one that operates from raw data all the way to predictions, can be represented in PMML and used directly in the Universal PMML Plug-in for data scoring.

The Universal PMML Plug-in supports not only the latest version of PMML but also older versions. In fact, it is version agnostic, since it incorporates a converter that automatically converts older versions of PMML to the newest one.
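As a rough illustration of what "version agnostic" implies, a PMML file declares its version in the `version` attribute of its root `PMML` element, so a tool can decide whether a file predates the newest version before scoring. The Python sketch below only performs that check; it does not reproduce the Zementis converter. The sample document, the function names, and the choice of 4.0 as the newest version (the latest version this paper lists) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed PMML 3.2 document used only for illustration.
SAMPLE_PMML = """<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2">
  <Header/>
  <DataDictionary numberOfFields="0"/>
</PMML>"""

# Assumption: 4.0 is treated as the newest version, as listed in this paper.
NEWEST_VERSION = (4, 0)


def pmml_version(xml_text):
    """Return the version declared on the PMML root element as a tuple of ints."""
    root = ET.fromstring(xml_text)
    # The root tag may carry an XML namespace, e.g. "{http://...}PMML".
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name != "PMML":
        raise ValueError("not a PMML document: root element is %r" % local_name)
    return tuple(int(part) for part in root.attrib["version"].split("."))


def needs_conversion(xml_text):
    """True if the document declares a version older than NEWEST_VERSION."""
    return pmml_version(xml_text) < NEWEST_VERSION


print(pmml_version(SAMPLE_PMML), needs_conversion(SAMPLE_PMML))
```

A real converter would also have to rewrite elements and attributes that changed between versions; the check above only identifies which files would need that treatment.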
EMC Greenplum Database Architecture

The EMC Greenplum Database utilizes a shared-nothing MPP (massively parallel processing) architecture that has been designed from the ground up for BI and analytical processing using commodity hardware. In this architecture, data is automatically partitioned across multiple 'segment' servers, and each segment owns and manages a distinct portion of the overall data. All communication is via a network interconnect; there is no disk-level sharing or contention to be concerned with (i.e., it is a 'shared-nothing' architecture).

Most of today's general-purpose relational database management systems (e.g., Oracle, Microsoft SQL Server) were originally designed for Online Transaction Processing (OLTP) applications. These databases utilize 'shared-disk' or 'shared-everything' architectures that are optimized for high transaction rates at the expense of individual query performance and parallelism.

Greenplum's shared-nothing MPP architecture (Figure 2) provides every segment with a dedicated, independent high-bandwidth channel to its disk. The segment servers are able to process every query in a fully parallel manner, use all disk connections simultaneously, and efficiently flow data between segments as query plans dictate. The degree of parallelism and overall scalability that this allows far exceed those of general-purpose database systems.

Figure 2: Greenplum's shared-nothing MPP architecture.

Universal PMML Plug-in for the EMC Greenplum Database

The Universal PMML Plug-in for the EMC Greenplum Database enables execution of standards-based predictive analytics directly within the Greenplum Database. It seamlessly embeds the Universal PMML Plug-in into Greenplum's shared-nothing, massively parallel processing (MPP) architecture. The Universal PMML Plug-in's own shared-nothing design philosophy and replication flexibility fit naturally into multi-server environments. With Greenplum, each individual server (with a dedicated, independent, high-bandwidth channel connection to local disks) houses a separate Universal PMML Plug-in instance that can take full advantage of these local resources (Figure 3). The net result is the ability to leverage the power of standards-based predictive analytics on a massive scale, right where the data resides.

Figure 3: Each individual server houses a separate Universal PMML Plug-in instance.

The EMC Greenplum PMML Plug-in not only delivers high-performance model execution but does so in an easy and seamless manner. With a couple of simple steps, PMML models are distributed to all segments of the Greenplum installation and made available for execution. Each model is presented as a separate SQL function that can be used in any query. The name, input parameters, and outputs of each function match the name, input fields, and output fields of the corresponding model as defined in its PMML file. This way, scoring a data set with one or more models becomes as simple as writing a SQL statement on that data set. Predictions (scores, probabilities, categories, clusters, etc.) can be just as easily written back to the database, become part of a report, or be passed on to an application.

In addition, the Universal PMML Plug-in includes the popular Zementis PMML Converter. This means that it accepts PMML models of all versions (2.0, 2.1, 3.0, 3.1, 3.2, and 4.0) generated by any of the major commercial and open source mining tools.

Example: Use IBM SPSS and R Models in Greenplum

The Universal PMML Plug-in for the EMC Greenplum Database ships with several sample PMML models. A number of these predictive models were created with the well-known Elnino data set. This data set contains oceanographic and surface meteorological readings taken from a series of buoys positioned throughout the equatorial Pacific. The data is expected to aid in the understanding and prediction of El Nino/Southern Oscillation (ENSO) cycles. Here, we discuss two of these models:

- A neural network model built in IBM SPSS Statistics;
- A linear regression model built in R.

After being built, all models were directly exported into PMML, since IBM SPSS and R provide comprehensive support for the PMML standard. The steps to install and use these models in Greenplum using the PMML Plug-in are:

1. Prepare and copy PMML files into the Greenplum segments
2. Run the automatically generated script to define the corresponding SQL functions
3. Run queries using the new SQL functions

Each step is described in detail below.

Prepare and Copy PMML Files

In the first step, a script needs to be run to validate the provided PMML files, copy them into the Greenplum segments, and generate a SQL script containing the function definitions for all the provided models. Below we present an excerpt from the SQL script generated for the two sample models.

    CREATE FUNCTION SPSS_Neural_Network_ElNino(float8,float8,float8,float8,float8,float8) RETURNS float8 AS
    CREATE FUNCTION R_LinearRegression_ElNino(float8,float8,float8,float8,float8,float8) RETURNS float8 AS

To put these definitions in context, below we present the code for the PMML data dictionary and mining schema for the IBM SPSS Neural Network model as listed in the corresponding PMML file.
    <DataDictionary numberOfFields="7">
      <DataField name="humidity" optype="continuous" dataType="double"/>
      <DataField name="latitude" optype="continuous" dataType="double"/>
      <DataField name="longitude" optype="continuous" dataType="double"/>
      <DataField name="mer_winds" optype="continuous" dataType="double"/>
      <DataField name="s_s_temp" optype="continuous" dataType="double"/>
      <DataField name="zon_winds" optype="continuous" dataType="double"/>
      <DataField name="airtemp" optype="continuous" dataType="double"/>
    </DataDictionary>
    <NeuralNetwork functionName="regression" activationFunction="logistic" modelName="spss Neural Network - ElNino">
      <MiningSchema>
        <MiningField name="humidity" usageType="active" optype="continuous"/>
        <MiningField name="latitude" usageType="active" optype="continuous"/>
        <MiningField name="longitude" usageType="active" optype="continuous"/>
        <MiningField name="mer_winds" usageType="active" optype="continuous"/>
        <MiningField name="s_s_temp" usageType="active" optype="continuous"/>
        <MiningField name="zon_winds" usageType="active" optype="continuous"/>
        <MiningField name="airtemp" usageType="predicted" optype="continuous"/>
      </MiningSchema>

In the SQL script, each model is presented as a function with six numeric parameters; they all work on the same data and return one numeric value. The name of the SQL function is created from the name of the model (SPSS_Neural_Network_ElNino). The six numeric parameters correspond to the six input (active mining) fields of type double defined in the PMML file (humidity, latitude, longitude, mer_winds, s_s_temp, and zon_winds). Finally, the numeric return value of the SQL function reflects the predicted output field of type double (airtemp).

Run SQL Script to Create SQL Functions

The second step is to run the generated SQL script to create the new functions. After the new functions are created, the predictive models are ready to be used in SQL queries like any other built-in or custom function.

Execute Queries to Score Data

With the installation steps completed, the predictive models can be easily used in SQL queries. Below is an example of such a query:

    SELECT buoy_day_id,
           SPSS_Neural_Network_ElNino(latitude, longitude, zon_winds, mer_winds, humidity, s_s_temp) AS airtemp
    FROM elnino_input

Getting predictions from the two models at the same time would be just as easy:

    SELECT buoy_day_id,
           R_Linear_Regression_ElNino(latitude, longitude, zon_winds, mer_winds, humidity, s_s_temp) AS airtemp_r,
           SPSS_Neural_Network_ElNino(latitude, longitude, zon_winds, mer_winds, humidity, s_s_temp) AS airtemp_nn
    FROM elnino_input

Advantages of the Universal PMML Plug-in for the EMC Greenplum Database

Zementis and EMC Greenplum bring together two essential technologies, offering the best combination of open standards and scalability for the in-database application of predictive analytics. The Universal PMML Plug-in delivers instant and scalable scoring for big data while retaining compatibility with most major data mining tools through the PMML standard.
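Returning to the El Nino example, the mapping from a model's mining schema to its SQL function signature can be sketched in a few lines of Python. This is a minimal illustration, not the script that ships with the Plug-in. The naming rule (runs of non-alphanumeric characters become underscores), the `double` to `float8` type mapping, and the wrapping `PMML` element and closing tags added so the excerpt parses are all assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the excerpt above, wrapped and closed so that it is well-formed.
PMML_EXCERPT = """<PMML>
  <DataDictionary numberOfFields="7">
    <DataField name="humidity" optype="continuous" dataType="double"/>
    <DataField name="latitude" optype="continuous" dataType="double"/>
    <DataField name="longitude" optype="continuous" dataType="double"/>
    <DataField name="mer_winds" optype="continuous" dataType="double"/>
    <DataField name="s_s_temp" optype="continuous" dataType="double"/>
    <DataField name="zon_winds" optype="continuous" dataType="double"/>
    <DataField name="airtemp" optype="continuous" dataType="double"/>
  </DataDictionary>
  <NeuralNetwork functionName="regression" activationFunction="logistic"
                 modelName="spss Neural Network - ElNino">
    <MiningSchema>
      <MiningField name="humidity" usageType="active" optype="continuous"/>
      <MiningField name="latitude" usageType="active" optype="continuous"/>
      <MiningField name="longitude" usageType="active" optype="continuous"/>
      <MiningField name="mer_winds" usageType="active" optype="continuous"/>
      <MiningField name="s_s_temp" usageType="active" optype="continuous"/>
      <MiningField name="zon_winds" usageType="active" optype="continuous"/>
      <MiningField name="airtemp" usageType="predicted" optype="continuous"/>
    </MiningSchema>
  </NeuralNetwork>
</PMML>"""

# Assumed mapping from PMML data types to SQL types.
SQL_TYPES = {"double": "float8"}


def sql_signature(pmml_text):
    """Return (CREATE FUNCTION header, ordered list of input field names)."""
    root = ET.fromstring(pmml_text)
    data_types = {f.get("name"): f.get("dataType") for f in root.iter("DataField")}
    model = root.find("NeuralNetwork")
    # Assumed naming rule: collapse non-alphanumeric runs into underscores.
    function_name = re.sub(r"[^0-9A-Za-z]+", "_", model.get("modelName")).strip("_")
    active, predicted = [], []
    for field in model.iter("MiningField"):
        # PMML treats a mining field without usageType as "active".
        usage = field.get("usageType", "active")
        if usage == "predicted":
            predicted.append(field.get("name"))
        elif usage == "active":
            active.append(field.get("name"))
    args = ",".join(SQL_TYPES[data_types[name]] for name in active)
    returns = SQL_TYPES[data_types[predicted[0]]]
    return f"CREATE FUNCTION {function_name}({args}) RETURNS {returns}", active


header, inputs = sql_signature(PMML_EXCERPT)
print(header)
print(inputs)
```

The derived name differs from the one in the generated script only in letter case. Unquoted identifiers are case-insensitive in PostgreSQL-derived databases such as Greenplum, so both spellings refer to the same function.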
In summary, the Universal PMML Plug-in for the EMC Greenplum Database:

- Integrates advanced analytical algorithms directly into the database engine for high-performance scoring in a massively parallel environment;
- Supports the PMML standard to avoid time-consuming and expensive one-off predictive analytics projects;
- Executes predictive models from all major commercial and open source data mining tools;
- Minimizes data movement to enable efficient processing of very large data sets; and
- Reduces total cost of ownership (TCO) for analytical environments by means of streamlined and platform-independent data mining processes.
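The first and fourth points, scoring in parallel where the data resides, can be pictured with a toy Python sketch. It is not Greenplum's or the Plug-in's implementation: the segment count, the modulo rule standing in for hash distribution, and the `toy_model` formula are all hypothetical, and Python threads are used only to show the control flow, not to demonstrate real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_SEGMENTS = 4  # hypothetical number of segment servers


def toy_model(row):
    """Stand-in for a deployed PMML model (hypothetical linear formula)."""
    return 0.5 * row["s_s_temp"] + 0.1 * row["humidity"]


def distribute(rows, key, num_segments):
    """Assign every row to exactly one segment based on its distribution key."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        # Modulo on an integer key stands in for a real hash distribution.
        segments[row[key] % num_segments].append(row)
    return segments


def score_segment(segment_rows):
    """Each segment scores only the rows it owns, using its own model instance."""
    return [(row["buoy_day_id"], toy_model(row)) for row in segment_rows]


def score_in_parallel(rows):
    """Score all segments concurrently and gather the (id, prediction) pairs."""
    segments = distribute(rows, "buoy_day_id", NUM_SEGMENTS)
    with ThreadPoolExecutor(max_workers=NUM_SEGMENTS) as pool:
        per_segment = list(pool.map(score_segment, segments))
    return sorted(pair for part in per_segment for pair in part)


rows = [{"buoy_day_id": i, "humidity": 80.0 + i, "s_s_temp": 27.0} for i in range(10)]
scores = score_in_parallel(rows)
print(len(scores), scores[0])
```

Only the small `(buoy_day_id, prediction)` pairs are gathered at the end; the input rows never leave the segment that owns them, which is the property the shared-nothing design relies on.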

About Greenplum and the EMC Data Computing Products Division

EMC's new Data Computing Products Division is driving the future of data warehousing and analytics with breakthrough products including Greenplum Database 4.1, Greenplum Data Computing Appliance (DCA), Greenplum Database Single-Node Edition, Greenplum Community Edition, and Greenplum Chorus. The division's products embody the power of open systems, cloud computing, virtualization, and social collaboration, enabling global organizations to gain greater insight and value from their data than ever before possible.

About Zementis

Zementis, Inc. is a leading software company focused on the operational deployment and integration of predictive analytics and data mining solutions. Its ADAPA decision engine successfully bridges the gap between science and engineering. ADAPA and the Universal PMML Plug-in are designed from the ground up to benefit from open standards and to significantly shorten the time to market for predictive analytics in any industry.


More information

Grow Revenues and Reduce Risk with Powerful Analytics Software

Grow Revenues and Reduce Risk with Powerful Analytics Software Grow Revenues and Reduce Risk with Powerful Analytics Software Overview Gaining knowledge through data selection, data exploration, model creation and predictive action is the key to increasing revenues,

More information

Upgrading to Microsoft SQL Server 2008 R2 from Microsoft SQL Server 2008, SQL Server 2005, and SQL Server 2000

Upgrading to Microsoft SQL Server 2008 R2 from Microsoft SQL Server 2008, SQL Server 2005, and SQL Server 2000 Upgrading to Microsoft SQL Server 2008 R2 from Microsoft SQL Server 2008, SQL Server 2005, and SQL Server 2000 Your Data, Any Place, Any Time Executive Summary: More than ever, organizations rely on data

More information

KnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES

KnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES Translating data into business value requires the right data mining and modeling techniques which uncover important patterns within

More information

Why Big Data in the Cloud?

Why Big Data in the Cloud? Have 40 Why Big Data in the Cloud? Colin White, BI Research January 2014 Sponsored by Treasure Data TABLE OF CONTENTS Introduction The Importance of Big Data The Role of Cloud Computing Using Big Data

More information

A financial software company

A financial software company A financial software company Projecting USD10 million revenue lift with the IBM Netezza data warehouse appliance Overview The need A financial software company sought to analyze customer engagements to

More information

Focus on the business, not the business of data warehousing!

Focus on the business, not the business of data warehousing! Focus on the business, not the business of data warehousing! Adam M. Ronthal Technical Product Marketing and Strategy Big Data, Cloud, and Appliances @ARonthal 1 Disclaimer Copyright IBM Corporation 2014.

More information

E M C P E R S P E C T I V E MANAGING HEALTHCARE DATA WITHIN THE ECOSYSTEM WHILE REDUCING IT COSTS AND COMPLEXITIES

E M C P E R S P E C T I V E MANAGING HEALTHCARE DATA WITHIN THE ECOSYSTEM WHILE REDUCING IT COSTS AND COMPLEXITIES E M C P E R S P E C T I V E MANAGING HEALTHCARE DATA WITHIN THE ECOSYSTEM WHILE REDUCING IT COSTS AND COMPLEXITIES With more than 3,000 attendees and hundreds of exhibitors, the annual HIMSS World Health

More information

Oracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc.

Oracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc. Oracle9i Data Warehouse Review Robert F. Edwards Dulcian, Inc. Agenda Oracle9i Server OLAP Server Analytical SQL Data Mining ETL Warehouse Builder 3i Oracle 9i Server Overview 9i Server = Data Warehouse

More information

ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process

ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process ORACLE OLAP KEY FEATURES AND BENEFITS FAST ANSWERS TO TOUGH QUESTIONS EASILY KEY FEATURES & BENEFITS World class analytic engine Superior query performance Simple SQL access to advanced analytics Enhanced

More information

WHAT S NEW IN SAS 9.4

WHAT S NEW IN SAS 9.4 WHAT S NEW IN SAS 9.4 PLATFORM, HPA & SAS GRID COMPUTING MICHAEL GODDARD CHIEF ARCHITECT SAS INSTITUTE, NEW ZEALAND SAS 9.4 WHAT S NEW IN THE PLATFORM Platform update SAS Grid Computing update Hadoop support

More information

TRENDS IN THE DEVELOPMENT OF BUSINESS INTELLIGENCE SYSTEMS

TRENDS IN THE DEVELOPMENT OF BUSINESS INTELLIGENCE SYSTEMS 9 8 TRENDS IN THE DEVELOPMENT OF BUSINESS INTELLIGENCE SYSTEMS Assist. Prof. Latinka Todoranova Econ Lit C 810 Information technology is a highly dynamic field of research. As part of it, business intelligence

More information

How To Use Hp Vertica Ondemand

How To Use Hp Vertica Ondemand Data sheet HP Vertica OnDemand Enterprise-class Big Data analytics in the cloud Enterprise-class Big Data analytics for any size organization Vertica OnDemand Organizations today are experiencing a greater

More information

IBM SPSS Modeler Premium

IBM SPSS Modeler Premium IBM SPSS Modeler Premium Improve model accuracy with structured and unstructured data, entity analytics and social network analysis Highlights Solve business problems faster with analytical techniques

More information

ORACLE TAX ANALYTICS. The Solution. Oracle Tax Data Model KEY FEATURES

ORACLE TAX ANALYTICS. The Solution. Oracle Tax Data Model KEY FEATURES ORACLE TAX ANALYTICS KEY FEATURES A set of comprehensive and compatible BI Applications. Advanced insight into tax performance Built on World Class Oracle s Database and BI Technology Design after the

More information

Netezza and Business Analytics Synergy

Netezza and Business Analytics Synergy Netezza Business Partner Update: November 17, 2011 Netezza and Business Analytics Synergy Shimon Nir, IBM Agenda Business Analytics / Netezza Synergy Overview Netezza overview Enabling the Business with

More information

MicroStrategy Course Catalog

MicroStrategy Course Catalog MicroStrategy Course Catalog 1 microstrategy.com/education 3 MicroStrategy course matrix 4 MicroStrategy 9 8 MicroStrategy 10 table of contents MicroStrategy course matrix MICROSTRATEGY 9 MICROSTRATEGY

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

Cisco Data Preparation

Cisco Data Preparation Data Sheet Cisco Data Preparation Unleash your business analysts to develop the insights that drive better business outcomes, sooner, from all your data. As self-service business intelligence (BI) and

More information

Why compute in parallel? Cloud computing. Big Data 11/29/15. Introduction to Data Management CSE 344. Science is Facing a Data Deluge!

Why compute in parallel? Cloud computing. Big Data 11/29/15. Introduction to Data Management CSE 344. Science is Facing a Data Deluge! Why compute in parallel? Introduction to Data Management CSE 344 Lectures 23 and 24 Parallel Databases Most processors have multiple cores Can run multiple jobs simultaneously Natural extension of txn

More information

SQL Server 2005 Features Comparison

SQL Server 2005 Features Comparison Page 1 of 10 Quick Links Home Worldwide Search Microsoft.com for: Go : Home Product Information How to Buy Editions Learning Downloads Support Partners Technologies Solutions Community Previous Versions

More information

Contents. Overview. The solid foundation for your entire, enterprise-wide business intelligence system

Contents. Overview. The solid foundation for your entire, enterprise-wide business intelligence system Data Warehouse The solid foundation for your entire, enterprise-wide business intelligence system The core of the high-performance intelligence delivery infrastructure, designed to meet even the most demanding

More information

Interactive data analytics drive insights

Interactive data analytics drive insights Big data Interactive data analytics drive insights Daniel Davis/Invodo/S&P. Screen images courtesy of Landmark Software and Services By Armando Acosta and Joey Jablonski The Apache Hadoop Big data has

More information

IBM Netezza High Capacity Appliance

IBM Netezza High Capacity Appliance IBM Netezza High Capacity Appliance Petascale Data Archival, Analysis and Disaster Recovery Solutions IBM Netezza High Capacity Appliance Highlights: Allows querying and analysis of deep archival data

More information

Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics

Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Please note the following IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice

More information

BIG DATA IS MESSY PARTNER WITH SCALABLE

BIG DATA IS MESSY PARTNER WITH SCALABLE BIG DATA IS MESSY PARTNER WITH SCALABLE SCALABLE SYSTEMS HADOOP SOLUTION WHAT IS BIG DATA? Each day human beings create 2.5 quintillion bytes of data. In the last two years alone over 90% of the data on

More information

Five Best Practices for Maximizing Big Data ROI

Five Best Practices for Maximizing Big Data ROI E-PAPER FEBRUARY 2014 Five Best Practices for Maximizing Big Data ROI Lessons from early adopters show how IT can deliver better business results at less cost. TW_1401138 Organizations of all kinds have

More information

Hexaware E-book on Predictive Analytics

Hexaware E-book on Predictive Analytics Hexaware E-book on Predictive Analytics Business Intelligence & Analytics Actionable Intelligence Enabled Published on : Feb 7, 2012 Hexaware E-book on Predictive Analytics What is Data mining? Data mining,

More information

Beyond Conventional Data Warehousing. Florian Waas Greenplum Inc.

Beyond Conventional Data Warehousing. Florian Waas Greenplum Inc. Beyond Conventional Data Warehousing Florian Waas Greenplum Inc. Takeaways The basics Who is Greenplum? What is Greenplum Database? The problem Data growth and other recent trends in DWH A look at different

More information

WHITE PAPER. Harnessing the Power of Advanced Analytics How an appliance approach simplifies the use of advanced analytics

WHITE PAPER. Harnessing the Power of Advanced Analytics How an appliance approach simplifies the use of advanced analytics WHITE PAPER Harnessing the Power of Advanced How an appliance approach simplifies the use of advanced analytics Introduction The Netezza TwinFin i-class advanced analytics appliance pushes the limits of

More information

Make Better Decisions Through Predictive Intelligence

Make Better Decisions Through Predictive Intelligence IBM SPSS Modeler Professional Make Better Decisions Through Predictive Intelligence Highlights Easily access, prepare and model structured data with this intuitive, visual data mining workbench Expand

More information

IBM SPSS Modeler 15 In-Database Mining Guide

IBM SPSS Modeler 15 In-Database Mining Guide IBM SPSS Modeler 15 In-Database Mining Guide Note: Before using this information and the product it supports, read the general information under Notices on p. 217. This edition applies to IBM SPSS Modeler

More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

Five Technology Trends for Improved Business Intelligence Performance

Five Technology Trends for Improved Business Intelligence Performance TechTarget Enterprise Applications Media E-Book Five Technology Trends for Improved Business Intelligence Performance The demand for business intelligence data only continues to increase, putting BI vendors

More information

RevoScaleR Speed and Scalability

RevoScaleR Speed and Scalability EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution

More information

Einsatzfelder von IBM PureData Systems und Ihre Vorteile.

Einsatzfelder von IBM PureData Systems und Ihre Vorteile. Einsatzfelder von IBM PureData Systems und Ihre Vorteile demirkaya@de.ibm.com Agenda Information technology challenges PureSystems and PureData introduction PureData for Transactions PureData for Analytics

More information

Real Life Performance of In-Memory Database Systems for BI

Real Life Performance of In-Memory Database Systems for BI D1 Solutions AG a Netcetera Company Real Life Performance of In-Memory Database Systems for BI 10th European TDWI Conference Munich, June 2010 10th European TDWI Conference Munich, June 2010 Authors: Dr.

More information

The Ultimate Guide to Buying Business Analytics

The Ultimate Guide to Buying Business Analytics The Ultimate Guide to Buying Business Analytics How to Evaluate a BI Solution for Your Small or Medium Sized Business: What Questions to Ask and What to Look For Copyright 2012 Pentaho Corporation. Redistribution

More information

A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel

A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel A Next-Generation Analytics Ecosystem for Big Data Colin White, BI Research September 2012 Sponsored by ParAccel BIG DATA IS BIG NEWS The value of big data lies in the business analytics that can be generated

More information

Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System

Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System By Jake Cornelius Senior Vice President of Products Pentaho June 1, 2012 Pentaho Delivers High-Performance

More information

IBM SPSS Modeler 14.2 In-Database Mining Guide

IBM SPSS Modeler 14.2 In-Database Mining Guide IBM SPSS Modeler 14.2 In-Database Mining Guide Note: Before using this information and the product it supports, read the general information under Notices on p. 197. This edition applies to IBM SPSS Modeler

More information

Increase Agility and Reduce Costs with a Logical Data Warehouse. February 2014

Increase Agility and Reduce Costs with a Logical Data Warehouse. February 2014 Increase Agility and Reduce Costs with a Logical Data Warehouse February 2014 Table of Contents Summary... 3 Data Virtualization & the Logical Data Warehouse... 4 What is a Logical Data Warehouse?... 4

More information

Ten Things You Need to Know About Data Virtualization

Ten Things You Need to Know About Data Virtualization White Paper Ten Things You Need to Know About Data Virtualization What is Data Virtualization? Data virtualization is an agile data integration method that simplifies information access. Data virtualization

More information

Cost-Effective Business Intelligence with Red Hat and Open Source

Cost-Effective Business Intelligence with Red Hat and Open Source Cost-Effective Business Intelligence with Red Hat and Open Source Sherman Wood Director, Business Intelligence, Jaspersoft September 3, 2009 1 Agenda Introductions Quick survey What is BI?: reporting,

More information

Realizing the True Potential of Software-Defined Storage

Realizing the True Potential of Software-Defined Storage Realizing the True Potential of Software-Defined Storage Who should read this paper Technology leaders, architects, and application owners who are looking at transforming their organization s storage infrastructure

More information

The Ultimate Guide to Buying Business Analytics

The Ultimate Guide to Buying Business Analytics The Ultimate Guide to Buying Business Analytics How to Evaluate a BI Solution for Your Small or Medium Sized Business: What Questions to Ask and What to Look For Copyright 2012 Pentaho Corporation. Redistribution

More information