1 Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
BIG DATA CONFERENCE 2015, Boston, August 10-13
Predicting and reducing deforestation using HP Distributed R and
Jorge Ahumada, Executive Director, TEAM Network, Conservation International
Sunil Venkayala, Senior Technical Product Manager, HP Big Data software
August 11, 2015
Simplify operationalizing predictive analytics
Data science: Process flow
Understand problem → Explore data assets → Prepare data → Model and evaluate → Operationalize and monitor
Data scientist: Bridging the gap between business and IT
Business: Achieve competitive advantage with predictive and prescriptive insights from data
Data scientist: Build and deploy actionable analytic solutions that meet business goals
IT: Ensure system architecture, data management, and security
Challenges: Operationalizing predictive analytics
Challenges: Increase in data sizes and types for predictive analytics
(Chart) Source: TDWI Research, Predictive Analytics for Business Advantage, 2014. Visit tdwi.org/bpreports for more information.
Data integration: Shared-nothing, massively parallel processing (MPP) columnar database
Volume: Bulk load to memory; bulk load to disk; 35 TB SLA at Facebook
Velocity: Trickle load; Kafka*; low latency, in seconds
Variety: Flex Zone; IDOL CFS; structured, semi-structured, and unstructured data
Data preparation: and vertica.dplyr*
Fast query performance with comprehensive SQL support
Data cleaning: missing-value handling, outlier handling, binning and normalization, filtering and aggregations
Derive new features, filter irrelevant cases, join multiple data sets
Data management: user management, data security, high availability
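The slide presents these cleaning steps as SQL-backed operations. The following Python sketch is a language-neutral illustration of what four of them do to a single numeric column. The specific strategies are assumptions chosen for illustration: mean imputation, clipping at k standard deviations, min-max scaling, and equal-width bins. The function names are hypothetical and are not Vertica or vertica.dplyr functions.

```python
from statistics import mean, stdev


def impute_missing(values):
    """Mean imputation (assumed strategy): replace None with the mean of observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values, k=3.0):
    """Clip values lying more than k sample standard deviations from the mean."""
    mu, sd = mean(values), stdev(values)
    low, high = mu - k * sd, mu + k * sd
    return [min(max(v, low), high) for v in values]


def min_max_normalize(values):
    """Rescale values linearly into [0, 1]; a constant column maps to all zeros."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        return [0.0 for _ in values]
    return [(v - low) / span for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a 0-based bin index using n_bins equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins or 1.0  # avoid division by zero for a constant column
    # The maximum value would land in bin n_bins, so fold it into the last bin.
    return [min(int((v - low) / width), n_bins - 1) for v in values]


if __name__ == "__main__":
    raw = [4.0, None, 6.0, 5.0, 120.0]
    cleaned = clip_outliers(impute_missing(raw), k=1.5)
    print(min_max_normalize(cleaned))
    print(equal_width_bins(cleaned, 3))
```

The order matters. Imputing before clipping ensures the statistics are computed on a complete column. Clipping before scaling prevents a single extreme value from compressing every other value toward zero.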
Build predictive models: HP Distributed R
Open-source, scalable distributed computing platform native to R
Native to R
Distributed data structures
Distributed computing API
Vertica integration: native parallel data connector
vertica.dplyr, HPData
Open source with HP support
Out-of-the-box parallel algorithms:
Classification: Random Forest, logistic regression
Regression: linear regression
Clustering: K-means
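Distributed R's parallel algorithms work on data that is split into partitions across workers. A common pattern for such algorithms is to compute additive summary statistics on each partition in parallel and then combine them. The following Python sketch shows this pattern for simple one-predictor linear regression. It is an illustrative sketch only, not HP Distributed R's API or its actual regression implementation. Threads stand in for cluster workers, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition_stats(partition):
    """Sufficient statistics (n, sum x, sum y, sum x^2, sum xy) for one partition of (x, y) pairs."""
    n = len(partition)
    sx = sum(x for x, _ in partition)
    sy = sum(y for _, y in partition)
    sxx = sum(x * x for x, _ in partition)
    sxy = sum(x * y for x, y in partition)
    return (n, sx, sy, sxx, sxy)


def merge_stats(a, b):
    """The statistics are additive, so two partitions combine by element-wise sum."""
    return tuple(u + v for u, v in zip(a, b))


def fit_partitioned(partitions):
    """Least-squares fit of y = intercept + slope * x over all partitions."""
    with ThreadPoolExecutor() as pool:  # threads stand in for cluster workers
        per_partition = list(pool.map(partition_stats, partitions))
    n, sx, sy, sxx, sxy = reduce(merge_stats, per_partition)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


if __name__ == "__main__":
    data = [(x, 1 + 2 * x) for x in range(10)]
    partitions = [data[0:4], data[4:7], data[7:10]]
    print(fit_partitioned(partitions))  # (1.0, 2.0)
```

Only five numbers per partition cross the network in this design, regardless of partition size. This property is what lets such algorithms scale with the data.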
Operationalize predictive models: and Distributed R
Deploy R models: have predictive models near the data; model management in highly available storage; model metadata management
In-database scoring: out-of-the-box SQL predict functions; low memory footprint with high scalability; data security
BI/application integration: visual predictive insights; partner ecosystem; embedded analytics using JDBC/ODBC
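In-database scoring means the model moves to the data rather than the reverse. The following sketch uses Python's built-in sqlite3 module purely as a stand-in for Vertica. A fitted linear model's coefficients are stored as a row in a `models` table, and predictions are computed by a SQL query joined against the data. The table names, columns, and model name are hypothetical. Vertica's out-of-the-box SQL predict functions are not shown or imitated here.

```python
import sqlite3


def deploy_model(conn, name, intercept, slope):
    """Store (or replace) a linear model's coefficients inside the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS models "
        "(name TEXT PRIMARY KEY, intercept REAL, slope REAL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO models (name, intercept, slope) VALUES (?, ?, ?)",
        (name, intercept, slope),
    )
    conn.commit()


def score(conn, name):
    """Compute predictions in SQL so the observation rows never leave the database."""
    query = """
        SELECT o.id, m.intercept + m.slope * o.x AS prediction
        FROM observations AS o
        CROSS JOIN models AS m
        WHERE m.name = ?
        ORDER BY o.id
    """
    return conn.execute(query, (name,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE observations (id INTEGER PRIMARY KEY, x REAL)")
    conn.executemany("INSERT INTO observations (id, x) VALUES (?, ?)",
                     [(1, 0.0), (2, 1.0), (3, 2.5)])
    deploy_model(conn, "lm_v1", intercept=1.0, slope=2.0)
    print(score(conn, "lm_v1"))  # [(1, 1.0), (2, 3.0), (3, 6.0)]
```

Keeping the model as a named row also gives a simple form of model management. Redeploying under the same name replaces the coefficients, and BI tools only need the scoring query.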
HP Haven Predictive Analytics: Delivering scale and performance with Distributed R breakthrough technology
1. Ingest and prepare data by leveraging
2. Build and evaluate predictive models on large data sets using Distributed R
3. Deploy models to Vertica and use in-database scoring to produce prediction results for BI and applications
Distributed R (HP-powered clustered computing):
A scalable, high-performance engine for the R language, developed by HP Labs
Native integration with
Compatible with popular tools such as RStudio and existing R libraries
Open source, supported by HP with enterprise-class support
Forward-looking statements
This is a rolling (up to three-year) roadmap and is subject to change without notice. This document contains forward-looking statements regarding future operations, product development, product capabilities, and availability dates. This information is subject to substantial uncertainties and is subject to change at any time without prior notification. Statements contained in this document concerning these matters only reflect Hewlett-Packard's predictions and/or expectations as of the date of this document, and actual results and future plans of Hewlett-Packard may differ significantly as a result of, among other things, changes in product strategy resulting from technological, internal corporate, market, and other changes. This is not a commitment to deliver any material, code, or functionality and should not be relied upon in making purchasing decisions.
HP confidential information
This Roadmap contains HP Confidential Information. If you have a valid Confidential Disclosure Agreement (CDA) with HP, disclosure of the Roadmap is subject to that CDA. If not, it is subject to the following terms: for a period of three years after the date of disclosure, you may use the Roadmap solely for the purpose of evaluating purchase decisions from HP and use a reasonable standard of care to prevent disclosures. You will not disclose the contents of the Roadmap to any third party unless it becomes publicly known, is rightfully received by you from a third party without a duty of confidentiality, or is disclosed with HP's prior written approval.
This is a rolling (up to 3-year) roadmap and is subject to change without notice.
New distributed algorithms
Generalized boosting models: ensemble modeling for robust prediction accuracy
Pattern mining (association rules): cross-sell or up-sell based on buying patterns
Attribute/feature importance (randomForest): identify important attributes in high-dimensional data
Decision trees: discover business rules from data
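Association rules of the form "customers who buy A also buy B" are usually scored by two measures. Support is the fraction of transactions that contain both A and B. Confidence is the fraction of transactions containing A that also contain B. The following Python sketch is a minimal, single-machine illustration that counts item pairs by brute force. It is not Apriori or FP-growth, and it is not the distributed implementation described in the roadmap. The function name and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # deduplicate and order so each pair is counted once
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n
        if support < min_support:
            continue
        # A frequent pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    baskets = [["bread", "milk"], ["bread", "butter"],
               ["bread", "milk", "butter"], ["milk"]]
    print(pair_rules(baskets, min_support=0.5, min_confidence=0.9))
    # [('butter', 'bread', 0.5, 1.0)]
```

In a cross-sell setting, the surviving rule reads "every basket with butter also contains bread". This makes bread a candidate recommendation for butter buyers.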
Co-location of Distributed R in Vertica Node
HDFS direct data connectors: CSV and ORC
Co-location of Distributed R in Hadoop Node
Native Vertica connector to Spark: RDDs and DataFrames
Deploy Spark models in-database (UDx)
HP Haven Predictive Analytics with Apache Spark MLlib
Native connectors to
1. Ingest and prepare data by leveraging
2. Build and evaluate predictive models on large data sets using Spark MLlib
3. Deploy models to Vertica and use in-database scoring to produce prediction results for BI and applications