How is a rational (big) data deployment approach like optimizing the generation mix of a power company?



Transcription:

How is a rational (big) data deployment approach like optimizing the generation mix of a power company? John Akred & Stephen O'Sullivan, @SVDataScience

John Akred (@BigDataAnalysis): Puppy Daddy; PIF; Husband; Insider Trading (Investigation); VIX Index; SPSS Clementine & Text Analysis for Surveys; Condition Monitoring; HR Analytics; Smart Grid.

Stephen O'Sullivan (@steveos): Father of 2; Husband; Database Engineer; DBA; Geek; Quartermaster Q.

Utility Source Mix. [Chart: generation by source over a 24-hour day, 12:00 AM to 11:00 PM on the x-axis, 0 to 180 on the y-axis (units not shown); regions marked Base Load and Variable Load; sources: Other, Customer Solar, Customer Wind, Utility Solar, Utility Battery, Utility Wind, Nuclear, Fossil.]
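The base load / variable load split in the chart can be illustrated with a few lines of arithmetic. This is a minimal sketch, assuming base load is taken as the lowest hourly demand in the profile and everything above it counts as variable load; the function name `split_load` and the sample numbers are hypothetical and are not read from the chart.

```python
def split_load(hourly_load):
    """Split a load profile into a constant base load and a variable remainder."""
    if not hourly_load:
        raise ValueError("hourly_load must not be empty")
    # Assumption: base load is the minimum demand seen over the period.
    base = min(hourly_load)
    # Variable load is whatever demand exceeds the base in each hour.
    variable = [load - base for load in hourly_load]
    return base, variable


# Hypothetical six-point profile, for illustration only.
profile = [60, 55, 80, 140, 160, 90]
base, variable = split_load(profile)
peak_variable = max(variable)
```

Applied to a data platform, the same split could separate steady, always-on demand from bursty demand when sizing capacity.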

PREVIOUSLY WE ASKED: What is the data type? What is the size of the data? What are the indexes? What are the foreign key constraints? NOW WE ASK: Confidential And Proprietary.

TOTAL COST OF OWNERSHIP: Bare Metal vs. Cloud Smackdown. This result debunks the idea that the cloud is not suitable for Hadoop MapReduce workloads given their heavy I/O requirements. Hadoop-as-a-Service provides a better price-performance ratio than the bare-metal counterpart. (http://www.accenture.com/sitecollectiondocuments/pdf/accenture-hadoop-deployment-comparison-study.pdf)

TUNING MATTERS! We consistently observed that the performance improvement by applying various tuning techniques made a huge impact. [Chart: workloads Sessionization, Recommendation, and Document Clustering; run times 21h50m and 9h11m; labels "First round of tuning" and "Second round of tuning".] Memory space per CPU core impacts the maximum per-task heap space allocation that sessionization requires. (http://www.accenture.com/sitecollectiondocuments/pdf/accenture-hadoop-deployment-comparison-study.pdf)

TUNING PARAMETERS. Groups: Cloud-Specific; Cluster-Wide. Parameters: memory space per CPU core; number of map task slots and reduce task slots per TaskTracker node; number of CPU cores; storage per node; task resource allocations.
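How node memory and slot counts bound the per-task heap can be shown with simple arithmetic. The sketch below is an illustration under stated assumptions, not the study's method: the function name `max_heap_per_task_mb`, the 2048 MB reserved for the operating system and daemons, and the 80% heap share of each slot's memory are all hypothetical defaults.

```python
def max_heap_per_task_mb(node_memory_mb, map_slots, reduce_slots,
                         reserved_mb=2048, heap_percent=80):
    """Estimate the largest per-task heap (MB) that fits on one worker node."""
    slots = map_slots + reduce_slots
    if slots <= 0:
        raise ValueError("at least one task slot is required")
    # Assumption: a fixed amount of memory is set aside for the OS and daemons.
    usable_mb = node_memory_mb - reserved_mb
    if usable_mb <= 0:
        raise ValueError("reserved memory exceeds node memory")
    # Every slot may run a task at the same time, so memory is divided evenly.
    per_slot_mb = usable_mb // slots
    # Assumption: only part of a slot's memory goes to heap; the rest is overhead.
    return per_slot_mb * heap_percent // 100


# Hypothetical 32 GB node: fewer slots leave more heap for each task.
heap_12_slots = max_heap_per_task_mb(32768, map_slots=8, reduce_slots=4)
heap_6_slots = max_heap_per_task_mb(32768, map_slots=4, reduce_slots=2)
```

Halving the slot count doubles the heap available to each task, which is the trade-off a memory-hungry workload forces.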

Log Collection & Search. [Diagram: APPLICATION SERVERS; Logs; statsd; Log Search; http.]

Real-Time Sales Transactions. [Diagram: APPLICATION SERVERS; DATA CENTER A; DATA CENTER B; Hue; http.]

Identity Matching. [Diagram: EDW; push results to Cassandra.]

Sessionization. [Diagram: Clickstream.]
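For readers unfamiliar with the workload, sessionization groups a clickstream into per-user sessions. The sketch below shows one common approach, splitting on an inactivity gap; the 30-minute gap, the `(user_id, timestamp)` event shape, and the function name `sessionize` are assumptions for illustration and are not taken from the slides.

```python
from collections import defaultdict


def sessionize(events, gap_seconds=1800):
    """Group (user_id, timestamp_seconds) events into per-user sessions.

    A new session starts whenever the time since the user's previous
    event exceeds gap_seconds (assumed here to be 30 minutes).
    """
    by_user = defaultdict(list)
    for user_id, ts in events:
        by_user[user_id].append(ts)

    sessions = {}
    for user_id, stamps in by_user.items():
        stamps.sort()
        user_sessions = [[stamps[0]]]
        for ts in stamps[1:]:
            if ts - user_sessions[-1][-1] > gap_seconds:
                user_sessions.append([ts])      # gap too long: open a new session
            else:
                user_sessions[-1].append(ts)    # still within the current session
        sessions[user_id] = user_sessions
    return sessions


# Hypothetical clickstream: (user, seconds since some start time).
clicks = [("u1", 0), ("u1", 600), ("u2", 100), ("u1", 4000), ("u2", 5000)]
result = sessionize(clicks)
```

Because each user's events must be collected and sorted together, the per-task memory footprint grows with the busiest user's history.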

POLYGLOT PERSISTENCE. [Diagram labels: Horizontal; State; Operations; Supply Chain; Low Latency; Event Detection; Planning; Asset Management; Vertical; Historical; High Latency; Predictive Modeling.]

METHODOLOGY: We iterate to value, answering the most valuable questions as quickly as possible. Plan, Prove, Pilot, Production; then again Plan, Prove, Pilot, Production.

FROM EXPERIMENT TO DEPLOYMENT: Pilot Workload; Optimize Workload; Analyze Production Requirements; Determine TCO of Deployment Approaches; Provision Deployment.
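The "Determine TCO of Deployment Approaches" step can be sketched as a cost comparison at a given utilization. This is a deliberately simplified illustration: the two-part cost model (fixed monthly cost plus an hourly rate), the function names, and every figure below are hypothetical, and a real TCO analysis would also cover items such as staffing, power, and data transfer.

```python
def monthly_cost(fixed_monthly, hourly_rate, hours_used):
    """Cost of one approach: a fixed monthly part plus a usage-based part."""
    return fixed_monthly + hourly_rate * hours_used


def cheapest(approaches, hours_used):
    """Return the lowest-cost approach name and the cost of every approach."""
    costs = {
        name: monthly_cost(fixed, rate, hours_used)
        for name, (fixed, rate) in approaches.items()
    }
    best = min(costs, key=costs.get)
    return best, costs


# Hypothetical figures: (fixed monthly cost, cost per cluster-hour).
approaches = {
    "bare_metal": (9000, 0),        # paid for whether or not it is used
    "cloud_on_demand": (0, 25),     # paid only while the cluster runs
}

best_low, costs_low = cheapest(approaches, hours_used=200)
best_high, costs_high = cheapest(approaches, hours_used=600)
```

With these made-up numbers the usage-priced option wins at low utilization and the fixed-cost option wins at high utilization, which is why the workload is piloted and optimized before the comparison is made.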

Utility Source Mix. [Chart: generation by source over a 24-hour day, 12:00 AM to 11:00 PM on the x-axis, 0 to 180 on the y-axis (units not shown); regions marked Base Load and Variable Load; sources: Other, Customer Solar, Customer Wind, Utility Solar, Utility Battery, Utility Wind, Nuclear, Fossil.]

Questions? Yes, we're hiring: svds.com

THANK YOU. John: @BigDataAnalysis. Stephen: @steveos. Slides are here: http://www.svds.com/downloads