IHBI@CMU Leveraging Big Data




IHBI@CMU: Snapshot
- Not-for-profit consulting and applied research organization with more than a decade of experience
- Purpose: build, foster and promote the use of advanced and predictive analytics and big data for competitive advantage in US-based organizations across multiple industries; train the next generation of data scientists
- Mature team: PhD- and MS-level staff in physics, statistics, economics and computer science; SAS Certified, with extensive SAS professional training
- Strengths: predictive modeling, time series forecasting, machine learning, development mentoring
- Exceptional infrastructure: the Analytics Insight Lab, with a Greenplum DCA MPP environment; SAS, ESRI, Tableau; Model Factory (POC pending)
- Founded in 2001 as the Central Michigan University Research Corporation (CMU-RC) by a group that included David Kepler (The Dow Chemical Company), IBM Watson, Dow Corning and others

Analytic Advantage
From "analytically impaired" at the bottom to analytic advantage at the top, each level answers a harder question:
- Reporting: What happened?
- Queries/drill down: Where is the problem?
- Alerts: What actions are needed?
- Statistical analysis: Why is this happening?
- Forecasting: What if these trends continue?
- Predictive modeling: What will happen next?
- Optimization: What is the best that can happen?
Adapted from Competing on Analytics: The New Science of Winning (Davenport and Harris, 2007).

Business Expertise/Services
- Forecasting
- Predictive warranty
- Customer loyalty
- Early warning
- Market segmentation
- Price optimization
- Site location
- New customer identification
- Work force predictive modeling
- Website monitoring
- Customer intelligence
- Text/unstructured data mining

Sample of Methods
- Data mining and modeling: decision trees, forecasting, neural networks, regression
- Optimization/simulation: system dynamics, agent-based modeling, discrete-event simulation, optimization with uncertain data

Customers and Partners
- Manufacturing: The Dow Chemical Company, The Dow Corning Corporation, Ford Motor Company, General Motors, Harley-Davidson, Monsanto, Steelcase, Whirlpool Corporation
- Technology: IBM, Information Builders, SAS Institute, Hewlett-Packard
- Banking, finance, insurance: Auto-Owners Insurance, Comerica Bank
- Health and healthcare: Central Michigan District Health Dept., College of Health Professions (CMU), College of Medicine (CMU), Eli Lilly, Henry Ford Health System, Michigan Health Information Alliance, Michigan Health Information Network, Partners Healthcare (Boston), Spectrum Health System, Synergy Medical
- Other: Procter & Gamble, DTE Energy, Domino's Pizza, Gordon Food Service, State of Michigan

Services
- Innovation Workshop: a structured process to identify high-value analytics opportunities.
- Exploratory Data Analysis: a statistical approach to evaluating the relative strengths and weaknesses of data to be used for a specific purpose.
- Analytics Proof-of-Concept: custom projects, usually involving a series of complex models, designed to answer specific questions.
- Analytics Staff Augmentation: get the right help when you need it.

A PERFECT STORM

Ambitious Question
Business challenge: dramatically increase the ability to predict demand for products and services by:
- customer segment (age, race, gender)
- geographic region (zip code/census tract)
- time horizon (3, 5 and 10 years in the future)
- major product type

Ambitious Data (External)
- Scope: Census, Bureau of Labor Statistics, NOAA, American Community Survey, and more
- Time: 10 years of history plus 10 years of forward projections
- Space: zip code and census tract
- **18 billion rows of population data**

Ambitious Modeling
- 300 models, driven by the customer segments
- Machine learning approach: artificial neural networks
- Highly accurate
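The slide describes about 300 neural-network models, one per customer segment, but gives no architecture or training details. The sketch below shows only the one-model-per-segment pattern, assuming a tiny one-hidden-layer tanh network trained by stochastic gradient descent on a single scaled input (a time index). The function names, segment keys, toy data and hyperparameters are all hypothetical; the deck lists SAS Enterprise Miner among its tools, and this pure-Python version is not a reconstruction of that work.

```python
import math
import random


def train_segment_model(xs, ys, hidden=4, lr=0.05, epochs=500, seed=0):
    """Hypothetical helper: fit a 1-input, 1-hidden-layer tanh network with SGD.

    Returns (predict, mse_before, mse_after).
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden                                    # hidden biases
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0                                               # output bias

    def predict(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
        return sum(w2[j] * h[j] for j in range(hidden)) + b2

    def mse():
        return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    before = mse()
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            err = sum(w2[j] * h[j] for j in range(hidden)) + b2 - y
            for j in range(hidden):
                # Backprop through tanh; computed before w2[j] is updated.
                grad_pre = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                w1[j] -= lr * grad_pre * x
                b1[j] -= lr * grad_pre
            b2 -= lr * err
    return predict, before, mse()


def train_per_segment(data, **kwargs):
    """Train one independent model per segment key; data maps key -> (xs, ys)."""
    return {seg: train_segment_model(xs, ys, **kwargs) for seg, (xs, ys) in data.items()}


# Hypothetical toy data: two made-up segments with different demand trends
# over a time index scaled to [0, 1]. Not data from the project.
xs = [i / 10 for i in range(11)]
data = {
    ("segment_A", "region_1"): (xs, [0.2 + 0.5 * x for x in xs]),
    ("segment_B", "region_1"): (xs, [0.8 - 0.3 * x for x in xs]),
}
models = train_per_segment(data)
for seg, (predict, before, after) in models.items():
    print(seg, f"MSE {before:.4f} -> {after:.4f}")
```

Keeping each segment's model independent is what makes this pattern parallelize well across machines, which is one reason a model count in the hundreds is practical.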

The Pain Point
- Data size: terabytes
- Loading the data: 7-10 days
- Looking at the data: hours to traverse a table
- Testing a model: 3 weeks running 24/7 on new equipment

Our Solution
EMC/Greenplum Data Computing Appliance: quarter rack, scalable
Performance to date: fast (POC in process)
- Loading: days to minutes
- Generating a 100K-row sample: hours to minutes
- Sample queries: 24 minutes to 1 minute (400-million-row result set)
- Training the model: to be determined
- Scoring the data: to be determined
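The slide reports that drawing a 100K-row sample went from hours to minutes, but does not say how the sample was drawn; inside Greenplum it would normally be a SQL query. To illustrate the general problem only, reservoir sampling (Algorithm R) draws a uniform fixed-size sample in one pass over data of unknown length, holding only the sample in memory. This is a swapped-in textbook technique, not the method used in the POC, and `reservoir_sample` is a hypothetical name.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Uniform random sample of k items from an iterable in a single pass
    (Algorithm R). Memory use is proportional to k, not to the input size."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)          # fill the reservoir first
        else:
            j = rng.randint(0, i)       # inclusive: i + 1 equally likely slots
            if j < k:
                sample[j] = row         # keep row with probability k / (i + 1)
    return sample


# Usage: a 100,000-row sample from a stream of 1,000,000 row ids,
# standing in for a scan over a much larger table.
sample = reservoir_sample(range(1_000_000), 100_000, seed=42)
print(len(sample))
```

Because each row is seen once, the same idea works on streams or on per-segment partitions whose sizes are not known in advance.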

OPPORTUNITY FOR CMU

Better Answers
- Forecasting
- Predictive warranty
- Customer loyalty
- Early warning
- Market segmentation
- Price optimization
- Site location
- New customer identification
- Work force predictive modeling
- Website monitoring
- Customer intelligence
- Text/unstructured data mining

Big Data/Analytics Sandbox: The Analytics Insight Lab (A-LAB)
A secure platform to leverage data and solve real-world problems and challenges. It provides a low-risk way to get started with:
- leading-edge data visualization
- integration of proprietary and public data
- advanced mining of structured and unstructured data, including social media

Big Data/Analytics Sandbox
- Technical environment: EMC/Greenplum DCA; remote access through virtual machines; SAS, ESRI, Tableau
- Contextual database: 18 billion rows of demographic and socioeconomic data, 20 years at the census-tract level
- Problem solving/modeling support: as much or as little as you need
- Subject matter experts for hire: faculty available

Summary
- Big data and high-performance computing offer new opportunities to:
  - improve current data-driven problem solving
  - solve completely new problems
- This is not a fad but a fundamental shift in how successful organizations will compete for market share and organizational effectiveness.

Contact
Tracy Irwin Hewitt, Associate Director
734-837-0279
tracy.hewitt@cmich.edu

[Map: Opportunity by Postal Code, Michigan, 2015 (legend: High to Low)]
This map shows the "Opportunity Forecast" for Michigan at the postal-code level; each mark represents a postal code point. Map produced by the Institute for Health & Business Insight, Central Michigan University, Oct. 1, 2012.

TECHNICAL ENVIRONMENT

ICEBOX Technical Specs
- Hosts: 2x Dell R715 servers with dual AMD Opteron 6136 processors and 96 GB RAM; VMware ESX 4.1
- Greenplum DCA: 4 Greenplum Database modules
- Storage: 2x Xio ISE1 FC (49.6 TB total); 1x Drobo (16 TB); 1x Synology (16 TB); 3x EMC Isilon X200 (105 TB total, coming soon)
- Networking: private network accessible via VMware View Client
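Summing the storage figures as listed gives the raw totals with and without the planned Isilon nodes. This is a small sketch assuming the listed numbers are raw capacities in TB; it says nothing about usable capacity after RAID or data-protection overhead, and the DCA's own internal disks are not listed on the slide, so they are not counted. The dictionary names are illustrative.

```python
# Storage figures as listed on the ICEBOX spec slide, assumed to be raw TB.
current_storage_tb = {
    "2x Xio ISE1 FC": 49.6,
    "1x Drobo": 16.0,
    "1x Synology": 16.0,
}
planned_storage_tb = {"3x EMC Isilon X200": 105.0}  # listed as "coming soon"

current_total = sum(current_storage_tb.values())
with_planned = current_total + sum(planned_storage_tb.values())
print(f"current: {current_total:.1f} TB, with planned Isilon: {with_planned:.1f} TB")
# prints: current: 81.6 TB, with planned Isilon: 186.6 TB
```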

Selected Software
- Greenplum Database v4.2.2.4
- SAS: SAS 9.3, SAS Enterprise Guide 5.1, SAS Enterprise Miner 12.1, JMP 10
- Tableau 7
- ESRI Arc Suite 10.1