IHBI@CMU Leveraging Big Data
IHBI@CMU: Snapshot
Not-for-profit consulting and applied research organization with more than a decade of experience
Purpose
  Build, foster and promote the use of advanced and predictive analytics, and big data for competitive advantage in US-based organizations, across multiple industries
  Train the next generation of data scientists
Mature team
  PhD and MS level resources: physics, statistics, economics, computer science
  SAS Certified with extensive SAS professional training
  Strengths: predictive modeling, time series forecasting, machine learning, development mentoring
Exceptional infrastructure
  Analytics Insight Lab
  Greenplum DCA MPP environment
  SAS, ESRI, Tableau, Model Factory (POC pending)
Founded as Central Michigan University Research Corporation (CMU-RC) in 2001 by a group that included David Kepler, Dow Chemical Corporation; IBM Watson; Dow Corning and others.
Analytic Advantage
Analytically impaired
Reporting: What happened?
Queries/drill down: Where is the problem?
Alerts: What actions are needed?
Statistical Analysis: Why is this happening?
Forecasting: What if these trends continue?
Predictive Modeling: What will happen next?
Optimization: What is the best that can happen?
Adapted from Competing on Analytics: The New Science of Winning (Davenport, 2007).
Business Expertise/Services
  Forecasting
  Predictive warranty
  Customer loyalty
  Early warning
  Market segmentation
  Price optimization
  Site location
  New customer identification
  Work force predictive modeling
  Website monitoring
  Customer intelligence
  Text/unstructured data mining
Sample of Methods
Data Mining and Modeling
  Decision Trees
  Forecasting
  Neural Networks
  Regression
Optimization / Simulation
  Systems Dynamics
  Agent-based modeling
  Discrete-event simulation
  Optimization with uncertain data
Customers and Partners
Manufacturing
  The Dow Chemical Company
  The Dow Corning Corporation
  Ford Motor Company
  General Motors
  Harley-Davidson
  Monsanto
  Steelcase
  Whirlpool Corporation
Technology
  IBM
  Information Builders
  SAS Institute
  Hewlett-Packard
Banking, Finance, Insurance
  Auto-Owners Insurance
  Comerica Bank
Health and Healthcare
  Central Michigan District Health Dept.
  College of Health Professions, CMU
  College of Medicine, CMU
  Eli Lilly
  Henry Ford Health System
  Michigan Health Information Alliance
  Michigan Health Information Network
  Partners Healthcare (Boston)
  Spectrum Health System
  Synergy Medical
Other
  Procter and Gamble
  DTE Energy
  Domino's Pizza
  Gordon Food Service
  State of Michigan
Services
Innovation Workshop -- A structured process to identify high-value analytics opportunities.
Exploratory Data Analysis -- A statistical approach to evaluating the relative strengths and weaknesses of data to be used for a specific purpose.
Analytics Proof-of-Concept -- Custom projects, usually involving a series of complex models, designed to answer specific questions.
Analytics Staff Augmentation -- Get the right help when you need it.
A PERFECT STORM
Ambitious Question
Business Challenge: Dramatically increase the ability to predict demand for products and services by
  customer segment (age, race, gender)
  geographic region (zip code/census tract)
  over time (3, 5, 10 years in the future)
  major product type
Ambitious Data (External)
Scope
  Census, Bureau of Labor Statistics, NOAA, American Community Survey, and more.
Time
  10 years of history
  10 years of future forward projections
Space
  Zip code and census tract
**18 billion rows of population data**
Ambitious Modeling
300 models
  Driven by the customer segments
Machine learning approach
  Artificial neural network
  Highly accurate
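The "one model per customer segment" approach on this slide can be sketched roughly as follows. This is a minimal illustration only: the segment keys, the toy demand figures, and the helper names (fit_trend, fit_models_by_segment, predict) are all hypothetical, and an ordinary least-squares trend line stands in for the artificial neural networks named on the slide.

```python
from collections import defaultdict


def fit_trend(points):
    """Fit a least-squares line to (year, demand) pairs; return (slope, intercept).

    Stand-in for the real model: the slide describes neural networks,
    not linear trends.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_models_by_segment(records):
    """Group (segment, year, demand) records and fit one model per segment."""
    grouped = defaultdict(list)
    for segment, year, demand in records:
        grouped[segment].append((year, demand))
    return {segment: fit_trend(points) for segment, points in grouped.items()}


def predict(models, segment, year):
    """Forecast demand for one segment in a given year."""
    slope, intercept = models[segment]
    return slope * year + intercept


# Hypothetical data: segment = (age band, region label); values are made up.
records = [
    (("18-34", "zip_A"), 2010, 100.0),
    (("18-34", "zip_A"), 2011, 110.0),
    (("18-34", "zip_A"), 2012, 120.0),
    (("65+", "zip_A"), 2010, 50.0),
    (("65+", "zip_A"), 2011, 48.0),
    (("65+", "zip_A"), 2012, 46.0),
]

models = fit_models_by_segment(records)
forecast_young = predict(models, ("18-34", "zip_A"), 2015)
forecast_senior = predict(models, ("65+", "zip_A"), 2015)
```

Keeping a separate model per segment lets each one capture its own trend, which is why the model count grows with the number of segments.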
The Pain Point
Data Size
  Terabytes
Loading the data
  7-10 days to load
Looking at the data
  Traversing a table -- hours
Testing a model
  3 weeks, 24/7, new equipment
Our Solution
EMC/Greenplum Data Computing Appliance
  Quarter rack
  Scalable
Performance to date
  FAST (POC in process)
  Loading went from days to minutes
  Generate a 100K row sample -- hours to minutes
  Sample queries -- 24 minutes to 1 minute (400 million row result set)
  Training the model -- ??
  Scoring the data -- ??
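For a sense of scale, the one fully quantified figure on this slide (sample queries dropping from 24 minutes to 1 minute) works out to a 24x speedup. The helper below is illustrative only. The throughput line assumes the full 400 million row result set was produced within that one minute, which the slide implies but does not state. The "days to minutes" and "hours to minutes" items carry no exact numbers, so they are left out rather than guessed.

```python
def speedup(before_minutes, after_minutes):
    """Return how many times faster the new run is than the old one."""
    if before_minutes <= 0 or after_minutes <= 0:
        raise ValueError("durations must be positive")
    return before_minutes / after_minutes


# Sample queries: 24 minutes before, 1 minute after (figures from the slide).
query_speedup = speedup(24, 1)

# Assumption: the whole 400 million row result set arrived in that 1 minute.
rows_per_second = 400_000_000 / 60

print(f"query speedup: {query_speedup:.0f}x")
print(f"approx. rows per second: {rows_per_second:,.0f}")
```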
OPPORTUNITY FOR CMU
Better Answers
  Forecasting
  Predictive warranty
  Customer loyalty
  Early warning
  Market segmentation
  Price optimization
  Site location
  New customer identification
  Work force predictive modeling
  Website monitoring
  Customer intelligence
  Text/unstructured data mining
Big Data/Analytics Sandbox
The Analytics Insight Lab (A-LAB)
  A secure platform to leverage data and solve real-world problems/challenges.
  Provides a low-risk way to get started
  Leading edge data visualization
  Integration of proprietary and public data
  Advanced mining of structured and unstructured data, including social media
Big Data/Analytics Sandbox
Technical Environment
  EMC/Greenplum DCA
  Remote access provided through virtual machines
  SAS, ESRI, Tableau
Contextual Database
  18 billion rows, demographic, socioeconomic, 20 years of data at the census tract level
Problem Solving/Modeling Support
  As much or as little as you need
Subject Matter Experts for Hire
  Faculty available
Summary
Big data and high performance computing offer new opportunities
  Improve current data-driven problem solving
  Solve completely new problems
This is not a fad, but a fundamental shift in how successful organizations will compete for market share and organizational effectiveness.
Contact
Tracy Irwin Hewitt, Associate Director
734-837-0279
tracy.hewitt@cmich.edu
[Map: Opportunity by Postal Code. Legend: Opportunity, 2015, High to Low.]
This map shows the 'Opportunity Forecast' for Michigan at the Postal Code level. Each mark represents a Postal Code point.
Map Produced by: The Institute for Health & Business Insight, Central Michigan University, Oct. 1 2012
TECHNICAL ENVIRONMENT
ICEBOX Technical Specs
Hosts: 2x Dell R715 servers with Dual AMD Opteron 6136 processors and 96GB RAM. VMware ESX 4.1
Greenplum DCA: 4 Greenplum Database Modules
Storage:
  2 Xio ISE1 FC, 49.6 TB total
  1 Drobo, 16 TB
  1 Synology, 16 TB
  3 EMC Isilon X200, 105TB total (coming soon)
Networking: Private network accessible via VMware View Client
Selected Software
Greenplum Database v4.2.2.4
SAS
  SAS 9.3
  SAS Enterprise Guide 5.1
  SAS Enterprise Miner 12.1
  JMP 10
Tableau 7
ESRI Arc Suite 10.1