Data Science with Hadoop at Opower




Erik Shilts, Advanced Analytics, erik.shilts@opower.com

What is Opower?

A study:

Four messages delivered the same tip, "Turn off AC & turn on fan," each framed by a different motivation:

$$$: zero impact on consumption
Environment: zero impact on consumption
Citizenship: zero impact on consumption
Neighbors: 6% drop in consumption

Opower Details
Customer engagement platform for utilities.
Company: ~300 employees; Cleantech Company of the Year 2012; 75 utility partners covering > 50M households; > 1.5 terawatt-hours saved.
Our DNA: data analytics and behavioral science.

What is Opower? One giant big data problem.

Advanced Analytics

Advanced Analytics provides consumer insights
Our charter is to provide consumers with insights that give context and control over how they use energy.

We use machine learning and predictive modeling
We use machine learning, signal processing, and predictive modeling to provide energy usage insights.

We provide insights into individual energy use
[Figure: one household's energy use by month (Jan, Apr, Jul, Oct), split into cooling, heating, and baseload.]

Data science

Data scientists extract meaning
Data science is a discipline with the goal of extracting meaning from data and creating data products. (Wikipedia: http://en.wikipedia.org/wiki/data_science)

Data scientists are statisticians
In other words: machine learning, statistics, and pretty charts.

Data scientists are data mungers
Data science is a discipline of data munging. Data munging is the process of converting data from one form into another for more convenient consumption. (Wikipedia: http://en.wikipedia.org/wiki/data_wrangling)

Data scientists are plumbers
Data science is a discipline of plumbing. Plumbing is difficult.

It's temporary, I swear!
Move data from here to there. Hack to get the data how you want it. (Image: http://funmeme.com/post/2009/08/02/plumbing-fail-e28093-funny-pic.aspx)

It works. For now.
Multiple data sources are tricky to handle. Construct a series of tubes. (Image: http://www.ontimeplumber.com.au/plumbing_disasters/plumbing_disasters.html)

Needs user testing
Sometimes you have to start over when you think you're done. (Image: http://www.funnyjunk.com/funny_pictures/234485/awkward/)

Data science is mostly plumbing
It's where we spend all of our time: we spend 80% of our time on data munging and other infrastructure work.

Fun stuff only 20% of the time
Sprinkle on some modeling and charts for the other 20%.

Data science in practice

Electric tankless water heater, 10% off
(Image: http://www.homedepot.com/plumbing-water-heaters-tankless-electric/h_d1/n-5yc1vzc1ty/r-203210874/h_d2/ProductDisplay?catalogId=10053&langId=-1&storeId=10051)

Who should get this promotion?
We want to maximize the take-up rate and minimize marketing cost.

Data science in practice: identify likely purchasers.

Data science in the past
How would we have solved this before Hadoop?

Past is same as the present: construct a model
Construct a model of likely purchasers.

Predict purchase behavior with a model
$\Pr(\text{purchase}) = \beta_1\,\text{Electric Heat} + \beta_2\,\text{Similar Purchases} + \beta_3\,\text{Neighbors Purchased} + \beta_4\,\text{Response Rate} + \beta_5\,\text{Type Of Message}$
We can model purchase behavior at the consumer level. Include predictors that indicate heavy winter electric usage, neighbor influences, and responsiveness to past communications.

Electric Heat: housing heat type correlates with water heat type
Does the consumer use electric heat? Households with gas heat are unlikely to purchase an electric water heater. (Natural gas is cheap.)

Similar Purchases: willingness to invest in efficient products
Has the consumer participated in similar program promotions? Past purchase behavior is a good predictor of future behavior.

Neighbors Purchased: neighbor effects can be powerful
Is the product popular among their neighbors? Neighbor effects may influence purchase behavior.

Response Rate: responsiveness proxies engagement
Has the consumer responded to past communications? Past responsiveness indicates high engagement.

Type Of Message: Home Energy Reports influence usage perceptions
What type of message has the consumer received on their Home Energy Reports? The relative positioning of past energy usage may influence willingness to invest in future lower usage.

We have a model. Let's get the data.
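A minimal sketch of how a model of this shape could score one household. The slides give only the linear combination of predictors, so the logistic link, the intercept, the coefficient values, and the feature encoding below are illustrative assumptions, not Opower's fitted model.

```python
import math

# Hypothetical coefficients (beta_1..beta_5); real values would be fit to past purchase data.
BETAS = {
    "electric_heat": 1.5,
    "similar_purchases": 0.8,
    "neighbors_purchased": 0.6,
    "response_rate": 1.2,
    "type_of_message": 0.3,
}
INTERCEPT = -3.0  # assumed; the slide's formula shows no intercept


def purchase_probability(household):
    """Linear combination of the five predictors, squashed to (0, 1) by a logistic link."""
    score = INTERCEPT + sum(BETAS[name] * household[name] for name in BETAS)
    return 1.0 / (1.0 + math.exp(-score))


# Two made-up households that differ only in heat type.
electric = {
    "electric_heat": 1,
    "similar_purchases": 1,
    "neighbors_purchased": 0.4,
    "response_rate": 0.5,
    "type_of_message": 1,
}
gas = dict(electric, electric_heat=0)

# With a positive beta_1, the electric-heat household scores higher.
print(purchase_probability(electric) > purchase_probability(gas))
```

Scoring every household this way and sorting by probability would give the promotion list. The hard part, as the following slides show, is getting trustworthy values for each predictor.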

Disparate data sources
[Diagram: utility usage, thermostat, weather, customer interaction history, and additional data streams, each sitting apart from the analytics server.]

Let's start plumbing
Pipe utility data, then customer interaction data, and finally Home Energy Report data into the analytics server.

Now we're ready to model
$\Pr(\text{purchase}) = \beta_1\,\text{Electric Heat} + \beta_2\,\text{Similar Purchases} + \beta_3\,\text{Neighbors Purchased} + \beta_4\,\text{Response Rate} + \beta_5\,\text{Type Of Message}$

There's a problem: heat type data is sparse and inaccurate
We know the Similar Purchases, Neighbors Purchased, Response Rate, and Type Of Message predictors. Electric Heat is harder.

Model electric heat to compensate for bad data
Parcel data coverage of heat type is sparse and inaccurate. We need another data source for heat type.

We construct a model to predict heat type
$\Pr(\text{purchase}) = \beta_1\,\Pr(\text{Electric Heat}) + \dots$
$\Pr(\text{Electric Heat}) = \delta_1\,\text{Weather Sensitivity} + \delta_2\,\text{Neighbors Heat} + \delta_3\,\text{Natural Gas Price}$
We can model the presence of electric heat. Include predictors of weather sensitivity, area prevalence, and local natural gas price.

Weather Sensitivity: sensitivity of electricity usage to cold weather
How sensitive is the consumer's electricity usage to cold weather? High sensitivity to cold weather is our best indicator of electric heat.

Neighbors Heat: heat type is related to geography
Is electric heat popular in the consumer's area? Heat type tends to have specific geographic distributions.

Natural Gas Price: gas prices may affect heat type adoption
How expensive is the alternative? Natural gas may be hard to get in certain areas.

We have another model. Let's get the data.

Our plumbing so far
[Diagram: the same data sources, with the pipes built so far running into the analytics server.]
Now pipe neighbor heat type data and natural gas prices.

Now we're ready to model (x2)
$\Pr(\text{purchase}) = \beta_1\,\Pr(\text{Electric Heat}) + \dots$
$\Pr(\text{Electric Heat}) = \delta_1\,\text{Weather Sensitivity} + \delta_2\,\text{Neighbors Heat} + \delta_3\,\text{Natural Gas Price}$

There's a problem (x2): we don't know weather sensitivity
We know the Neighbors Heat and Natural Gas Price predictors. Weather Sensitivity is harder.

Luckily, we know how to do this
[Figure: energy use by month (Jan, Apr, Jul, Oct), split into cooling, heating, and baseload.]

We have a disaggregation algorithm. Let's get the data.

Disaggregate heating and cooling
Weather Sensitivity, the first predictor in the electric heat model, comes from correlating electricity usage with weather. Let's grab the data.
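One simple way to correlate electricity usage with weather, sketched under stated assumptions: the deck does not show Opower's disaggregation algorithm, so the least-squares slope, the 65 °F balance point, and the function name below are illustrative stand-ins.

```python
def heating_sensitivity(temperatures_f, usage_kwh, balance_point_f=65.0):
    """Least-squares slope of usage against heating degrees.

    Heating degrees are how far the temperature falls below the balance point
    (zero on warm days). The slope is extra kWh used per degree of cold.
    """
    heating_degrees = [max(0.0, balance_point_f - t) for t in temperatures_f]
    n = len(heating_degrees)
    mean_x = sum(heating_degrees) / n
    mean_y = sum(usage_kwh) / n
    var_x = sum((x - mean_x) ** 2 for x in heating_degrees)
    if var_x == 0:
        return 0.0  # no variation in cold exposure, so no slope can be estimated
    cov_xy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(heating_degrees, usage_kwh)
    )
    return cov_xy / var_x


# Made-up daily readings: usage climbs on cold days for the first household only.
temps = [25.0, 45.0, 65.0, 85.0]
electric_heat_usage = [30.0, 20.0, 10.0, 10.0]
gas_heat_usage = [10.0, 10.0, 10.0, 10.0]

print(heating_sensitivity(temps, electric_heat_usage))
print(heating_sensitivity(temps, gas_heat_usage))
```

A household whose usage rises steeply as the temperature drops gets a large slope, which is the signal the electric heat model needs.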

Our plumbing so far (x2)
Pipe electricity usage data, thermostat data, and weather data into the analytics server.

Starting to feel like Inception
(Image: http://www.chartgeek.com/wp-content/uploads/2012/04/inception-explained-chart.jpg)

Now we're ready to model (finally)
Construct disaggregation algorithms. Calculate sensitivity for all households.

Disaggregate and store results on the analytics server.

We know each customer's heating sensitivity
Let's continue with our electric heat model.

We have the data to finish our heat type model
$\Pr(\text{Electric Heat}) = \delta_1\,\text{Weather Sensitivity} + \delta_2\,\text{Neighbors Heat} + \delta_3\,\text{Natural Gas Price}$
Construct electric heat model. Impute heat type for all households.

Impute heat type and store results on the analytics server.

We know each customer's heat type
Let's continue with our water heater purchase model.

We now have the data to finish our purchase model
$\Pr(\text{purchase}) = \beta_1\,\text{Electric Heat} + \beta_2\,\text{Similar Purchases} + \beta_3\,\text{Neighbors Purchased} + \beta_4\,\text{Response Rate} + \beta_5\,\text{Type Of Message}$
Construct purchase behavior model. Calculate likelihood to purchase for all households.
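The chain of models described so far can be sketched as two stages: impute the probability of electric heat, then feed it into the purchase model. Everything numeric here is hypothetical (coefficients, intercepts, household values), and the logistic link is an assumption; the slides specify only which predictors enter each model.

```python
import math


def logistic(z):
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def electric_heat_probability(h):
    # Stage 1: hypothetical delta coefficients for the heat type model.
    return logistic(
        -3.0
        + 4.0 * h["weather_sensitivity"]
        + 1.0 * h["neighbors_heat"]
        + 0.5 * h["natural_gas_price"]
    )


def purchase_probability(h):
    # Stage 2: the imputed heat type stands in for the sparse parcel data.
    p_heat = electric_heat_probability(h)
    return logistic(
        -3.0
        + 2.0 * p_heat
        + 0.8 * h["similar_purchases"]
        + 0.6 * h["neighbors_purchased"]
        + 1.2 * h["response_rate"]
        + 0.3 * h["type_of_message"]
    )


# Made-up households that share every purchase predictor and differ only in stage-1 inputs.
shared = {"similar_purchases": 1, "neighbors_purchased": 0.4,
          "response_rate": 0.5, "type_of_message": 1}
households = {
    "weather_sensitive": dict(shared, weather_sensitivity=0.9,
                              neighbors_heat=0.7, natural_gas_price=1.2),
    "weather_insensitive": dict(shared, weather_sensitivity=0.1,
                                neighbors_heat=0.2, natural_gas_price=0.8),
}

# Rank households from most to least likely purchaser.
ranked = sorted(households, key=lambda name: purchase_probability(households[name]),
                reverse=True)
print(ranked)
```

Each stage needs its own inputs gathered and its own outputs stored before the next stage can run, which is why the plumbing grows with every nested model.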

Calculate likelihood to purchase and store results
We have our desired result.

Data science is plumbing
[Diagram: every data source piped separately into the analytics server.] (Image: http://www.ontimeplumber.com.au/plumbing_disasters/plumbing_disasters.html)

New request: Who would buy an efficient pool pump for 10% off?
(Image: http://www.amazon.com/pentair-intelliflo-variable-230-volt-16-ampere/dp/b007e4vwno/ref=sr_1_3?ie=utf8&qid=1350601695&sr=8-3&keywords=variable+speed+pool+pump)

I remember what the last model took, and I start searching the want-ads.
(Image: http://careers.stackoverflow.com/jobs?searchterm=+scientist&location=)

But it gets better
Now we have Hadoop!

Past is same as the present: construct a model
How would we solve this with Hadoop? Construct a model of likely purchasers.

Hadoop has a key advantage
Integrated data warehousing and crunching.

Data and analytical capabilities in a single place
[Diagram: utility usage, thermostat, weather, customer interaction history, and additional data streams all land in one Hadoop cluster, which also runs the algorithms.]

Hadoop solves the plumbing problem
All the data sits in the Hadoop cluster.

Fully integrated crunching
The analytical capabilities run in the same cluster as the data.

Our model is the same. Let's start building it.
$\Pr(\text{purchase}) = \beta_1\,\text{Electric Heat} + \beta_2\,\text{Similar Purchases} + \beta_3\,\text{Neighbors Purchased} + \beta_4\,\text{Response Rate} + \beta_5\,\text{Type Of Message}$

Still need weather sensitivity
Calculating sensitivity is much easier with Hadoop. Let's get the data.

Fetch your data with Hive views
[Diagram: views sit inside the Hadoop cluster, between the data and the algorithms.]

Views provide fresh data on demand
Hive is a SQL-like interface to Hadoop. Hive views are saved queries that you treat like a table. Build views on top of views to set up complex analyses. Querying a view takes longer to execute than querying a stored table, but it ensures fresh data.

View syntax is plain SQL

    CREATE VIEW analytics.disaggregation_inputs_view AS
    SELECT w.temperature, r.usage_value
    FROM analytics.weather w
    JOIN analytics.reads r ON w.zip_code = r.zip_code;
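To see the "saved query you treat like a table" behavior without a cluster, here is a small sketch that uses Python's built-in sqlite3 as a stand-in for Hive. The table contents and the second view, cold_day_inputs_view, are made up for illustration, and SQLite is not Hive, though `CREATE VIEW` works the same way for this purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE weather (zip_code TEXT, temperature REAL);
CREATE TABLE reads (zip_code TEXT, usage_value REAL);
INSERT INTO weather VALUES ('22201', 30.0), ('94103', 60.0);
INSERT INTO reads VALUES ('22201', 42.0), ('94103', 18.0);

-- Same shape as the slide's view: a saved query over the base tables.
CREATE VIEW disaggregation_inputs_view AS
SELECT w.zip_code, w.temperature, r.usage_value
FROM weather w JOIN reads r ON w.zip_code = r.zip_code;

-- A view on top of a view (hypothetical follow-on analysis).
CREATE VIEW cold_day_inputs_view AS
SELECT * FROM disaggregation_inputs_view WHERE temperature < 40;
""")


def cold_rows():
    """Query the top-level view; the joins and filters run at query time."""
    return conn.execute(
        "SELECT zip_code, usage_value FROM cold_day_inputs_view ORDER BY zip_code"
    ).fetchall()


before = cold_rows()
# New data arrives in the base tables only; no view is rebuilt or refreshed.
conn.execute("INSERT INTO weather VALUES ('10001', 20.0)")
conn.execute("INSERT INTO reads VALUES ('10001', 55.0)")
after = cold_rows()
print(before, after)
```

The second query picks up the new household even though nothing was recomputed or copied, which is the "fresh data on demand" property the slides rely on.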

Views are data on demand
Views point at data without storing it: more data without the storage. Build views on top of views for complex analyses.

Set up a view to get disaggregation data.

We have our disaggregation data
We need to calculate the model and store the results. Hadoop is built to do both.

Set up a view to run disaggregation algorithms.

Hadoop streaming + Views = Power
Use Hadoop streaming within the view.

Hadoop streaming can calculate anything
Stream data through any script. Pipe any data through standard input and send any data to standard output. Integrate with any language: R, Python, Ruby, Bash, Java, etc. The SELECT TRANSFORM command in Hive is an easy way to use Hadoop streaming.

Hadoop streaming is easy to implement in Hive

    -- The executable reads from stdin and writes to stdout.
    CREATE VIEW analytics.disaggregation_outputs_view AS
    SELECT TRANSFORM (diw.temperature, diw.usage_value)
    USING 'weather_disaggregation.r'
    FROM analytics.disaggregation_inputs_view diw;
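The contents of weather_disaggregation.r are not shown in the deck. As a sketch of what a script on the other side of `TRANSFORM` has to do, here is a hypothetical Python equivalent. It assumes Hive's default streaming format (one row per line, columns separated by tabs, NULL written as `\N`) and emits a simple per-row feature, not the real disaggregation.

```python
import sys


def transform(lines, balance_point_f=65.0):
    """Turn 'temperature<TAB>usage_value' rows into rows with heating degrees appended."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        try:
            temperature, usage_value = float(fields[0]), float(fields[1])
        except (IndexError, ValueError):
            continue  # skip malformed rows and NULLs (which arrive as \N)
        heating_degrees = max(0.0, balance_point_f - temperature)
        yield f"{temperature}\t{usage_value}\t{heating_degrees}"


# When run as a streaming script, read rows from stdin and write rows to stdout.
if __name__ == "__main__" and not sys.stdin.isatty():
    for row in transform(sys.stdin):
        print(row)
```

Because the contract is only "lines in, lines out", the same view definition works whether the executable is written in R, Python, or anything else that can read standard input.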

Simple SQL syntax to produce any result
The algorithm behind the view can be written in any language.

We know each customer's heating sensitivity
Let's continue with our electric heat model.

We're ready to model electric heat
$\Pr(\text{Electric Heat}) = \delta_1\,\text{Weather Sensitivity} + \delta_2\,\text{Neighbors Heat} + \delta_3\,\text{Natural Gas Price}$
Let's get our data: set up a view to fetch data for the electric heat model, then implement the electric heat model in a view.

We know each customer's heat type
Let's continue with our water heater purchase model.

We're ready to model purchase behavior
$\Pr(\text{purchase}) = \beta_1\,\text{Electric Heat} + \beta_2\,\text{Similar Purchases} + \beta_3\,\text{Neighbors Purchased} + \beta_4\,\text{Response Rate} + \beta_5\,\text{Type Of Message}$
Let's get our data: set up a view to fetch data for the purchase behavior model, then implement the purchase behavior model.

We have our desired result
One query to get the list.

Major plumbing in the old world
[Diagram: every data source piped separately into the analytics server.]

Some considerations on the past vs now

Task                   Past             Now
Refresh data           Major plumbing   Single query
Score new households   Major plumbing   Single query
Add new data source    Major plumbing   Couple lines of SQL
Build new model        Major plumbing   Re-use views

Refreshing data is a breeze. It is easy to calculate insights for new households. New data? No problem. Re-use previous work for new models.

Hadoop radically reduces plumbing

Big data

Big data: Quantity

Big data: Variety + Quantity

It doesn't have to be like this
[Diagram: every data source piped separately into the analytics server.]

You could look for a new job
(Image: http://careers.stackoverflow.com/jobs?searchterm=+scientist&location=)

Hadoop: big data plumbing

Happy plumbing!
Erik Shilts, Advanced Analytics, erik.shilts@opower.com