Data Science with Hadoop at Opower
|
|
|
- Dale Clark
- 10 years ago
- Views:
Transcription
1 Data Science with Hadoop at Opower Erik Shilts Advanced Analytics
2 What is Opower?
3 A study:
4 $$$ Turn off AC & Turn on Fan Environment Turn off AC & Turn on Fan Citizenship Turn off AC & Turn on Fan
5 $$$ Turn off AC & Turn on Fan Environment Turn off AC & Turn on Fan Citizenship Turn off AC & Turn on Fan Zero Impact on Consumption
6 $$$ Turn off AC & Turn on Fan Environment Turn off AC & Turn on Fan Citizenship Turn off AC & Turn on Fan Zero Impact on Consumption
7 $$$ Environment Citizenship Neighbors Turn off AC & Turn on Fan Turn off AC & Turn on Fan Turn off AC & Turn on Fan Turn off AC & Turn on Fan Zero Impact on Consumption
8 $$$ Environment Citizenship Neighbors Turn off AC & Turn on Fan Turn off AC & Turn on Fan Turn off AC & Turn on Fan Turn off AC & Turn on Fan Zero Impact on Consumption 6% Drop in Consumption
9 Opower Details Customer Engagement Platform for Utilities Company ~300 employees Cleantech Company of the Year 2012! 75 utility partners covering > 50M households > 1.5 Terawatt hours saved Our DNA Data analytics Behavioral science
10 What is Opower?
11 What is Opower? One giant big problem
12 Advanced Analytics
13 Advanced Analytics provides consumer insights Our charter is to provide consumers with insights that give context and control over how they use energy. 13
14 We use machine learning and predictive modeling Our charter is to provide consumers with insights that give context and control over how they use energy. Use machine learning, signal processing, and predictive modeling to provide energy usage insights. 14
15 We provide insights into individual energy use Cooling Heating Baseload Jan Apr Jul Oct Jan Apr Jul Oct
16 Data science
17 Data scientists extract meaning Data science is a discipline with the goal of extracting meaning from and creating products. Wikipedia: 17
18 Data scientists are statisticians Data science is a discipline with the goal of extracting meaning from and creating products. In other words, machine learning, statistics, and pretty charts. Wikipedia: 18
19 Data scientists want to extract meaning Data science is a discipline with the goal of extracting meaning from and creating products. In other words, machine learning, statistics, and pretty charts. Wikipedia: 19
20 Data scientists are mungers Data science is a discipline of munging. 20
21 Data scientists prepare Data science is a discipline of munging. Data munging is the process of converting from one form into another for more convenient consumption. Wikipedia: 21
22 Data scientists are plumbers Data science is a discipline of plumbing. Plumbing is difficult. 22
23 It s temporary, I swear! Data science is a discipline of plumbing. Move from here to there. Hack to get the how you want it. 23
24 It works. For now. Data science is a discipline of plumbing. Multiple sources are tricky to handle. Construct a series of tubes. 24
25 Needs user testing Data science is a discipline of plumbing. Sometimes you have to start over when you think you re done. 25
26 Data science is mostly plumbing Data science is a discipline of plumbing. 26
27 It s where we spend all of our time Data science is a discipline of plumbing. We spend 80% of our time on munging and other infrastructure work. 27
28 Fun stuff only 20% of the time Data science is a discipline of plumbing. We spend 80% of our time on munging and other infrastructure work. Sprinkle on some modeling and charts for the other 20%. 28
29 Data science in practice
30 Electric tankless water heater 10% off /h_d2/ProductDisplay?catalogId=10053&langId=-1&storeId=
31 Who should get this promotion? /h_d2/ProductDisplay?catalogId=10053&langId=-1&storeId=
32 Maximize take-up rate /h_d2/ProductDisplay?catalogId=10053&langId=-1&storeId=
33 Minimize marketing cost /h_d2/ProductDisplay?catalogId=10053&langId=-1&storeId=
34 Data science in practice Identify likely purchasers
35 Data science in the past How would we have solved this before Hadoop? 35
36 Past is same as the present: construct a model How would we have solved this before Hadoop? Construct a model of likely purchasers. 36
37 Predict purchase behavior with a model Probability(purchase) = β 1 Electric Heat + β 2 Similar Purchases + β 3 Neighbors Purchased + β 4 Response Rate + β 5 Type Of Message We can model purchase behavior at the consumer level. Include predictors that indicate heavy winter electric usage, neighbor influences, and responsiveness to past communications. 37
38 Housing heat type correlates with water heat type Probability(purchase) = β 1 Electric Heat + β 2 Similar Purchases + β 3 Neighbors Purchased + β 4 Response Rate + β 5 Type Of Message Does the consumer use electric heat? Households with gas heat are unlikely to purchase an electric water heater. (Natural gas is cheap.) 38
39 Willingness to invest in efficient products Probability(purchase) = β 1 Electric Heat + β 2 Similar Purchases + β 3 Neighbors Purchased + β 4 Response Rate + β 5 Type Of Message Has the consumer participated in similar program promotions? Past purchase behavior is a good predictor of future behavior. 39
40 Neighbor effects can be powerful Probability(purchase) = β 1 Electric Heat + β 2 Similar Purchases + β 3 Neighbors Purchased + β 4 Response Rate + β 5 Type Of Message Is the product popular about their neighbors? Neighbor effects may influence purchase behavior. 40
41 Responsiveness proxies engagement Probability(purchase) = β 1 Electric Heat + β 2 Similar Purchases + β 3 Neighbors Purchased + β 4 Response Rate + β 5 Type Of Message Has the consumer responded to past communications? Past responsiveness indicates high engagement. 41
42 Home Energy Reports influence usage perceptions Probability(purchase) = β 1 Electric Heat + β 2 Similar Purchases + β 3 Neighbors Purchased + β 4 Response Rate + β 5 Type Of Message What type of message has the consumer received on their Home Energy Reports? The relative positioning of past energy usage may influence willingness to invest in future lower usage. 42
43 We have a model. Let s get the. Probability(purchase) = β 1 Electric Heat + β 2 Similar Purchases + β 3 Neighbors Purchased + β 4 Response Rate + β 5 Type Of Message 43
44 Disparate sources Analytics server Utility usage Thermostat Weather Customer interaction history Additional streams 44
45 Let s start plumbing Analytics server Utility usage Thermostat Weather Customer interaction history Additional streams 45
46 Pipe utility Analytics server Utility usage Thermostat Weather Customer interaction history Additional streams 46
47 Pipe customer interaction Analytics server Utility usage Thermostat Weather Customer interaction history Additional streams 47
48 Finally, pipe Home Energy Report Analytics server Utility usage Thermostat Weather Customer interaction history Additional streams 48
49 Now we re ready to model Probability(purchase) = β 1 Electric Heat + β 2 Similar Purchases + β 3 Neighbors Purchased + β 4 Response Rate + β 5 Type Of Message 49
50 There s a problem Probability(purchase) = β 1 Electric Heat + β 2 Similar Purchases + β 3 Neighbors Purchased + β 4 Response Rate + β 5 Type Of Message We know these predictors 50
51 Heat type is sparse and inaccurate Probability(purchase) = β 1 Electric Heat + β 2 Similar Purchases + β 3 Neighbors Purchased + β 4 Response Rate + β 5 Type Of Message This is harder We know these predictors 51
52 Model electric heat to compensate for bad Probability(purchase) = β 1 Electric Heat + β 2 Similar Purchases + β 3 Neighbors Purchased + β 4 Response Rate + β 5 Type Of Message Parcel coverage of heat type is sparse and inaccurate. We need another source for heat type. 52
53 We construct a model to predict heat type Probability(purchase) = β 1 Pr(Electric Heat) = δ 1 Weather Sensitivity + δ 2 Neighbors Heat + δ 3 Natural Gas Price We can model the presence of electric heat. Include predictors of weather sensitivity, area prevalence, and local natural gas price. 53
54 Sensitivity of electricity usage to cold weather Probability(purchase) = β 1 Pr(Electric Heat) = δ 1 Weather Sensitivity + δ 2 Neighbors Heat + δ 3 Natural Gas Price How sensitive is the consumer s electricity usage to cold weather? High sensitivity to cold weather is our best indicator of electric heat. 54
55 Heat Type Is Related to Geography Probability(purchase) = β 1 Pr(Electric Heat) = δ 1 Weather Sensitivity + δ 2 Neighbors Heat + δ 3 Natural Gas Price Is electric heat popular in the consumer s area? Heat type tends to have specific geographic distributions. 55
56 Gas Prices May Affect Heat Type Adoption Probability(purchase) = β 1 Pr(Electric Heat) = δ 1 Weather Sensitivity + δ 2 Neighbors Heat + δ 3 Natural Gas Price How expensive is the alternative? Natural gas may be hard to get in certain areas. 56
57 We have another model. Let s get the. Probability(purchase) = β 1 Pr(Electric Heat) = δ 1 Weather Sensitivity + δ 2 Neighbors Heat + δ 3 Natural Gas Price 57
58 Our plumbing so far Analytics server Utility usage Thermostat Weather Customer interaction history Additional streams 58
59 Pipe neighbor heat type Analytics server Utility usage Thermostat Weather Customer interaction history Additional streams 59
60 Pipe natural gas prices Analytics server Utility usage Thermostat Weather Customer interaction history Additional streams 60
61 Now we re ready to model (x2) Probability(purchase) = β 1 Pr(Electric Heat) = δ 1 Weather Sensitivity + δ 2 Neighbors Heat + δ 3 Natural Gas Price 61
62 There s a problem (x2) Probability(purchase) = β 1 Pr(Electric Heat) = δ 1 Weather Sensitivity + δ 2 Neighbors Heat + δ 3 Natural Gas Price We know these predictors 62
63 We don t know weather sensitivity Probability(purchase) = β 1 Pr(Electric Heat) = δ 1 Weather Sensitivity + δ 2 Neighbors Heat + δ 3 Natural Gas Price This is harder We know these predictors 63
64 Luckily, we know how to do this Cooling Heating Baseload Jan Apr Jul Oct Jan Apr Jul Oct
65 We have a disaggregation algorithm. Let s get the. Cooling Heating Baseload Jan Apr Jul Oct Jan Apr Jul Oct
66 Disaggregate heating and cooling Probability(purchase) = β 1 Pr(Electric Heat) = δ 1 Weather Sensitivity = Correlate electricity usage with weather. Let s grab the. Jan Apr Jul Oct Jan Apr Jul Oct 66
67 Our plumbing so far (x2) Analytics server Utility usage Thermostat Weather Customer interaction history Additional streams 67
68 Pipe electricity usage Analytics server Utility usage Thermostat Weather Customer interaction history Additional streams 68
69 Pipe thermostat Analytics server Utility usage Thermostat Weather Customer interaction history Additional streams 69
70 Pipe weather Analytics server Utility usage Thermostat Weather Customer interaction history Additional streams 70
71 Starting to feel like Inception 71
72 Now we re ready to model (finally) Probability(purchase) = β 1 Pr(Electric Heat) = δ 1 Weather Sensitivity = Jan Apr Jul Oct Jan Apr Jul Oct Construct disaggregation algorithms. Calculate sensitivity for all households. 72
73 Disaggregate and store results Analytics server Utility usage Thermostat Weather Customer interaction history Additional streams 73
74 We know each customer s heating sensitivity Probability(purchase) = β 1 Pr(Electric Heat) = δ 1 Weather Sensitivity = Let s continue with our electric heat model. Jan Apr Jul Oct Jan Apr Jul Oct 74
75 We have the to finish our heat type model Probability(purchase) = β 1 Pr(Electric Heat) = δ 1 Weather Sensitivity + δ 2 Neighbors Heat + δ 3 Natural Gas Price Construct electric heat model. Impute heat type for all households. 75
76 Impute heat type and store results Analytics server Utility usage Thermostat Weather Customer interaction history Additional streams 76
77 We know each customer s heat type Probability(purchase) = Pr(Electric Heat) = δ 1 Weather Sensitivity + δ 2 Neighbors Heat + δ 3 Natural Gas Price Let s continue with our water heater purchase model. 77
78 We now have the to finish our purchase model Probability(purchase) = β 1 Electric Heat + β 2 Similar Purchases + β 3 Neighbors Purchased + β 4 Response Rate + β 5 Type Of Message Construct purchase behavior model. Calculate likelihood to purchase for all households. 78
79 Calculate likelihood to purchase and store results Analytics server Utility usage Thermostat Weather Customer interaction history Additional streams 79
80 We have our desired result Analytics server Utility usage Thermostat Weather Customer interaction history Additional streams 80
81 Data science is plumbing Analytics server Utility usage Thermostat Weather Customer interaction history Additional streams 81
82 New request: Who would buy an efficient pool pump for 10% off? 82
83 I remember what the last model took Analytics server Utility usage Thermostat Weather Customer interaction history Additional streams 83
84 and I start searching the want-ads 84
85 But it gets better Now we have Hadoop! 85
86 Past is same as the present: construct a model How would we have solved this with Hadoop? Construct a model of likely purchasers. 86
87 Hadoop has a key advantage How would we have solved this with Hadoop? Construct a model of likely purchasers. Integrated warehousing and crunching 87
88 Data and analytical capabilities in a single place Utility usage Thermostat Weather Customer interaction history Additional streams Hadoop Cluster Algorithm s 88
89 Hadoop solves plumbing problem Utility usage Thermostat Weather Customer interaction history Additional streams All the Hadoop Cluster Algorithm s 89
90 Fully integrated crunching Utility usage Thermostat Weather Customer interaction history Additional streams All the Hadoop Cluster Analytical capabilities Algorithm s 90
91 Our model is the same. Let s start building it. Probability(purchase) = β 1 Electric Heat + β 2 Similar Purchases + β 3 Neighbors Purchased + β 4 Response Rate + β 5 Type Of Message 91
92 Still need weather sensitivity Probability(purchase) = β 1 Pr(Electric Heat) = δ 1 Weather Sensitivity = Calculating sensitivity is much easier with Hadoop. Let s get the. Jan Apr Jul Oct Jan Apr Jul Oct 92
93 Fetch your with Hive views Utility usage Thermostat Weather Customer interaction history Additional streams Hadoop Cluster Views Algorithm s 93
94 Views provide fresh on demand Hive is a SQL-like interface to Hadoop. Hive views are saved queries that you treat like a table. Build views on top of views to setup complex analyses. Querying a view takes longer to execute, but it ensures fresh. 94
95 View syntax is plain SQL CREATE VIEW analytics.disaggregation_inputs_view AS SELECT w.temperature, r.usage_value FROM analytics.weather w JOIN analytics.reads r on w.zip_code = r.zip_code ; 95
96 Views are on demand Utility usage Thermostat Weather Customer interaction history Additional streams Hadoop Cluster Views Algorithm s More without the storage 96
97 Views point at without storing it Utility usage Thermostat Weather Customer interaction history Additional streams Hadoop Cluster Views Algorithm s More without the storage 97
98 Views on top of views for complex analyses Utility usage Thermostat Weather Customer interaction history Additional streams Hadoop Cluster Views Algorithm s More without the storage 98
99 Setup a view to get disaggregation Utility usage Thermostat Weather Customer interaction history Additional streams Hadoop Cluster Views Algorithm s 99
100 We have our disaggregation Probability(purchase) = β 1 Pr(Electric Heat) = δ 1 Weather Sensitivity = We need to calculate the model and store the results. Hadoop is built to do both. Jan Apr Jul Oct Jan Apr Jul Oct 100
101 Setup a view to run disaggregation algorithms Utility usage Thermostat Weather Customer interaction history Additional streams Hadoop Cluster Views Algorithm s 101
102 Hadoop streaming + Views = Power Utility usage Thermostat Weather Customer interaction history Additional streams Hadoop Cluster Use Hadoop 4 streaming within the view Views Algorithm s 102
103 Hadoop streaming can calculate anything Stream through any script. Pipe any through standard input and send any to standard output. Integrate with any language: R, Python, Ruby, Bash, Java, etc. SELECT TRANSFORM command in Hive is an easy way to use Hadoop streaming. 103
104 Hadoop streaming is easy to implement in Hive CREATE VIEW analytics.disaggregation_outputs_view AS SELECT Executable reads from stdin and writes to stdout TRANSFORM ( diw.temperature, diw.usage_value ) USING weather_disaggregation.r FROM analytics.disaggregation_inputs_view diw ; 104
105 Simple SQL syntax to produce any result Utility usage Thermostat Weather Customer interaction history Additional streams Hadoop Cluster 4 4 Algorithm in any language Views Algorithm s 105
106 We know each customer s heating sensitivity Probability(purchase) = β 1 Pr(Electric Heat) = δ 1 Weather Sensitivity = Let s continue with our electric heat model. Jan Apr Jul Oct Jan Apr Jul Oct 106
107 We re ready to model electric heat Probability(purchase) = β 1 Pr(Electric Heat) = δ 1 Weather Sensitivity + δ 2 Neighbors Heat + δ 3 Natural Gas Price Let s get our. 107
108 Setup a view to fetch for electric heat model Utility usage Thermostat Weather Customer interaction history Additional streams Hadoop Cluster Views Algorithm s 108
109 Implement electric heat model in a view Utility usage Thermostat Weather Customer interaction history Additional streams Hadoop Cluster Views Algorithm s 109
110 We know each customer s heat type Probability(purchase) = Pr(Electric Heat) = δ 1 Weather Sensitivity + δ 2 Neighbors Heat + δ 3 Natural Gas Price Let s continue with our water heater purchase model. 110
111 We re ready to model purchase behavior Probability(purchase) = β 1 Electric Heat + β 2 Similar Purchases + β 3 Neighbors Purchased + β 4 Response Rate + β 5 Type Of Message Let s get our. 111
112 Setup a view to fetch for purchase behavior model Utility usage Thermostat Weather Customer interaction history Additional streams Hadoop Cluster Views Algorithm s 112
113 Implement purchase behavior model Utility usage Thermostat Weather Customer interaction history Additional streams Hadoop Cluster Views Algorithm s 113
114 We have our desired result Utility usage Thermostat Weather Customer interaction history Additional streams Hadoop Cluster 4 4 One query to get the list Views Algorithm s 114
115 Major plumbing in the old world Analytics server Utility usage Thermostat Weather Customer interaction history Additional streams 115
116 Some considerations on the past vs now Past Now Refresh Score new households Add new source Build new model 116
117 Refreshing is a breeze Past Now Refresh Major plumbing Single query Score new households Add new source Build new model 117
118 Easy to calculate insights for new households Past Now Refresh Major plumbing Single query Score new households Major plumbing Single query Add new source Build new model 118
119 New? No problem. Past Now Refresh Major plumbing Single query Score new households Major plumbing Single query Add new source Major plumbing Couple lines of SQL Build new model 119
120 Re-use previous work for new models Past Now Refresh Major plumbing Single query Score new households Major plumbing Single query Add new source Major plumbing Couple lines of SQL Build new model Major plumbing Re-use views 120
121 Hadoop radically reduces plumbing Past Now Refresh Major plumbing Single query Score new households Major plumbing Single query Add new source Major plumbing Couple lines of SQL Build new model Major plumbing Re-use views 121
122 Big
123 Big Quantity
124 Big Variety + Quantity
125 It doesn t have to be like this Analytics server Utility usage Thermostat Weather Customer interaction history Additional streams 125
126 You could look for a new job 126
127 Hadoop Big plumbing
128 Happy plumbing! Erik Shilts Advanced Analytics
Turning off lights, lowering the COMMUNITY JAVA IN ACTION
Opower s Rick McPhee (right) meets with Thermostat team members Tom Darci (bottom left), Thirisangu Thiyagarajanodipsa, and Mari Miyachi. COMMUNITY Big Data and Java POWER TO THE PEOPLE Opower s Java and
Sentimental Analysis using Hadoop Phase 2: Week 2
Sentimental Analysis using Hadoop Phase 2: Week 2 MARKET / INDUSTRY, FUTURE SCOPE BY ANKUR UPRIT The key value type basically, uses a hash table in which there exists a unique key and a pointer to a particular
ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
How To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
INDEX. Introduction Page 3. Methodology Page 4. Findings. Conclusion. Page 5. Page 10
FINDINGS 1 INDEX 1 2 3 4 Introduction Page 3 Methodology Page 4 Findings Page 5 Conclusion Page 10 INTRODUCTION Our 2016 Data Scientist report is a follow up to last year s effort. Our aim was to survey
Databricks. A Primer
Databricks A Primer Who is Databricks? Databricks was founded by the team behind Apache Spark, the most active open source project in the big data ecosystem today. Our mission at Databricks is to dramatically
MICRO HYDRO FOR THE FARM AND HOME
MICRO HYDRO FOR THE FARM AND HOME How much can I expect to save? This depends entirely on the available flow, available head (fall) and the duration that the flow is available. Some farms struggle to maintain
You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.
What is this course about? This course is an overview of Big Data tools and technologies. It establishes a strong working knowledge of the concepts, techniques, and products associated with Big Data. Attendees
Integrating a Big Data Platform into Government:
Integrating a Big Data Platform into Government: Drive Better Decisions for Policy and Program Outcomes John Haddad, Senior Director Product Marketing, Informatica Digital Government Institute s Government
DATA SCIENCE CURRICULUM WEEK 1 ONLINE PRE-WORK INSTALLING PACKAGES COMMAND LINE CODE EDITOR PYTHON STATISTICS PROJECT O5 PROJECT O3 PROJECT O2
DATA SCIENCE CURRICULUM Before class even begins, students start an at-home pre-work phase. When they convene in class, students spend the first eight weeks doing iterative, project-centered skill acquisition.
Hadoop 只 支 援 用 Java 開 發 嘛? Is Hadoop only support Java? 總 不 能 全 部 都 重 新 設 計 吧? 如 何 與 舊 系 統 相 容? Can Hadoop work with existing software?
Hadoop 只 支 援 用 Java 開 發 嘛? Is Hadoop only support Java? 總 不 能 全 部 都 重 新 設 計 吧? 如 何 與 舊 系 統 相 容? Can Hadoop work with existing software? 可 以 跟 資 料 庫 結 合 嘛? Can Hadoop work with Databases? 開 發 者 們 有 聽 到
Building a data analytics platform with Hadoop, Python and R
Building a data analytics platform with Hadoop, Python and R Agenda Me Sanoma Past Present Future 3 18 November 2013 /me @skieft Software architect for Sanoma Managing the data and search team Focus on
Databricks. A Primer
Databricks A Primer Who is Databricks? Databricks vision is to empower anyone to easily build and deploy advanced analytics solutions. The company was founded by the team who created Apache Spark, a powerful
Big Data Open Source Stack vs. Traditional Stack for BI and Analytics
Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Part I By Sam Poozhikala, Vice President Customer Solutions at StratApps Inc. 4/4/2014 You may contact Sam Poozhikala at [email protected].
What s Cooking in KNIME
What s Cooking in KNIME Thomas Gabriel Copyright 2015 KNIME.com AG Agenda Querying NoSQL Databases Database Improvements & Big Data Copyright 2015 KNIME.com AG 2 Querying NoSQL Databases MongoDB & CouchDB
Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia
Monitis Project Proposals for AUA September 2014, Yerevan, Armenia Distributed Log Collecting and Analysing Platform Project Specifications Category: Big Data and NoSQL Software Requirements: Apache Hadoop
Big Data Infrastructure at Spotify
Big Data Infrastructure at Spotify Wouter de Bie Team Lead Data Infrastructure June 12, 2013 2 Agenda Let s talk about Data Infrastructure, how we did it, what we learned and how we ve failed Some Context
Microsoft Research Windows Azure for Research Training
Copyright 2013 Microsoft Corporation. All rights reserved. Except where otherwise noted, these materials are licensed under the terms of the Apache License, Version 2.0. You may use it according to the
HDFS. Hadoop Distributed File System
HDFS Kevin Swingler Hadoop Distributed File System File system designed to store VERY large files Streaming data access Running across clusters of commodity hardware Resilient to node failure 1 Large files
ISSN: 2321-7782 (Online) Volume 3, Issue 4, April 2015 International Journal of Advance Research in Computer Science and Management Studies
ISSN: 2321-7782 (Online) Volume 3, Issue 4, April 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
Using the AVR microcontroller based web server
1 of 7 http://tuxgraphics.org/electronics Using the AVR microcontroller based web server Abstract: There are two related articles which describe how to build the AVR web server discussed here: 1. 2. An
Big Data Scenario mit Power BI vs. SAP HANA Gerhard Brückl
Big Data Scenario mit Power BI vs. SAP HANA Gerhard Brückl About me Gerhard Brückl Working with Microsoft BI since 2006 Started working with SAP HANA in 2013 focused on Analytics and Reporting Blog: email:
Data processing goes big
Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,
Workshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
Microsoft Research Microsoft Azure for Research Training
Copyright 2014 Microsoft Corporation. All rights reserved. Except where otherwise noted, these materials are licensed under the terms of the Apache License, Version 2.0. You may use it according to the
Hybrid (Dual Fuel) - Gas Heat and Air Source Heat Pump
s.doty 10-2014 White Paper #30 Hybrid (Dual Fuel) - Gas Heat and Air Source Heat Pump System Switches Fuels Automatically for Economy. What is an Air-Source Heat Pump? A heat pump is a modified air conditioning
Building Your Big Data Team
Building Your Big Data Team With all the buzz around Big Data, many companies have decided they need some sort of Big Data initiative in place to stay current with modern data management requirements.
Shark Installation Guide Week 3 Report. Ankush Arora
Shark Installation Guide Week 3 Report Ankush Arora Last Updated: May 31,2014 CONTENTS Contents 1 Introduction 1 1.1 Shark..................................... 1 1.2 Apache Spark.................................
Creating Connection with Hive
Creating Connection with Hive Intellicus Enterprise Reporting and BI Platform Intellicus Technologies [email protected] www.intellicus.com Creating Connection with Hive Copyright 2010 Intellicus Technologies
Spatio-Temporal Networks:
Spatio-Temporal Networks: Analyzing Change Across Time and Place WHITE PAPER By: Jeremy Peters, Principal Consultant, Digital Commerce Professional Services, Pitney Bowes ABSTRACT ORGANIZATIONS ARE GENERATING
HiBench Introduction. Carson Wang ([email protected]) Software & Services Group
HiBench Introduction Carson Wang ([email protected]) Agenda Background Workloads Configurations Benchmark Report Tuning Guide Background WHY Why we need big data benchmarking systems? WHAT What is
Advanced Big Data Analytics with R and Hadoop
REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional
10605 BigML Assignment 4(a): Naive Bayes using Hadoop Streaming
10605 BigML Assignment 4(a): Naive Bayes using Hadoop Streaming Due: Friday, Feb. 21, 2014 23:59 EST via Autolab Late submission with 50% credit: Sunday, Feb. 23, 2014 23:59 EST via Autolab Policy on Collaboration
An interdisciplinary model for analytics education
An interdisciplinary model for analytics education Raffaella Settimi, PhD School of Computing, DePaul University Drew Conway s Data Science Venn Diagram http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Water Heating and Heat Recovery 1
PID 113 (488) Water Heating and Heat Recovery 1 Florida Power Corporation 2 ENERGY-EFFICIENT WATER HEATING: IT S THE EASIEST WAY TO SAVE ENERGY DOLLARS Anywhere from 10% to 25% of the money you spend on
Attached is a link to student educational curriculum on plug load that provides additional information and an example of how to estimate plug load:
Energy Conservation Tip #1 Always turn off all lights when you leave a room or when they are not needed. It is estimated that 26% of a building s electricity is used for lighting. For D-11 this equals
ANALYTICS CENTER LEARNING PROGRAM
Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals
How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
Assignment 2: More MapReduce with Hadoop
Assignment 2: More MapReduce with Hadoop Jean-Pierre Lozi February 5, 2015 Provided files following URL: An archive that contains all files you will need for this assignment can be found at the http://sfu.ca/~jlozi/cmpt732/assignment2.tar.gz
Implement Hadoop jobs to extract business value from large and varied data sets
Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to
Big Data and Market Surveillance. April 28, 2014
Big Data and Market Surveillance April 28, 2014 Copyright 2014 Scila AB. All rights reserved. Scila AB reserves the right to make changes to the information contained herein without prior notice. No part
Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy
Native Connectivity to Big Data Sources in MicroStrategy 10 Presented by: Raja Ganapathy Agenda MicroStrategy supports several data sources, including Hadoop Why Hadoop? How does MicroStrategy Analytics
Practical Considerations for Real-Time Business Intelligence. Donovan Schneider Yahoo! September 11, 2006
Practical Considerations for Real-Time Business Intelligence Donovan Schneider Yahoo! September 11, 2006 Outline Business Intelligence (BI) Background Real-Time Business Intelligence Examples Two Requirements
Big Data Technology ดร.ช ชาต หฤไชยะศ กด. Choochart Haruechaiyasak, Ph.D.
Big Data Technology ดร.ช ชาต หฤไชยะศ กด Choochart Haruechaiyasak, Ph.D. Speech and Audio Technology Laboratory (SPT) National Electronics and Computer Technology Center (NECTEC) National Science and Technology
Questionnaire about the skills necessary for people. working with Big Data in the Statistical Organisations
Questionnaire about the skills necessary for people working with Big Data in the Statistical Organisations Preliminary results of the survey (19.08 2014) More detailed analysis will be prepared by October
CDH AND BUSINESS CONTINUITY:
WHITE PAPER CDH AND BUSINESS CONTINUITY: An overview of the availability, data protection and disaster recovery features in Hadoop Abstract Using the sophisticated built-in capabilities of CDH for tunable
Big Data on Microsoft Platform
Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4
Exploring the Efficiency of Big Data Processing with Hadoop MapReduce
Exploring the Efficiency of Big Data Processing with Hadoop MapReduce Brian Ye, Anders Ye School of Computer Science and Communication (CSC), Royal Institute of Technology KTH, Stockholm, Sweden Abstract.
Making big data simple with Databricks
Making big data simple with Databricks We are Databricks, the company behind Spark Founded by the creators of Apache Spark in 2013 Data 75% Share of Spark code contributed by Databricks in 2014 Value Created
A Comparative Study on Vega-HTTP & Popular Open-source Web-servers
A Comparative Study on Vega-HTTP & Popular Open-source Web-servers Happiest People. Happiest Customers Contents Abstract... 3 Introduction... 3 Performance Comparison... 4 Architecture... 5 Diagram...
The Data Engineer. Mike Tamir Chief Science Officer Galvanize. Steven Miller Global Leader Academic Programs IBM Analytics
The Data Engineer Mike Tamir Chief Science Officer Galvanize Steven Miller Global Leader Academic Programs IBM Analytics Alessandro Gagliardi Lead Faculty Galvanize Businesses are quickly realizing that
Car Insurance. Prvák, Tomi, Havri
Car Insurance Prvák, Tomi, Havri Sumo report - expectations Sumo report - reality Bc. Jan Tomášek Deeper look into data set Column approach Reminder What the hell is this competition about??? Attributes
How To Learn To Use Big Data
Information Technologies Programs Big Data Specialized Studies Accelerate Your Career extension.uci.edu/bigdata Offered in partnership with University of California, Irvine Extension s professional certificate
Session 85 IF, Predictive Analytics for Actuaries: Free Tools for Life and Health Care Analytics--R and Python: A New Paradigm!
Session 85 IF, Predictive Analytics for Actuaries: Free Tools for Life and Health Care Analytics--R and Python: A New Paradigm! Moderator: David L. Snell, ASA, MAAA Presenters: Brian D. Holland, FSA, MAAA
Big Data Explained. An introduction to Big Data Science.
Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of
Customer Case Study. Sharethrough
Customer Case Study Customer Case Study Benefits Faster prototyping of new applications Easier debugging of complex pipelines Improved overall engineering team productivity Summary offers a robust advertising
The Internet of Things and Big Data: Intro
The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific
Is a Data Scientist the New Quant? Stuart Kozola MathWorks
Is a Data Scientist the New Quant? Stuart Kozola MathWorks 2015 The MathWorks, Inc. 1 Facts or information used usually to calculate, analyze, or plan something Information that is produced or stored by
Data Modeling for Big Data
Data Modeling for Big Data by Jinbao Zhu, Principal Software Engineer, and Allen Wang, Manager, Software Engineering, CA Technologies In the Internet era, the volume of data we deal with has grown to terabytes
COSC 6397 Big Data Analytics. 2 nd homework assignment Pig and Hive. Edgar Gabriel Spring 2015
COSC 6397 Big Data Analytics 2 nd homework assignment Pig and Hive Edgar Gabriel Spring 2015 2 nd Homework Rules Each student should deliver Source code (.java files) Documentation (.pdf,.doc,.tex or.txt
Modernizing Your Data Warehouse for Hadoop
Modernizing Your Data Warehouse for Hadoop Big data. Small data. All data. Audie Wright, DW & Big Data Specialist [email protected] O 425-538-0044, C 303-324-2860 Unlock Insights on Any Data Taking
Oracle Data Miner (Extension of SQL Developer 4.0)
An Oracle White Paper October 2013 Oracle Data Miner (Extension of SQL Developer 4.0) Generate a PL/SQL script for workflow deployment Denny Wong Oracle Data Mining Technologies 10 Van de Graff Drive Burlington,
Heat Trace Fundamentals. Monte Vander Velde, P.E. President, Interstates Instrumentation
Heat Trace Fundamentals Monte Vander Velde, P.E. President, Interstates Instrumentation Contents Introduction... 3 Purpose of Heat Trace... 3 Types of Heat Trace... 4 Steam Trace vs. Electric Trace...
Learn how to store and analyze Big Data Learn about the cloud and its services for Big Data
CS-495/595 Big Data: Exam #1 Spring 2015 Wed. 4:20PM - 7:00PM Constant Hall 1043 Instructor: Dr. Cartledge http://www.cs.odu.edu/ ccartled/teaching Big data is quadrupling every year!! Everyone is creating
Putting Apache Kafka to Use!
Putting Apache Kafka to Use! Building a Real-time Data Platform for Event Streams! JAY KREPS, CONFLUENT! A Couple of Themes! Theme 1: Rise of Events! Theme 2: Immutability Everywhere! Level! Example! Immutable
Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop
Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Kanchan A. Khedikar Department of Computer Science & Engineering Walchand Institute of Technoloy, Solapur, Maharashtra,
DATA EXPERTS MINE ANALYZE VISUALIZE. We accelerate research and transform data to help you create actionable insights
DATA EXPERTS We accelerate research and transform data to help you create actionable insights WE MINE WE ANALYZE WE VISUALIZE Domains Data Mining Mining longitudinal and linked datasets from web and other
HEAT PUMP FREQUENTLY ASKED QUESTIONS HEAT PUMP OUTDOOR UNIT ICED-UP DURING COLD WEATHER:
HEAT PUMP FREQUENTLY ASKED QUESTIONS HEAT PUMP OUTDOOR UNIT ICED-UP DURING COLD WEATHER: It is normal for a heat pump to have a build up of white frost on the outside coil during cold damp weather. The
Massive Cloud Auditing using Data Mining on Hadoop
Massive Cloud Auditing using Data Mining on Hadoop Prof. Sachin Shetty CyberBAT Team, AFRL/RIGD AFRL VFRP Tennessee State University Outline Massive Cloud Auditing Traffic Characterization Distributed
Comfort you can count on.
Comfort you can count on. Your home s indoor temperature should feel just right in any season. We ll make sure it does. Installation Repairs Maintenance w ww.ssihvac.c om (703) 968-0 6 0 6 Proudly Serving
COURSE SYLLABUS COURSE TITLE:
1 COURSE SYLLABUS COURSE TITLE: FORMAT: CERTIFICATION EXAMS: 55043AC Microsoft End to End Business Intelligence Boot Camp Instructor-led None This course syllabus should be used to determine whether the
Online Content Optimization Using Hadoop. Jyoti Ahuja Dec 20 2011
Online Content Optimization Using Hadoop Jyoti Ahuja Dec 20 2011 What do we do? Deliver right CONTENT to the right USER at the right TIME o Effectively and pro-actively learn from user interactions with
Big Data and Data Science. The globally recognised training program
Big Data and Data Science The globally recognised training program Certificate in Big Data Analytics Duration 5 days Big Data and Data Science enables value creation from data, through the use of calculative
Introduction to Big Data! with Apache Spark" UC#BERKELEY#
Introduction to Big Data! with Apache Spark" UC#BERKELEY# So What is Data Science?" Doing Data Science" Data Preparation" Roles" This Lecture" What is Data Science?" Data Science aims to derive knowledge!
Heating and cooling. Insulation. Quick facts on insulation:
Heating and cooling Heating and cooling your home can be one of the biggest users of energy. Efficency in this area can bring great savings. Heating and cooling depend on many contributing factors. The
Energy'Saving,'Thermal'Comfort'and'Solar'Power'Information'Sheet'
Energy'Saving,'Thermal'Comfort'and'Solar'Power'Information'Sheet' We ve prepared this information sheet to help you to minimise energy consumption and energy costs while maximising thermal comfort at home.
Creating the Ideal Environment for Accelerated Drying Times on Construction and Restoration Projects
4.02.08 updated Creating the Ideal Environment for Accelerated Drying Times on Construction and Restoration Projects Moisture Removal Systems www.groundheaters.com www.wackerneuson.com Controlling moisture
Creating a universe on Hive with Hortonworks HDP 2.0
Creating a universe on Hive with Hortonworks HDP 2.0 Learn how to create an SAP BusinessObjects Universe on top of Apache Hive 2 using the Hortonworks HDP 2.0 distribution Author(s): Company: Ajay Singh
SEERS International. Hybrid Hot Water Systems. Innovation Technology
SEERS International Hybrid Hot Water Systems Innovation Technology EFFECTS OF GLOBAL WARMING Invented & Manufactured By : Seers International Patent Number: PI-20082195 5 th Generation Hybrid Hot Water
Beginner s Guide to. BigDataAnalytics
Beginner s Guide to BigDataAnalytics Introduction Big Data, What do these two words really mean? Yes everyone is talking about it but frankly, not many really understand what the hype is all about. This
Spark in Action. Fast Big Data Analytics using Scala. Matei Zaharia. www.spark- project.org. University of California, Berkeley UC BERKELEY
Spark in Action Fast Big Data Analytics using Scala Matei Zaharia University of California, Berkeley www.spark- project.org UC BERKELEY My Background Grad student in the AMP Lab at UC Berkeley» 50- person
Processing millions of logs with Logstash
and integrating with Elasticsearch, Hadoop and Cassandra November 21, 2014 About me My name is Valentin Fischer-Mitoiu and I work for the University of Vienna. More specificaly in a group called Domainis
Introduction To Hive
Introduction To Hive How to use Hive in Amazon EC2 CS 341: Project in Mining Massive Data Sets Hyung Jin(Evion) Kim Stanford University References: Cloudera Tutorials, CS345a session slides, Hadoop - The
BIG DATA SERIES: HADOOP DEVELOPER TRAINING PROGRAM. An Overview
BIG DATA SERIES: HADOOP DEVELOPER TRAINING PROGRAM An Overview Contents Contents... 1 BIG DATA SERIES: HADOOP DEVELOPER TRAINING PROGRAM... 1 Program Overview... 4 Curriculum... 5 Module 1: Big Data: Hadoop
Machine Data Analytics with Sumo Logic
Machine Data Analytics with Sumo Logic A Sumo Logic White Paper Introduction Today, organizations generate more data in ten minutes than they did during the entire year in 2003. This exponential growth
Accessing Information from your Electronic Health Records applications: Tools and Techniques
Accessing Information from your Electronic Health Records applications: Tools and Techniques Presented by: Stephanie Heckman, Sr Consultant, PCCS, Inc [email protected] 866-274-3477 Who is PCCS, Inc Healthcare
http://glennengstrand.info/analytics/fp
Functional Programming and Big Data by Glenn Engstrand (September 2014) http://glennengstrand.info/analytics/fp What is Functional Programming? It is a style of programming that emphasizes immutable state,
Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.
Big Data Hadoop Administration and Developer Course This course is designed to understand and implement the concepts of Big data and Hadoop. This will cover right from setting up Hadoop environment in
Cloudera Manager Training: Hands-On Exercises
201408 Cloudera Manager Training: Hands-On Exercises General Notes... 2 In- Class Preparation: Accessing Your Cluster... 3 Self- Study Preparation: Creating Your Cluster... 4 Hands- On Exercise: Working
