How old are you?
FROM THE BIG BANG TO THE NEW ECONOMY, A JOURNEY IN MAKING SENSE OF BIG DATA Patrick Deglon Director of Engineering, Analytics Area Tech Lead pdeglon@motorola.com linkd.in/pdeglon
from the Big Bang FROM THE BIG BANG TO ECOMMERCE, 4
Timeline back to the Big Bang: 15 billion years; 5 billion years; 1 billion years; 300,000 years; 2 min; 0.0000000001 sec; 10^-34 sec = 0.0...001 sec (34 zeros); 10^-43 sec = 0.0...001 sec (43 zeros). Image: CERN
During 1996-2002, I worked at CERN (the European Laboratory for Particle Physics) for my MS and PhD at the University of Geneva. Map labels: Mont Blanc, Geneva, Switzerland. The LEP & LHC accelerators use a 17-mile underground tunnel. Image and source: CERN
Data collection & analysis was done in Fortran. Advanced analysis/statistics was done through PAW (Physics Analysis Workstation). [1996-2002] Images: PAW (source: Wikipedia); tape robot (source: CERN)
Example of a particle collision
Solving the puzzle: which particles go together? For four particles A, B, C, D: 1. AB + CD? 2. AC + BD? 3. AD + BC?
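As a minimal sketch of the pairing question above, the following enumerates every way to split a set of particles into unordered pairs. The function name `pairings` and the particle labels are illustrative assumptions, not code from the original analysis (which was written in Fortran).

```python
def pairings(items):
    """Yield every way to split `items` into unordered pairs (hypothetical helper)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        # Pair the first item with one partner, then pair up whatever is left.
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


particles = ["A", "B", "C", "D"]
all_pairings = list(pairings(particles))
for option in all_pairings:
    print(" + ".join(a + b for a, b in option))
```

For four particles this yields the three candidate combinations listed on the slide; the count grows quickly with the number of particles, which is why the brute-force approach needs serious computing capacity.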
Solution: Big Data infrastructure enables large-scale computations such as combining all possibilities (cross-product). Schematic view: a signal (particle resonance) on top of statistical noise. CERN example: discovery of a new particle bb. Source: http://www.atlas.ch/news/2011/atlas-discovers-its-first-new-particle.html
Size of the electron? R < 5.1 x 10^-19 m *** (*** Patrick Deglon, Etude de la diffusion Bhabha avec le détecteur L3 au LEP, Th. phys. Genève, 2002; Sc. 3332)
Extra dimension? M_S > 1.1 TeV *** Diagram labels: e+, e-, graviton, extra dimension, our universe in 4 dimensions. (*** Patrick Deglon, Etude de la diffusion Bhabha avec le détecteur L3 au LEP, Th. phys. Genève, 2002; Sc. 3332)
to the New Economy
Imagine a world...
where information is ubiquitous (anytime & anywhere)
where buildings can recognize your presence
where even streetlights are connected to the Internet
Welcome to a connected world
Example #1: KPI reporting & Impact Measurement. So, how is the business doing?
Key Performance Indicators. Simplified business flow: Motorola Factory (# Shipments) -> Distribution Channels (# Sales) -> First Usage (# Activations)
Data flow: Motorola Factory (# Shipments), Distribution Channels (# Sales), First Usage (# Activations), ... -> Motorola Cloud -> Google BigQuery -> Insights
Google Spreadsheet as a Reporting Engine. Components: BigQuery, Spreadsheet (Apps Script, charts, HTML body in sheet), scheduler, mail, distribution group
Analytics Portal. Clients: web portal, Android app, Google Drive. Web server: Google App Engine. Data sources: BigQuery; Google Analytics; Spreadsheet & CSV. Users access control: Google Users + Drive sharing. Report metadata: Google Drive (text file with JSON); Datastore (report copy & usage tracking). iframe sources: Tableau Server; Google Documents
Answer to the Ultimate Question of Life, the Universe, and Everything: 42. So what?
Measuring impact of initiatives. [Figure: A/B test illustrative example (simulation): number of purchases over time (Aug 1st to Oct 1st) for the control group and the test group; the gap after the initiative is launched is the impact of the initiative.] Randomized test/control group methodology is a gold standard in research. [Figure: pre/post analysis illustrative example (simulation): number of listings over time (Aug 1st to Oct 1st) for 2012 and 2011, with points A, B, C, D marking the pre and post periods around the initiative launch; the gap is the impact of the initiative.] Used to measure the impact of an initiative in a full market or a market segment.
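A minimal sketch of how the test/control comparison above could be quantified, assuming each group is summarized by its size and number of purchasers. The function name `ab_impact`, the normal-approximation standard error, and the sample numbers are illustrative assumptions, not the method or data from the talk.

```python
import math


def ab_impact(test_purchasers, test_size, control_purchasers, control_size):
    """Difference in purchase rate (test minus control) and its standard error."""
    p_test = test_purchasers / test_size
    p_ctrl = control_purchasers / control_size
    diff = p_test - p_ctrl
    # Normal approximation for the difference of two independent proportions.
    se = math.sqrt(p_test * (1 - p_test) / test_size
                   + p_ctrl * (1 - p_ctrl) / control_size)
    return diff, se


# Made-up numbers: 400 purchasers out of 10,000 in test, 300 out of 10,000 in control.
diff, se = ab_impact(400, 10_000, 300, 10_000)
z_score = diff / se
print(f"impact = {diff:.4f} +/- {se:.4f} (z = {z_score:.1f})")
```

Reporting the standard error next to the impact makes it clear whether the measured gap stands out from statistical noise.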
Campaign Measurement. Campaigns (Campaign Id, Campaign Name, Time range, Set of Countries, Set of Products) X KPI (Date, Country, Product, KPI[]) -> Trend (Campaign Id, Date, Total of KPI[]) -> Time Series Analysis -> Summary (Campaign Id, Campaign Name, Impact Measurement[], Statistical Error[])
Campaign Measurement: campaign window. Don't include the weekday cycle in your volatility measurement.
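One way to keep the weekday cycle out of the volatility measurement is to divide each day by an average factor for its weekday before computing the standard deviation. The sketch below assumes a daily series whose position modulo 7 identifies the weekday; the function name and the synthetic data are illustrative assumptions.

```python
from statistics import mean, pstdev


def remove_weekday_cycle(daily_values):
    """Divide each value by the average factor of its weekday (position i % 7)."""
    overall = mean(daily_values)
    weekday_factor = [mean(daily_values[d::7]) / overall for d in range(7)]
    return [v / weekday_factor[i % 7] for i, v in enumerate(daily_values)]


# Synthetic KPI: four identical weeks with a weekend dip and no real volatility.
week = [100, 100, 100, 100, 100, 60, 60]
daily_kpi = week * 4

raw_volatility = pstdev(daily_kpi)
adjusted_volatility = pstdev(remove_weekday_cycle(daily_kpi))
print(f"raw: {raw_volatility:.2f}, weekday-adjusted: {adjusted_volatility:.2f}")
```

On this synthetic series the raw standard deviation is dominated by the weekend dip, while the adjusted one is essentially zero, so a campaign effect would not be hidden behind a predictable weekly pattern.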
Campaign Measurement cycle: Define Campaign, Run Campaign, Measure Impacts, Drive Insights
Exception Reports (illustrative examples). Data issue: number of active users using their camera in the US (# active users vs. time in days). Root causes: some files don't get loaded properly in BigQuery, creating gaps in the user count; the instrumentation changed on the device; customer behavior. Business issue: number of system restarts (# system restarts vs. time in days). Root cause: a buggy Android app doesn't handle the timezone change properly, crashing the devices.
Approach:
1. Define a multi-dimensional cube with real data. For example: Product, Market, # Users taking a picture
2. Each cell then becomes a time series
3. Clean the data (remove seasonality, weekday cycle and any other known perturbation)
4. Fit a trend and establish a volatility band (2 standard deviations)
5. Measure the variance versus the prediction for each cell (e.g. market/product/metric) and trigger an exception if it falls outside the band
6. Collect all exceptions into a matrix (markets x products) and apply fuzzy logic* to propose potential root causes
* Note: Bayesian likelihood with knowledge base
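The trend-and-band steps of the approach above can be sketched for a single cell of the cube as follows. This assumes an already cleaned daily series, a straight-line trend fitted by least squares, and a band of two standard deviations of the residuals; the function names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def fit_trend(values):
    """Least-squares straight line through (0, values[0]), (1, values[1]), ..."""
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


def check_exception(history, new_value, n_sigma=2.0):
    """Return (is_exception, predicted, band) for the next point of one cell."""
    slope, intercept = fit_trend(history)
    residuals = [y - (intercept + slope * x) for x, y in enumerate(history)]
    band = n_sigma * pstdev(residuals)
    predicted = intercept + slope * len(history)
    return abs(new_value - predicted) > band, predicted, band


# Hypothetical cleaned series for one market/product/metric cell.
history = [100, 102, 101, 103, 104, 106, 105, 107, 108, 110]
normal_day = check_exception(history, 111)
bad_day = check_exception(history, 80)
print(normal_day[0], bad_day[0])
```

Running the same check over every cell gives the exception matrix that the last step feeds into root-cause analysis.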
Example #2: Internet Marketing. How many sales did my campaign generate?
Case study: Online Search. Paid Search vs. Natural/Organic Search (free)
Customer behaviors and Internet Marketing investment: which customer purchases are influenced by Marketing? Behavioral purchases are uncorrelated to Marketing; influenced purchases are correlated to Marketing. [Figure: purchase timeline ($) from Jan 1st to Mar 1st with clicks and attribution windows of X and Y days. Annotations: "2 purchases missing" (X-day windows); "1 purchase is uncorrelated" and "all purchases are incremental" (Y-day windows).]
Remember this physics problem? Particles A, B, C, D: 1. AB + CD? 2. AC + BD? 3. AD + BC?
Solution: Big Data infrastructure enables large-scale computations such as combining all possibilities (cross-product). Schematic view: a signal (particle resonance) on top of statistical noise. CERN example: discovery of a new particle bb. Combining correlated events and uncorrelated events produces a system with statistical noise (which is simple enough to extract) and the signal being searched for. Source: http://www.atlas.ch/news/2011/atlas-discovers-its-first-new-particle.html
Latency time for each click-purchase pair. [Figure: number of events (click-purchase pairs) versus latency in days, from -14 to +14. The user clicks on an ad banner at time = 0 and makes a purchase X days later.] Negative latency: purchase before click (no causality), behavior only. Positive latency: purchase after click (potential causality), behavior & Internet Marketing impact. The flat level on both sides is the level of behavioral purchases; the excess at positive latency is the marketing incrementality (correlated purchases).
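The latency idea above can be sketched by forming every click-purchase pair per user (the cross-product), then treating pairs with negative latency as the behavioral baseline and subtracting them from the pairs with positive latency. This assumes the behavioral level is symmetric around the click and that times are given as day numbers; the function names and the sample events are illustrative assumptions.

```python
from itertools import product


def pair_latencies(click_days, purchase_days, window_days=14):
    """Latency (purchase day minus click day) for every click-purchase pair."""
    return [p - c for c, p in product(click_days, purchase_days)
            if p != c and abs(p - c) <= window_days]  # same-day pairs are ambiguous


def marketing_incrementality(events_by_user, window_days=14):
    """Pairs after the click minus pairs before the click (behavioral baseline)."""
    latencies = []
    for click_days, purchase_days in events_by_user.values():
        latencies.extend(pair_latencies(click_days, purchase_days, window_days))
    behavioral = sum(1 for lat in latencies if lat < 0)
    after_click = sum(1 for lat in latencies if lat > 0)
    return after_click - behavioral, latencies


# Made-up events: user -> (click days, purchase days).
events_by_user = {
    "u1": ([10], [8, 12, 13]),
    "u2": ([20], [15, 21]),
    "u3": ([30], [31]),
}
incremental, latencies = marketing_incrementality(events_by_user)
print(incremental, sorted(latencies))
```

In this toy data set there are four pairs after a click and two before, so two purchases would be counted as correlated with marketing.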
So what? Method 1 and Method 2 give different answers.
Method 1: Channel A: sales 8%, ROI +20%. Channel B: sales 5%, ROI -10%. Channel C: sales 1%, ROI +10%. Decisions: reduce spend on channel B; invest in channel A; when prioritizing, ignore channel C.
Method 2: Channel A: sales 7%, ROI -20%. Channel B: sales 6%, ROI +30%. Channel C: sales 12%, ROI +60%. Decisions: reduce spend on channel A; invest heavily in channel C. Marketing actually accounts for 25% of the site's sales.
Case study: Online Search. Consumer Heterogeneity and Paid Search Effectiveness: A Large Scale Field Experiment, Thomas Blake, Chris Nosko, Steven Tadelis
So, what's next? Marketing 101. Rule #1: Never, ever, spend money unless you really-really have to. [Diagram: "Don't Do Marketing" vs. "Do Marketing", with outcomes Purchase / No Purchase and Cost, Direct Return and Incr Return; cell labels: L, L?, C, C?, D, D.]
So, what's next? Rule #2: If you have to spend, spend up to the point where marginal return = 0, i.e. ΔReturn = ΔInvestment (marginal ROI = 0). [Figure: output (return/revenues, cost, profit) versus investment (costs), marking Max Profit (marginal ROI = 0), Max Sales, and No Profit (total ROI = 0).]
Marginal Return Chart. [Figure: ROI versus cumulative cost, with spend buckets ordered from bucket 0 (most profitable) through bucket i to bucket N (least profitable). Annotations: in-depth analysis required to validate high ROI; point of marginal return = 0 (maximum profit); current spend level; area/initiatives/segments with negative profitability: cost reduction opportunity!]
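A minimal sketch of how such a marginal return chart could be built, assuming each spend bucket is described by a cost and a return and that ROI is defined as (return - cost) / cost. The bucket names, the numbers and the function names are hypothetical.

```python
def marginal_return_curve(buckets):
    """Rank (name, cost, return) buckets from most to least profitable.

    Returns rows of (name, cumulative_cost, roi), i.e. the points of the chart.
    """
    ranked = sorted(buckets, key=lambda b: (b[2] - b[1]) / b[1], reverse=True)
    rows, cumulative_cost = [], 0.0
    for name, cost, ret in ranked:
        cumulative_cost += cost
        rows.append((name, cumulative_cost, (ret - cost) / cost))
    return rows


def max_profit_spend(rows):
    """Cumulative cost up to the last bucket whose ROI is still non-negative."""
    return max((cum for _, cum, roi in rows if roi >= 0), default=0.0)


# Hypothetical spend buckets: (name, cost, return).
buckets = [("brand", 100, 300), ("generic", 200, 260),
           ("display", 150, 120), ("affiliates", 50, 55)]
rows = marginal_return_curve(buckets)
for name, cumulative_cost, roi in rows:
    print(f"{name:<10} cumulative cost {cumulative_cost:>6.0f}  ROI {roi:+.0%}")
print("spend level at marginal return = 0:", max_profit_spend(rows))
```

Any bucket to the right of that spend level has negative profitability and is a candidate for cost reduction.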
Example #3: The cost of Big Data. What is my share of the pile?
Google Cloud Platform. Cost: ~ 0 or > 0. How to determine who is costing how much?
How to track BigQuery usage? Google does not provide a data feed on its customers' usage of BigQuery. However, three APIs can help us. bigquery.projects.list: list all (visible) projects. bigquery.jobs.list: list all the jobs in a specified project (note: use projection = full to get the email of the user). bigquery.jobs.get: retrieve the specified job by ID. The queries are parsed to extract the underlying tables used, and the data is stored in the App Engine datastore as well as in BigQuery through the streaming API (every 15 minutes).
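The query-parsing step could look roughly like the sketch below. It assumes the queries use BigQuery's bracketed table references, such as `[project:dataset.table]` or `[dataset.table]`; the regular expression, the function name and the sample query are illustrative assumptions, not the parser used in the talk, and a regex will miss cases a real SQL parser would catch.

```python
import re

# Bracketed table reference with an optional "project:" prefix (assumed syntax).
TABLE_PATTERN = re.compile(r"\[((?:[\w.-]+:)?\w+\.\w+)\]")


def extract_tables(query):
    """Return the distinct table references found in a query string, sorted."""
    return sorted(set(TABLE_PATTERN.findall(query)))


# Hypothetical query text, as it might be returned for one job.
sample_query = """
SELECT u.country, COUNT(*) AS events
FROM [my-project:logs.events] e
JOIN [logs.users] u ON e.user_id = u.user_id
GROUP BY u.country
"""
tables = extract_tables(sample_query)
print(tables)
```

Storing the extracted table names next to each job record is what makes it possible to answer which tables are used, by whom, and how often.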
Beyond queries, we also scan tables. bigquery.projects.list: list visible projects. bigquery.datasets.list: list datasets within a project. bigquery.tables.list: list tables within a dataset. bigquery.tables.get: get details about a table. Diagram labels: datastore, queries information.
Enables Enlightenment: questions for an analyst. When was this table last refreshed? How often is it refreshed? How was it created? Underlying data sources/tables? Who created this table? Who knows how to use this table? Where can I find this great query I ran? Who knows how to use this tag/metric? How much bandwidth am I using? How much space are my tables using? How much does my usage cost? Rick Hotten
How much bandwidth am I using in BigQuery?
BigQuery Pricing. Storage cost: $0.02 per GB per month, i.e. about $0.683 per TB per day. Query cost: on-demand, $5 per TB; reserved capacity, $20,000 per month per 5 GB/s unit, i.e. $1.58 per TB*. * Note: for continuous usage of the 5 GB/s bandwidth.
How much does my usage of BigQuery cost? Assuming that the Motorola bandwidth is elastic, i.e. we always pay for the optimal number of 5 GB/s units, we can use $1.58 per TB as a proxy.
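The $1.58 per TB proxy can be reproduced with a few lines of arithmetic, assuming a 30-day month, 1024 GB per TB, and a reserved unit that is used continuously. The function names and the example usage figure are assumptions for illustration.

```python
GB_PER_TB = 1024
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30  # assumption: 30-day month


def reserved_price_per_tb(monthly_fee=20_000, gb_per_second=5):
    """Effective price per TB if a reserved unit is used continuously."""
    tb_per_month = gb_per_second * SECONDS_PER_DAY * DAYS_PER_MONTH / GB_PER_TB
    return monthly_fee / tb_per_month


def usage_cost(bytes_processed, price_per_tb):
    """Cost of a user's queries given the total bytes they processed."""
    return bytes_processed / 1024 ** 4 * price_per_tb


proxy_price = reserved_price_per_tb()
print(f"proxy price: ${proxy_price:.2f} per TB")

# Hypothetical user who processed 250 TB in a week.
weekly_cost = usage_cost(250 * 1024 ** 4, proxy_price)
print(f"weekly cost: ${weekly_cost:.2f}")
```

Multiplying each user's processed bytes by the proxy price gives a per-user cost that can be compared across teams.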
Weekly email to the largest BQ users
Example #4: Human Resources. It's time for your annual review
Annual Review Feedback. What is the optimal method to determine the list of key work partners to request feedback from, one that would balance objectivity and relevancy?
Scraping Gmail and Google Calendar. gmail.users.messages.list: list user emails (by page of 100). gmail.users.messages.get: get the details of 1 email. calendar.events.list: list events & metadata (by page of 100). Results are stored in the datastore. Scoring: 1 pt = one 30-min meeting = 10 emails. The weight is divided by the number of participants.
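The scoring rule above can be sketched as follows, assuming the meetings and emails have already been fetched and reduced to participant lists. Dividing by the number of participants other than the reviewee is one interpretation of "number of participants"; the function name, that interpretation and the sample data are assumptions.

```python
from collections import defaultdict

MEETING_MINUTES_PER_POINT = 30  # 1 pt = one 30-min meeting
EMAILS_PER_POINT = 10           # 1 pt = 10 emails


def score_partners(me, meetings, emails):
    """Rank work partners by interaction score.

    meetings: list of (duration_minutes, attendees); emails: list of participant lists.
    Each interaction's weight is split among the participants other than `me`.
    """
    scores = defaultdict(float)
    for minutes, attendees in meetings:
        others = [p for p in attendees if p != me]
        for person in others:
            scores[person] += minutes / MEETING_MINUTES_PER_POINT / len(others)
    for participants in emails:
        others = [p for p in participants if p != me]
        for person in others:
            scores[person] += 1 / EMAILS_PER_POINT / len(others)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical interactions for one reviewee.
meetings = [(30, ["me", "ann"]), (60, ["me", "ann", "bob"])]
emails = [["me", "bob"]] * 5
ranking = score_partners("me", meetings, emails)
print(ranking)
```

Splitting the weight keeps large meetings and mass emails from dominating the list of suggested feedback partners.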
Example
Wrapping Up: CERN vs New Economy
CERN: Write kilometers-long Fortran code. Analyses can run for many hours before a batch robot error. Study billions of collision data points. Great depth of data structure & complexity. Know your local expert for questions, but try to find the solution by yourself (much quicker). Remove bad runs (unclean data batches). Transform a complex system into insights. Communicate findings at conferences. Strong competitive landscape (4 distinct experiments competing to be the first to publish, or to publish better results).
New Economy: Write miles-long SQL code. Queries can run for many hours before a spool space error. Study billions of customer data points. Great depth of data structure & complexity. Know your local expert for questions, but try to find the solution by yourself (much quicker). Remove wackos (non-material transactions). Transform a complex system into insights. Communicate recommendations at business reviews. Strong competitive landscape.
Appendix
About Us. Motorola exists to invent, build and deliver the best mobile devices on the planet, improving the lives of millions of people.
Motorola: 80+ YEARS OF INNOVATION
1928, 1936, 1943, 1947, 1955, 1963, 1969: Motorola introduced; Police Cruiser Radio Receiver; Galvin Manufacturing Corp; World's first portable FM two-way radio; World's first high-power transistor in commercial production; World's first truly rectangular color TV tube; First words from the moon relayed via Motorola radio
1973: Demonstrated prototype of the DynaTAC portable cellular system
1983, 1990, 1991, 1996: World's first commercial handheld cellular phone, the DynaTAC 8000X, weighed 28 ounces (794 grams); World's first HDTV technical standard; World's first GSM cellular system; World's first dual-mode cellular phone; The 3.1-ounce (88 grams) StarTAC wearable cellular phone is the world's smallest and lightest
1999, 2000: World's first handset, the iDEN i1000plus, to combine a digital phone, two-way radio, Internet microbrowser, e-mail, fax and two-way messaging; World's first general packet radio service (GPRS) wireless phone for always-on Internet access
2002, 2004, 2006, 2009, 2012, 2013, 2014: World's first wireless cable modem gateway introduced; Iconic RAZR V3 wireless phone introduced; MING smart phone recognizes 10,000+ handwritten characters from Chinese alphabet; Motorola DROID #1 on Time's Top Ten of 2009; Launch of Moto X, Moto G; Fast upgrades, Moto E, Moto 360
Mobile was a revolution, but Mobile is an outdated concept. Clouds (Internet, Connected World, World's Information, ...) will be available everywhere: phones, watches, glasses, cars, appliances, microchip implants, ...
Motorola Cloud Customers Ecosystem. Motorola Cloud serves: Consumers (Phones, Wearables & Companion Products; Moto Maker); Internal Business Teams (Product Marketing, Web, Engineering, Finance, Sales, Business Operation, Customer Support); Partners & Carriers
Motorola Cloud Applications & Services: On-Device Applications & Services; Web Applications & Software as a Service; Platform as a Service; Infrastructure as a Service