# FROM THE BIG BANG TO THE NEW ECONOMY, A JOURNEY IN MAKING SENSE OF BIG DATA

1

2 How old are you?

3 FROM THE BIG BANG TO THE NEW ECONOMY, A JOURNEY IN MAKING SENSE OF BIG DATA Patrick Deglon Director of Engineering, Analytics Area Tech Lead linkd.in/pdeglon

4 from the Big Bang FROM THE BIG BANG TO ECOMMERCE

5 Timeline (Image: CERN): 15 billion years; 5 billion years; 1 billion years; 300,000 years; 2 min; sec; sec = sec (34 zeros); sec = sec (43 zeros)

6 During , worked at CERN (the European Laboratory for Particle Physics) for my MS and PhD at the University of Geneva. Map labels: Mont Blanc, Geneva, Switzerland. 17-mile underground tunnel for the LEP & LHC accelerators. Image and source: CERN

7 Image and source: CERN

8 PAW (Physics Analysis Workstation; source: Wikipedia). Tape robot (source: CERN). Data collection & analysis was done in Fortran. Advanced analysis/statistics was done through PAW.

9 Example of a particle collision

10 Solving the puzzle: which particles go together? Four particles (A, B, C, D) allow three pairings: 1. AB + CD? 2. AC + BD? 3. AD + BC?
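The three candidate groupings on this slide are all the ways to split four particles into two unordered pairs. A minimal Python sketch of that enumeration follows; the function name `pairings` is hypothetical and not taken from the talk.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]


def pairings(items: Sequence[str]) -> List[List[Pair]]:
    """Return every way to split `items` into unordered pairs."""
    if not items:
        return [[]]
    first, rest = items[0], list(items[1:])
    result = []
    for i, partner in enumerate(rest):
        # Pair the first item with each possible partner, then recurse.
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            result.append([(first, partner)] + tail)
    return result


for option in pairings(["A", "B", "C", "D"]):
    print(" + ".join(a + b for a, b in option))
```

The number of pairings grows quickly with the number of particles (15 for six particles), which is why trying every combination becomes a large-scale computation.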

11 Solution: Big Data infrastructure enables large-scale computations, such as combining all possibilities (cross-product). Schematic view and CERN example (discovery of a new particle bb): signal (particle resonance) on top of statistical noise. Source:

12 Size of the electron? R < 5.1 x m *** *** Patrick Deglon, Etude de la diffusion Bhabha avec le détecteur L3 au LEP, Th. phys. Genève, 2002; Sc

13 Extra dimension? M_S > 1.1 TeV ***. Diagram labels: graviton, extra dimension, e+, e+, e-, e-, our universe in 4 dimensions. *** Patrick Deglon, Etude de la diffusion Bhabha avec le détecteur L3 au LEP, Th. phys. Genève, 2002; Sc

14 to the New Economy

15 Imagine a world...

16 where information is ubiquitous (anytime & anywhere)

17 where buildings can recognize your presence

18 where even streetlights are connected to the Internet

19 Welcome to a connected world

20 Example #1: KPI reporting & Impact Measurement. So, how is the business doing?

21 Key Performance Indicators. Simplified business flow: Motorola Factory (# Shipments) -> Distribution Channels (# Sales) -> First Usage (# Activations)

22 Data Flow: Motorola Factory (# Shipments), Distribution Channels (# Sales), First Usage (# Activations), ... -> Motorola Cloud -> Google BigQuery -> Insights

23 Google Spreadsheet as a Reporting Engine. Diagram labels: BigQuery; Spreadsheet; Apps Script; Charts; HTML body in sheet; Scheduler; Mail; Distribution Group

25 Analytics Portal. Components: Google Drive; Android App; Web portal; Google App Engine (Web Server). Data sources: BigQuery; Google Analytics; Spreadsheet & CSV. iframe sources: Tableau Server; Google Documents. Users. Access control: Google Users + Drive Sharing. Report metadata: Google Drive (text file with JSON); Datastore (report copy & usage tracking).

26 Answer to the Ultimate Question of Life, the Universe, and Everything: 42. So what?

27 Measuring impact of initiatives. A/B test, illustrative example (simulation): number of purchases over time (Aug 1st to Oct 1st) for a control group and a test group, with the initiative launch marked; the gap between the two groups is the impact of the initiative. Pre/post analysis, illustrative example (simulation): number of listings over time (Aug 1st to Oct 1st, scale 0 to 35,000), before and after the initiative launch (points A, B, C, D); the shift is the impact of the initiative. Randomized test/control group methodology is a gold standard in research. It is used to measure the impact of an initiative in a full market or a market segment.
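As a rough illustration of the test/control comparison, the sketch below estimates the impact as the difference in means between the two groups, with a standard error. The simulated numbers and the function name `ab_impact` are assumptions made for the example; the slide does not specify which estimator was used.

```python
import math
import random
from statistics import mean, variance


def ab_impact(control, test):
    """Difference in means (test - control) and its standard error."""
    diff = mean(test) - mean(control)
    se = math.sqrt(variance(test) / len(test) + variance(control) / len(control))
    return diff, se


# Simulated daily purchases after launch (hypothetical numbers).
random.seed(0)
control = [random.gauss(100, 10) for _ in range(60)]
test = [random.gauss(110, 10) for _ in range(60)]

impact, error = ab_impact(control, test)
print(f"impact = {impact:.1f} +/- {2 * error:.1f} (2 standard errors)")
```

Randomization is what lets the difference be read as the effect of the initiative; a pre/post comparison has no such protection against unrelated changes over time.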

28 Campaign Measurement. Campaigns table (Campaign Id, Campaign Name, Time range, Set of Countries, Set of Products) X KPI table (Date, Country, Product, KPI[]) -> Trend (Campaign Id, Date, Total of KPI[]) -> Time Series Analysis -> Summary (Campaign Id, Campaign Name, Impact Measurement[], Statistical Error[])
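The Campaigns X KPI step can be sketched as a small join followed by an aggregation. This is only an illustration under stated assumptions: a single KPI value per row, campaigns held as dictionaries, and a trend that keeps every date (so the later time series analysis can see pre-campaign days). The field and function names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def campaign_trend(campaigns, kpi_rows):
    """Total KPI per (campaign id, date) over the campaign's countries and products."""
    trend = defaultdict(float)
    for campaign in campaigns:
        for day, country, product, value in kpi_rows:
            if country in campaign["countries"] and product in campaign["products"]:
                trend[(campaign["id"], day)] += value
    return dict(trend)


campaigns = [
    {"id": 1, "name": "Demo campaign", "countries": {"US", "BR"}, "products": {"X"}},
]
kpi_rows = [
    (date(2014, 9, 1), "US", "X", 10.0),
    (date(2014, 9, 1), "BR", "X", 5.0),
    (date(2014, 9, 1), "US", "Y", 7.0),  # other product: ignored
    (date(2014, 9, 2), "US", "X", 3.0),
    (date(2014, 9, 1), "CA", "X", 4.0),  # other country: ignored
]
print(campaign_trend(campaigns, kpi_rows))
```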

29 Campaign Measurement: Campaign Window. Don't include the weekday cycle in your volatility measurement.

30 Campaign Measurement cycle: Define Campaign, Run Campaign, Measure Impacts, Drive Insights

31 Exception Reports (Illustrative Examples). Data issue: number of active users using their camera in the US (# Active Users over time, by day). Root causes: some files don't get loaded properly in BigQuery, creating gaps in the user count; the instrumentation changed on the device; customer behavior. Business issue: number of system restarts (# System Restarts over time, by day). Root cause: a buggy Android app doesn't handle the timezone change properly, crashing the devices.

32 Approach (diagram labels: markets, products, BR)
1. Define multi-dimensional cubes with real data. For example: Product, Market, # Users taking a picture.
2. Each cell then becomes a time series.
3. Clean the data (remove seasonality, the weekday cycle and any other known perturbation).
4. Fit a trend and establish a volatility band (2 standard deviations).
5. Measure variance versus prediction for each cell (e.g. market/product/metric) and trigger an exception if it falls outside the band.
6. Collect all exceptions into a matrix and apply fuzzy logic (Bayesian likelihood with a knowledge base) to propose potential root causes.
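For a single cell of the cube, the cleaning, trend-fitting and banding steps might look like the sketch below. It is a simplified illustration, not the production logic: it removes only the weekday cycle (no seasonality), fits a straight-line trend by least squares, and assumes at least one full week of daily history starting on weekday 0. All function names are hypothetical.

```python
from statistics import mean, pstdev


def fit_line(ys):
    """Least-squares slope and intercept for points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = mean(ys)
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def is_exception(history, new_value, n_sigma=2.0):
    """True if the next daily value falls outside the volatility band."""
    overall = mean(history)
    # Weekday factors: average of each weekday relative to the overall average.
    factors = [mean(history[d::7]) / overall for d in range(7)]
    cleaned = [v / factors[i % 7] for i, v in enumerate(history)]
    slope, intercept = fit_line(cleaned)
    residuals = [y - (intercept + slope * x) for x, y in enumerate(cleaned)]
    band = n_sigma * pstdev(residuals)
    n = len(history)
    predicted = intercept + slope * n
    cleaned_new = new_value / factors[n % 7]
    return abs(cleaned_new - predicted) > band


# Four weeks with a weekend dip and a small day-to-day wiggle (made-up data).
week = [100, 100, 100, 100, 100, 60, 60]
history = [week[i % 7] + (1 if i % 2 == 0 else -1) for i in range(28)]
print(is_exception(history, 100.0), is_exception(history, 60.0))
```

Dividing out the weekday factors first keeps the regular weekend dip from inflating the volatility band, so a weekend-sized value arriving on a weekday is flagged.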

33 Example #2: Internet Marketing. How many sales did my campaign generate?

34 Case study: Online Search. Paid Search vs. Natural/Organic Search (free)

35 Customer behaviors and Internet Marketing investment. Which customer purchases are influenced by Marketing? Behavioral purchases: uncorrelated to Marketing. Influenced purchases: correlated to Marketing. Timeline illustration (Jan 1st to Mar 1st) of purchases (\$) and clicks, with attribution windows of X days and Y days. Annotations: 2 purchases missing; 1 purchase is uncorrelated; all purchases are incremental.

36 Remember this physics problem? Four particles (A, B, C, D) allow three pairings: 1. AB + CD? 2. AC + BD? 3. AD + BC?

37 Solution: Big Data infrastructure enables large-scale computations, such as combining all possibilities (cross-product). Schematic view and CERN example (discovery of a new particle bb): signal (particle resonance) on top of statistical noise. Combining correlated and uncorrelated events produces a system made of statistical noise (which is simple enough to extract) and the signal being sought. Source:

38 Latency time for each click-purchase pair. A user clicks on an ad banner at time = 0 and makes a purchase X days later. Chart: number of events (click-purchase pairs) versus latency (days). Negative latency: purchase before click (no causality), behavior only. Positive latency: purchase after click (potential causality), behavior & Internet Marketing impact. The flat level on both sides is the level of behavioral purchases; the excess on the positive side is the marketing incrementality (correlated purchases).
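The latency chart can be reproduced in miniature by pairing every click with every purchase of the same user (the cross-product) and counting pairs by latency. The sketch below then estimates incrementality as the excess of positive-latency pairs over the mirrored negative-latency window, which assumes the behavioral baseline is the same on both sides of the click. Function names and the sample data are hypothetical.

```python
from collections import Counter


def latency_histogram(clicks, purchases):
    """Count click-purchase pairs per user by latency in days (purchase day - click day)."""
    hist = Counter()
    for click_user, click_day in clicks:
        for purchase_user, purchase_day in purchases:
            if click_user == purchase_user:
                hist[purchase_day - click_day] += 1
    return hist


def incremental_pairs(hist, window):
    """Pairs with latency 1..window in excess of the baseline seen at -window..-1."""
    after = sum(hist[d] for d in range(1, window + 1))
    before = sum(hist[-d] for d in range(1, window + 1))
    return after - before


clicks = [("u1", 10), ("u2", 10)]
purchases = [("u1", 8), ("u1", 11), ("u1", 12), ("u2", 13), ("u3", 11)]
hist = latency_histogram(clicks, purchases)
print(sorted(hist.items()), incremental_pairs(hist, window=3))
```

Same-day pairs (latency 0) are left out of both windows because the order of click and purchase is ambiguous at daily resolution.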

39 So what?
Method 1 (Sales, ROI): Channel A 8%, +20%; Channel B 5%, -10%; Channel C 1%, +10%. Conclusions: reduce spend on channel B; invest in channel A; when prioritizing, ignore channel C.
vs.
Method 2 (Sales, ROI): Channel A 7%, -20%; Channel B 6%, +30%; Channel C 12%, +60%. Conclusions: reduce spend on channel A; invest heavily in channel C; Marketing actually accounts for 25% of the site.

40 Case study: Online Search. "Consumer Heterogeneity and Paid Search Effectiveness: A Large Scale Field Experiment", Thomas Blake, Chris Nosko, Steven Tadelis

44 So, what's next? Marketing 101. Rule #1: Never, ever, spend money unless you really, really have to. Decision diagram labels: Do Marketing, Don't Do Marketing, Purchase, No Purchase, Cost, Direct Return, Incr Return (nodes marked L, C, D).

45 So, what's next? Rule #2: If you have to spend, spend to the point of marginal return = 0, where ΔReturn = ΔInvestment, i.e. marginal ROI = 0. Chart: Output (Cost, Return (Revenues), Profit) versus Investment (costs), with points marked Max Profit, Max Sales, and No Profit (Total ROI = 0).
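Rule #2 can be checked numerically on a made-up return curve. The sketch below assumes a hypothetical concave revenue function, R(I) = a ln(1 + I/b), chosen only because it has diminishing returns; profit R(I) - I is maximal where one more unit of investment returns exactly one more unit of revenue, which for this curve is I = a - b.

```python
import math


def revenue(investment, a=500.0, b=100.0):
    """Hypothetical return curve with diminishing returns."""
    return a * math.log(1.0 + investment / b)


def best_investment(step=1.0, max_investment=2000.0):
    """Grid search for the investment level that maximizes profit."""
    best, best_profit = 0.0, 0.0
    investment = 0.0
    while investment <= max_investment:
        profit = revenue(investment) - investment
        if profit > best_profit:
            best, best_profit = investment, profit
        investment += step
    return best, best_profit


optimum, profit = best_investment()
print(f"max profit at investment {optimum:.0f} (profit {profit:.1f})")
```

Spending beyond this point still increases sales until total profit reaches zero, which is the gap between the Max Profit and No Profit points on the chart.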

46 Marginal Return Chart: ROI versus cumulative cost, with spend buckets ordered from Bucket 0 (most profitable) through Bucket i to Bucket N (least profitable). Annotations: in-depth analysis required to validate high ROI; point of marginal return = 0 (maximum profit); current spend level; areas/initiatives/segments with negative profitability are a cost reduction opportunity!
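A marginal return chart of this kind can be built by sorting spend buckets by ROI and accumulating their cost. The sketch below assumes ROI is defined as (return - cost) / cost and uses invented bucket names and figures; it is an illustration of the chart's logic, not the author's implementation.

```python
def marginal_return_table(buckets):
    """Sort (name, cost, return) buckets by ROI, most profitable first, with cumulative cost."""
    rows = sorted(buckets, key=lambda b: (b[2] - b[1]) / b[1], reverse=True)
    table, cumulative = [], 0.0
    for name, cost, ret in rows:
        cumulative += cost
        table.append((name, (ret - cost) / cost, cumulative))
    return table


def spend_at_max_profit(table):
    """Cumulative cost up to the last bucket whose ROI is not negative."""
    spend = 0.0
    for _, roi, cumulative in table:
        if roi < 0:
            break
        spend = cumulative
    return spend


buckets = [
    ("search", 100.0, 150.0),
    ("display", 200.0, 180.0),
    ("newsletter", 50.0, 100.0),
    ("affiliates", 100.0, 100.0),
]
table = marginal_return_table(buckets)
for name, roi, cumulative in table:
    print(f"{name:10s} ROI {roi:+.0%} cumulative cost {cumulative:.0f}")
print("spend at maximum profit:", spend_at_max_profit(table))
```

Buckets to the right of that spend level are the ones the chart labels as a cost reduction opportunity.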

47 Example #3: The cost of Big Data. What is my share of the pile?

48 Google Cloud Platform: Cost ~ 0 vs. Cost > 0. How to determine who is costing how much?

49 How to track BigQuery usage? Google does not provide a data feed on its customers' usage of BigQuery. However, three APIs can help us: bigquery.projects.list lists all (visible) projects; bigquery.jobs.list lists all the jobs in a specified project (note: use projection = full to get of user); bigquery.jobs.get retrieves the specified job by ID. The queries are parsed to extract the underlying tables used, and the data is stored in the App Engine datastore as well as in BigQuery through the streaming API (every 15 minutes).
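The slide does not say how the queries are parsed. As one possible approach, the sketch below pulls table references out of the query text with a regular expression; it is deliberately naive (it only sees qualified names that directly follow FROM or JOIN, and misses comma-separated table lists and unqualified names). The pattern and the function name `extract_tables` are assumptions for illustration.

```python
import re
from typing import List

# Qualified table names such as [project:dataset.table], dataset.table
# or `project.dataset.table`, directly after FROM or JOIN.
TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN)\s+[\[`]?([A-Za-z0-9_\-]+(?:[:.][A-Za-z0-9_\-]+)+)[\]`]?",
    re.IGNORECASE,
)


def extract_tables(sql: str) -> List[str]:
    """Return the distinct table references found in a query, in order of appearance."""
    seen: List[str] = []
    for ref in TABLE_REF.findall(sql):
        if ref not in seen:
            seen.append(ref)
    return seen


query = "SELECT a FROM [proj:ds.events] e JOIN ds.users u ON e.id = u.id"
print(extract_tables(query))
```

Storing one row per (job, table) pair is what later allows questions such as "who uses this table?" to be answered with a simple query.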

50 Beyond queries, we also scan tables: bigquery.projects.list lists visible projects; bigquery.datasets.list lists datasets within a project; bigquery.tables.list lists tables within a dataset; bigquery.tables.get gets details about a table. Diagram labels: datastore, queries information.

51 Enables Enlightenment. Questions for an analyst: When was this table last refreshed? How often is it refreshed? How was it created? Underlying data sources/tables? Who created this table? Who knows how to use this table? Where can I find this great query I ran? Who knows how to use this tag/metric? How much bandwidth am I using? How much space are my tables using? How much does my usage cost? Rick Hotten

52 How much bandwidth am I using in BigQuery?

53 BigQuery Pricing. Storage cost: \$0.02 per GB per month (about \$0.68 per TB per day). Query cost: on-demand, \$5 per TB; reserved capacity, \$20,000 per month per 5 GB/s unit, i.e. \$1.58 per TB*. * Note: for continuous usage of the 5 GB/s bandwidth.

54 How much does my usage of BigQuery cost? Assuming that the Motorola bandwidth is elastic, i.e. we always pay for the optimal number of 5 GB/s units, we can use \$1.58 per TB as a proxy.
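The \$1.58 per TB proxy follows from the reserved-capacity price if the 5 GB/s unit is used continuously. The sketch below reproduces the figure assuming a 30-day month and 1 TB = 1024 GB, then applies it to per-user volumes; the user names and volumes are made up.

```python
SECONDS_PER_DAY = 86_400
GB_PER_TB = 1024


def reserved_cost_per_tb(monthly_fee=20_000.0, gb_per_second=5.0, days=30):
    """Price per TB when one reserved unit is used continuously for a month."""
    tb_per_month = gb_per_second * SECONDS_PER_DAY * days / GB_PER_TB
    return monthly_fee / tb_per_month


def usage_cost(tb_processed_by_user, price_per_tb):
    """Attribute a cost to each user in proportion to the TB their queries processed."""
    return {user: tb * price_per_tb for user, tb in tb_processed_by_user.items()}


price = reserved_cost_per_tb()
print(f"reserved capacity: {price:.2f} USD per TB")
print(usage_cost({"analyst_a": 10.0, "analyst_b": 2.5}, price))
```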

55 Weekly to largest BQ users

56 Example #4: Human Resources. It's time for your annual review

57 Annual Review Feedback. What is the optimal method to determine the list of key work partners to request feedback from? Which method would balance objectivity and relevance?

58 Scraping Gmail and Google Calendar. gmail.users.messages.list: List User (by page of 100). gmail.users.messages.get: Get 1 details. calendar.events.list: List events & meta-data (by page of 100). Results are stored in the datastore and scored. Scoring: 1 pts = 30 min meeting = 10 s. Weight is divided by number of participants.
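The scoring rule can be sketched as follows. Several points are assumptions rather than facts from the slide: the unit next to "10" is missing in the source, so the sketch treats it as ten messages per point via the hypothetical `messages_per_point` parameter; "number of participants" is taken to exclude the person being reviewed; and the input shapes are invented for the example.

```python
from collections import defaultdict


def partner_scores(meetings, messages, me,
                   minutes_per_point=30.0, messages_per_point=10.0):
    """Score work partners from meetings and messages.

    meetings: list of (duration_minutes, attendees)
    messages: list of participant lists (sender and recipients)
    messages_per_point is an assumption; the slide's unit is not legible.
    """
    scores = defaultdict(float)
    for minutes, attendees in meetings:
        others = [a for a in attendees if a != me]
        if others:
            weight = (minutes / minutes_per_point) / len(others)
            for person in others:
                scores[person] += weight
    for participants in messages:
        others = [p for p in participants if p != me]
        if others:
            weight = (1.0 / messages_per_point) / len(others)
            for person in others:
                scores[person] += weight
    return dict(scores)


meetings = [(30, ["me", "ann"]), (60, ["me", "ann", "bob"])]
messages = [["me", "bob"]] * 10
print(partner_scores(meetings, messages, me="me"))
```

Dividing the weight by the number of participants keeps large meetings and mass mailings from dominating the ranking over one-to-one interactions.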

59 Example

60 Wrapping Up: CERN vs New Economy
CERN:
- Write kilometers-long Fortran code
- Analysis can run for many hours before a batch robot error
- Study billions of collision data
- Great depth of data structure & complexity
- Know your local expert for questions, but try to find the solution by yourself (much quicker)
- Remove bad runs (unclean data batches)
- Transform a complex system into insights
- Communicate findings to conferences
- Strong competitive landscape (4 distinct experiments competing to be the first to publish, or to publish better results)
New Economy:
- Write miles-long SQL code
- Queries can run for many hours before a spool space error
- Study billions of customer data
- Great depth of data structure & complexity
- Know your local expert for questions, but try to find the solution by yourself (much quicker)
- Remove wackos (non-material transactions)
- Transform a complex system into insights
- Communicate recommendations to business reviews
- Strong competitive landscape

61 Appendix

62 About Us. Motorola exists to invent, build and deliver the best mobile devices on the planet, improving the lives of millions of people.

63 Motorola: 80+ YEARS OF INNOVATION
- Motorola introduced
- Police Cruiser Radio Receiver
- Galvin Manufacturing Corp
- World's first portable FM two-way radio
- World's first high-power transistor in commercial production
- World's first truly rectangular color TV tube
- First words from the moon relayed via Motorola radio
- 1973: Demonstrated prototype of the DynaTAC portable cellular system
- World's first commercial handheld cellular phone; the DynaTAC 8000X weighed 28 ounces (794 grams)
- World's first HDTV technical standard
- World's first GSM cellular system
- World's first dual-mode cellular phone
- The 3.1-ounce (88 grams) StarTAC wearable cellular phone is the world's smallest and lightest
- World's first handset, the iDEN i1000plus, to combine a digital phone, two-way radio, Internet microbrowser, , fax and two-way messaging
- World's first general packet radio service (GPRS) wireless phone for always-on Internet access
- World's first wireless cable modem gateway introduced
- Iconic RAZR V3 wireless phone introduced
- MING smart phone recognizes 10,000+ handwritten characters from the Chinese alphabet
- Motorola DROID #1 on Time's Top Ten of 2009
- Launch of Moto X and Moto G
- Fast upgrades
- Moto E
- Moto 360

64 Mobile was a revolution, but Mobile is an outdated concept. Clouds (Internet, Connected World, World's Information, ...) will be available everywhere: phones, watches, glasses, cars, appliances, microchip implants, ...

65 Motorola Cloud Customers Ecosystem. Diagram labels: Motorola Cloud; Product Marketing; Web; Engineering; Finance; Consumers: Phones, Wearables & Companion Products; Moto Maker; Sales; Business Operation; Customer Support; Internal Business Teams; Partners & Carriers

66 Motorola Cloud Applications & Services: On-Device Applications & Services; Web Applications & Software as a Service; Platform as a Service; Infrastructure as a Service
