Counterfactual analysis: a Big Data case-study using Cosmos/SCOPE. Ed Snelson

Transcription

1 Counterfactual analysis: a Big Data case-study using Cosmos/SCOPE Ed Snelson

2 Work by Léon Bottou, Jonas Peters, Denis Xavier Charles, Elon Portugaly, Patrice Simard, Joaquin Quiñonero Candela, D. Max Chickering, Dipankar Ray, Ed Snelson

3 I. MOTIVATION

4 Search ads

5 The search ads ecosystem. We run a marketplace with different players. Player: User. Is looking for something; selects a search engine (publisher); issues a query; clicks or not on organic result(s) or ad(s); is happy or not with relevance, and adjusts attitude towards the publisher accordingly. Goal: few ads, high relevance.

6 The search ads ecosystem. We run a marketplace with different players. Player: Advertiser. Is looking for customers to sell to; defines campaigns and ads, and places bids for clicks on these ads in different ad markets; monitors performance of campaigns; adjusts campaigns and bids accordingly; possibly uses a rational strategy to bid. Goal: maximize total value relative to total cost.

7 The search ads ecosystem. We run a marketplace with different players. Player: Publisher. Has a search engine; contracts the marketplace to provide ads. Goal: short-term and long-term revenue, search engine reputation and market share.

8 Complex system with feedback [Diagram: the players User, Publisher and Advertiser connected by three loops labeled USER FEEDBACK LOOP, ADVERTISER FEEDBACK LOOP and LEARNING FEEDBACK LOOP; edge labels: Queries, Ads & Bids, Ads, Prices, Clicks (and consequences), Learning.]

9 Learning to run a marketplace. Goal: improve the marketplace machinery such that its long-term revenue is maximal. Approximate this goal by improving multiple performance measures (KPIs) related to all players. Provide data for decision making. Automatically optimize parts of the system. The learning machine is not a machine but an organization with lots of people doing stuff! How can we help?

10 Outline from here on II. Online Experimentation III. Counterfactual measurements IV. Cosmos/SCOPE V. Implementation details

11 II. ONLINE EXPERIMENTATION

12 How do parameters affect KPIs? We want to determine how certain auction parameters affect KPIs. Three options: 1. Offline log analysis (correlational); 2. Auction simulation; 3. Online experimentation (causal).

13 The problem with correlation analysis (Simpson's paradox)
Trying to decide whether a drug helps or not. Historical data:
All cases
               All     Survived  Died    Survival rate
Treated        5,000   2,100     2,900   42%
Not treated    5,000   2,900     2,100   58%
Conclusion: don't give the drug. But what if the doctors were saving the drug for the severe cases?
Severe cases (treatment rate 80%)
               All     Survived  Died    Survival rate
Treated        4,000   1,200     2,800   30%
Not treated    1,000   100       900     10%
Mild cases (treatment rate 20%)
               All     Survived  Died    Survival rate
Treated        1,000   900       100     90%
Not treated    4,000   2,800     1,200   70%
Conclusion reversed: the drug helps for both severe and mild cases.
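The reversal can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original slides, and its variable and function names are hypothetical. It starts from the aggregate counts plus the severe/treated and mild/not-treated cells, and obtains the other two cells by subtraction, which assumes every patient is either a severe or a mild case.

```python
# Counts are (survived, total), taken from the drug example on this slide.
aggregate = {"treated": (2100, 5000), "not_treated": (2900, 5000)}
severe = {"treated": (1200, 4000)}
mild = {"not_treated": (2800, 4000)}


def remainder(whole, part):
    """Cell left over when `part` is removed from `whole`."""
    return (whole[0] - part[0], whole[1] - part[1])


# Assumption: severe + mild = all cases, so the remaining cells follow by subtraction.
mild["treated"] = remainder(aggregate["treated"], severe["treated"])
severe["not_treated"] = remainder(aggregate["not_treated"], mild["not_treated"])


def survival_rate(cell):
    survived, total = cell
    return survived / total


def drug_looks_helpful(table):
    """True when treated patients survive more often than untreated ones."""
    return survival_rate(table["treated"]) > survival_rate(table["not_treated"])


for name, table in [("all", aggregate), ("severe", severe), ("mild", mild)]:
    rates = {arm: round(survival_rate(cell), 2) for arm, cell in table.items()}
    verdict = "drug helps" if drug_looks_helpful(table) else "drug hurts"
    print(name, rates, verdict)
```

The aggregate comparison and the per-stratum comparisons disagree because the treatment rate differs sharply between strata (80% versus 20%), so severity confounds the aggregate.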

14 Overkill? Pervasive causation paradoxes in ad data! Example: logged data shows a positive correlation between event A, "First mainline ad gets a high quality score", and event B, "Second mainline ad receives a click". Do high quality ads encourage clicking below? Controlling for event C, "Query categorized as commercial", reverses the correlation for both commercial and non-commercial queries.

15 Randomized experiments
Randomly select who to treat.
All population (treatment rate 30%)
               All     Survived  Died    Survival rate
Treated        3,000   1,800     1,200   60%
Not treated    7,000   2,800     4,200   40%
Selection is independent of all confounding factors. It therefore eliminates Simpson's paradox and allows counterfactual estimates: if we had given penicillin to a fraction $x$ of the patients, the success rate would have been $60\% \cdot x + 40\% \cdot (1 - x)$.
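As a quick check of the linear formula on this slide, success rate = 60% · x + 40% · (1 − x), the sketch below evaluates it at the treatment rate that was actually used (30%) and compares the result with the overall survival rate in the table. The function name is hypothetical; the counts are the ones shown on the slide.

```python
def counterfactual_success_rate(x, treated_rate=0.60, untreated_rate=0.40):
    """Success rate if a fraction x of the patients had been treated at random."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be a fraction between 0 and 1")
    return treated_rate * x + untreated_rate * (1.0 - x)


# Rates observed in the randomized experiment.
treated_rate = 1800 / 3000      # 60%
untreated_rate = 2800 / 7000    # 40%

# At the logged treatment rate the formula must reproduce the observed overall rate.
observed_overall = (1800 + 2800) / (3000 + 7000)
predicted_overall = counterfactual_success_rate(0.30, treated_rate, untreated_rate)
print(observed_overall, predicted_overall)

# Counterfactual question: what if half of the patients had been treated?
print(counterfactual_success_rate(0.50, treated_rate, untreated_rate))
```

The extrapolation to other values of x is only valid because treatment was assigned at random, independently of severity.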

16 Experiments in the online world. A/B tests are used throughout the online world to compare different versions of the system. A random fraction of the traffic (a flight) uses click-prediction system A; another random fraction uses click-prediction system B. Wait for a week, measure KPIs, choose the best! Our framework takes this one step further.

17 III. COUNTERFACTUAL MEASUREMENTS

18 Counterfactuals. Measuring something that did not happen: how would the system have performed if, when the data was collected, we had used a modified system instead of the system actually deployed?

19 Classification example: replaying past data. Collect labeled data in the existing setup. Replay the past data to evaluate what the performance would have been if we had used classifier θ. Requires knowledge of all functions connecting the point of change to the point of measurement.

20 Concrete example: mainline reserve (MLR) [Figure: search results page with Mainline and Sidebar ad positions; an ad is shown in the mainline when Ad Score > MLR.]

21 Online randomization. Q: Can we estimate the results of a change counterfactually (without actually performing the change)? A: Yes, if the deployed system and the counterfactual system are non-deterministic (and close enough). [Figure: deterministic case, with a single MLR value for each system, versus randomized case, with MLR distributions $P(\mathrm{MLR})$ and $P^*(\mathrm{MLR})$.] For each auction, a random MLR is used online, drawn from the data-collection distribution $P(\mathrm{MLR})$.

22 Estimating counterfactual KPIs
Usual additive KPI: $\mathrm{Clicks}_{\mathrm{total}} = \sum_i \mathrm{Clicks}(\mathrm{auction}_i)$
Counterfactual KPI: $\mathrm{Clicks}^*_{\mathrm{total}} \approx \sum_i w_i \, \mathrm{Clicks}(\mathrm{auction}_i)$, with $w_i = P^*(\mathrm{MLR}_i) / P(\mathrm{MLR}_i)$
Weighted sum: auctions with MLRs closer to the counterfactual distribution get higher weight.
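The weighted sum can be sketched in a few lines of Python. Everything below is illustrative. The log-normal shape of the logged MLR distribution is borrowed from the randomized experiment described later in the deck, and the counterfactual distribution is assumed to be the same log-normal shifted to 18% lower values. The spread `SIGMA`, the toy click model and all function names are assumptions made only to keep the example runnable.

```python
import math
import random


def lognormal_pdf(x, mu, sigma):
    """Density of a log-normal variable with log-mean mu and log-std sigma."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def counterfactual_total(auctions, logging_pdf, target_pdf):
    """Sum of w_i * clicks_i with w_i = P*(MLR_i) / P(MLR_i)."""
    total = 0.0
    for mlr, clicks in auctions:
        total += target_pdf(mlr) / logging_pdf(mlr) * clicks
    return total


def simulate_auctions(n, mu, sigma, rng):
    """Toy logs: MLR drawn log-normally; clicks are rarer when the MLR is high (made-up model)."""
    auctions = []
    for _ in range(n):
        mlr = rng.lognormvariate(mu, sigma)
        clicked = 1 if rng.random() < 1.0 / (1.0 + mlr) else 0
        auctions.append((mlr, clicked))
    return auctions


rng = random.Random(0)
SIGMA = 0.3  # hypothetical spread of the randomization


def logging_pdf(mlr):
    return lognormal_pdf(mlr, 0.0, SIGMA)             # P: data-collection distribution


def target_pdf(mlr):
    return lognormal_pdf(mlr, math.log(0.82), SIGMA)  # P*: 18% lower MLRs


logged = simulate_auctions(20000, 0.0, SIGMA, rng)
observed = sum(clicks for _, clicks in logged)
estimate = counterfactual_total(logged, logging_pdf, target_pdf)
print(f"observed clicks: {observed}; estimated clicks under P*: {estimate:.0f}")
```

A useful sanity check on such an estimator is that the weights average to roughly 1 over the logged data; a large deviation suggests that P does not cover the region where P* puts its mass.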

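23 Exploration. [Figure: plots comparing the data-collection distribution $P(\omega)$ with the counterfactual distribution $P^*(\omega)$.] Quality of the estimation: confidence intervals reveal whether the data-collection distribution $P(\omega)$ performs sufficient exploration to answer the counterfactual question of interest.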

24 Two-part confidence interval. Outer confidence interval: when this is too large, we must sample more. Inner confidence interval: when this is too large, we must explore more.
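The slide does not say how the two intervals are built. One construction that matches the description, assumed here rather than taken from the deck, clips the importance weights at a threshold: the outer interval is then the usual sampling interval around the clipped estimate, and the inner interval bounds how much the clipping can have removed, given that each per-auction KPI lies between 0 and a known maximum. The function name, the clipping threshold and the normal approximation are all assumptions of this sketch.

```python
import math
import statistics


def two_part_interval(rewards, weights, clip, reward_max, z=1.96):
    """Clipped importance-weighted estimate with an outer and an inner interval.

    rewards    : per-auction KPI values, assumed to lie in [0, reward_max]
    weights    : importance weights P*(.) / P(.)
    clip       : weights at or above this threshold are replaced by 0
    """
    n = len(rewards)
    clipped = [w if w < clip else 0.0 for w in weights]
    terms = [w * y for w, y in zip(clipped, rewards)]
    estimate = sum(terms) / n
    mean_clipped_weight = sum(clipped) / n

    # Outer interval: sampling noise of the clipped estimate (normal approximation).
    half_width = z * statistics.stdev(terms) / math.sqrt(n)
    outer = (estimate - half_width, estimate + half_width)

    # Inner interval: the clipped-away weight mass can add at most reward_max each.
    bias_bound = reward_max * max(0.0, 1.0 - mean_clipped_weight)
    inner = (estimate, estimate + bias_bound)
    return estimate, outer, inner


estimate, outer, inner = two_part_interval(
    rewards=[1, 0, 1, 1], weights=[1.0, 0.5, 3.0, 1.5], clip=2.0, reward_max=1.0
)
print(estimate, outer, inner)
```

Under this construction, more data shrinks the outer interval, while the inner interval only shrinks when the logged distribution places more mass where the counterfactual distribution does, which is the "explore more" case.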

25 Playing with mainline reserves. Mainline reserves (MLRs): thresholds that control whether ads are displayed in the mainline. Randomized experiment: random log-normal multiplier applied to MLRs. Control experiments: same setup with 18% lower mainline reserves; same setup without randomization.

26 Playing with mainline reserves (ii) [Figure: counterfactual estimates with outer and inner confidence intervals, compared against the control with 18% lower MLR and the control with no randomization.]

27 Playing with mainline reserves (iii) This is easy to estimate

28 Playing with mainline reserves (iv) Revenue always has high sample variance

29 More with the same data. How is this related to A/B testing? A/B testing tests 2 specific settings against each other: you need to know what questions you want to ask beforehand! Big advantage of more general randomization: collect data first, choose question(s) later. Randomizing more stuff increases opportunities, but requires more sophisticated offline log processing.

30 IV. COSMOS/SCOPE

31 Ad Auction Logs. 10 TB per day of ad-auction logs, cooked and joined from various raw logs. Stored in Cosmos, queried via SCOPE. A small fraction of total Bing logs and jobs: tens of thousands of SCOPE jobs daily, tens of PBs read/written daily.

32 Cosmos/SCOPE [Figure: Cosmos/SCOPE shown alongside the open-source stack PIG/HIVE and HDFS.]

33 Cosmos. Microsoft's internal distributed data store. Tens of thousands of commodity servers. Comparable to HDFS, GFS. Append-only file system, optimized for sequential I/O. Data replication and compression.

34 Data Representation. 1. Unstructured streams. Custom extractors convert a sequence of bytes into a RowSet, specifying a schema for the columns. 2. Structured streams. Data is stored alongside metadata: a well-defined schema and structural properties (e.g. partitioning and sorting information). Can be horizontally partitioned into tens of thousands of partitions, e.g. by hash or range partitioning. Indexes for random access and index-based joins.

35 SCOPE scripting language. SQL-like (in syntax) declarative language specifying a data transformation pipeline. Each SCOPE statement takes one or more RowSets as input and outputs another RowSet. Highly extensible with C# expressions, custom operators and data types. The SCOPE compiler and optimizer are responsible for generating a data flow DAG for efficient parallel execution.

36 Inputs/Outputs
Unstructured:
    A = EXTRACT a:int, b:float FROM "log.txt" USING CSVExtractor();
    B = SELECT a, SUM(b) AS SB FROM A GROUP BY a;
    OUTPUT B TO "log2.xml" USING XMLOutputter();
Structured:
    SSTREAM <stream_name>;
    OUTPUT [<named_rowset>] TO SSTREAM <stream_name>
        [ [HASH | RANGE] CLUSTERED BY <cols> [INTO <number>] [SORTED BY <cols>] ];

37 C# Expressions and functions
C# string expression:
    R1 = SELECT A+C AS ac, B.Trim() AS B1 FROM R WHERE StringOccurs(C, "xyz") > 2;
C# string method:
    #CS
    // Counts occurrences of ptrn in str, resuming one character after each match start.
    public static int StringOccurs(string str, string ptrn)
    {
        int cnt = 0;
        int pos = -1;
        while (pos + 1 < str.Length)
        {
            pos = str.IndexOf(ptrn, pos + 1);
            if (pos < 0) break;
            cnt++;
        }
        return cnt;
    }
    #ENDCS

38 C# User-defined types (UDTs)
Arbitrary C# classes can be used as column types in scripts. Extremely convenient for easy serialization/deserialization. Can be referenced in external DLLs, C# backing files, and in-script (#CS ... #ENDCS).
    SELECT UserId, SessionId, new RequestInfo(binaryData) AS Request
    FROM InputStream
    WHERE Request.Browser.IsIE();

39 C# User-defined operators. User-defined aggregates: Aggregate interface with Initialize, Accumulate, Finalize; can be declared recursive, which allows partial aggregation. MapReduce-like extensions: PROCESS; REDUCE (can be declared recursive); COMBINE.
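To illustrate why an Initialize / Accumulate / Finalize aggregate lends itself to partial aggregation, here is a small Python analogy. It is not SCOPE's C# interface: the class, the `merge` method used to combine partial states, and the partition layout are all assumptions made for the example.

```python
class MeanAggregator:
    """Keeps a running (sum, count) state so partial results can be combined."""

    def initialize(self):
        self.total = 0.0
        self.count = 0

    def accumulate(self, value):
        self.total += value
        self.count += 1

    def merge(self, other):
        # Hypothetical hook: fold another partial state into this one.
        self.total += other.total
        self.count += other.count

    def finalize(self):
        return self.total / self.count


def aggregate(values):
    """Run initialize/accumulate over one partition and return the partial state."""
    agg = MeanAggregator()
    agg.initialize()
    for value in values:
        agg.accumulate(value)
    return agg


# Each partition is aggregated independently, e.g. on a different machine.
partitions = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
partials = [aggregate(partition) for partition in partitions]

# The partial states are then combined and finalized once.
combined = MeanAggregator()
combined.initialize()
for partial in partials:
    combined.merge(partial)
print(combined.finalize())
```

The key design point is that the intermediate state (sum and count) is combinable even though the final value (the mean) is not; averaging the per-partition means would give the wrong answer when partitions differ in size.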

40 Example Processor PROCESS Input USING TrimProcessor(args);

41 Example Reducer REDUCE Input USING CountProcessor(args) ON groupingcolumn;

42 SCOPE compilation and execution
    SELECT query, COUNT() AS count FROM "search.log" USING LogExtractor GROUP BY query HAVING count > 1000 ORDER BY count DESC;
    OUTPUT TO "qcount.result";
Runtime cost-based optimizer

43 Runtime optimizations: examples Dynamic Aggregation Dynamic Broadcast

44 SCOPE: inspiration from parallel DBs and MapReduce systems

45 SCOPE: Pros/Cons (an opinion). Pros: very quick to write simple queries without thinking about parallelization and execution; highly extensible with deep C# integration (UDT columns and C# functions); easy development and debugging from VS, with IntelliSense. Cons: no loop/iteration support means a poor fit for many ML algorithms; batch, rather than interactive.

46 V. IMPLEMENTATION

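47 Counterfactual computation. $\mathrm{KPI}^*_{\mathrm{total}} \approx \sum_i w_i \, \mathrm{KPI}(\mathrm{auction}_i)$. Ideal for a Map-Reduce setting. Map: $\mathrm{auction}_i \mapsto \mathrm{KPI}(\mathrm{auction}_i)$. Reduce: $\sum_i w_i \, \mathrm{KPI}(\mathrm{auction}_i)$.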

48 Counterfactual grid

49 SCOPE pseudo-code for counterfactuals
    AuctionLogs = VIEW CosmosLogPath;
    // Auction is a C# UDT that wraps all logged info about a single auction
    SELECT Auction FROM AuctionLogs;
    // C# UDFs
    SELECT ComputeKPIs(Auction) AS KPIs, ComputeWeightGrid(Auction) AS WeightGrid;
    // Unroll the weight grid
    SELECT ComputeWeightedKPIs(KPIs, GridPoint) AS wKPIs CROSS APPLY WeightGrid AS GridPoint;
    // Recursive aggregator: sum of w_i, sum of w_i * KPI_i, etc.
    SELECT AggregateKPIs(wKPIs) AS TotalKPIs GROUP BY GridPoint;
    // Call instance method on the TotalKPIs UDT
    SELECT GridPoint, TotalKPIs.Finalize() AS FinalKPIs
    OUTPUT TO "Results.tsv";
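The same pipeline can be mirrored in plain Python to make the data flow concrete. This is only a sketch: the function names echo the pseudo-code, but the log record layout, the grid of counterfactual multipliers, the spread `SIGMA` and the assumption that both the logged and the counterfactual multipliers are log-normal with equal spread are all hypothetical.

```python
import math
from collections import defaultdict

SIGMA = 0.3              # hypothetical log-std of the logged MLR multiplier
GRID = [0.82, 1.0, 1.2]  # hypothetical counterfactual multipliers to evaluate


def compute_kpis(auction):
    """Per-auction KPIs (here just clicks and revenue)."""
    return {"clicks": auction["clicks"], "revenue": auction["revenue"]}


def compute_weight_grid(auction):
    """Weight of one auction at each grid point.

    Ratio of two log-normal densities with equal sigma: the logged one is
    centered at multiplier 1, the counterfactual one at the grid point.
    """
    log_m = math.log(auction["multiplier"])
    weights = {}
    for grid_point in GRID:
        log_g = math.log(grid_point)
        weights[grid_point] = math.exp(
            (log_m ** 2 - (log_m - log_g) ** 2) / (2.0 * SIGMA ** 2)
        )
    return weights


def run_pipeline(auction_logs):
    """Unroll the weight grid, then sum weights and weighted KPIs per grid point."""
    totals = defaultdict(lambda: {"weight": 0.0, "clicks": 0.0, "revenue": 0.0})
    for auction in auction_logs:
        kpis = compute_kpis(auction)
        for grid_point, weight in compute_weight_grid(auction).items():  # CROSS APPLY
            acc = totals[grid_point]                                     # GROUP BY GridPoint
            acc["weight"] += weight
            for name, value in kpis.items():
                acc[name] += weight * value
    return dict(totals)


logs = [
    {"multiplier": 0.9, "clicks": 1, "revenue": 0.4},
    {"multiplier": 1.1, "clicks": 0, "revenue": 0.0},
    {"multiplier": 1.3, "clicks": 2, "revenue": 1.1},
]
results = run_pipeline(logs)
for grid_point in GRID:
    print(grid_point, results[grid_point])
```

Because every accumulator is a plain sum, the per-grid-point totals can be computed on separate partitions and added together afterwards, which is what makes the job a natural fit for a batch map-reduce system.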

50 Conclusions. There are systems in the real world that are too complex to easily formalize. Causal inference clarifies many problems: ignoring causality leads to Simpson's paradox; randomness allows inferring causality. The counterfactual framework is modular: randomize in advance, ask later. Counterfactual analysis is ideally suited to batch map-reduce.