Improving Experiments by Optimal Blocking: Minimizing the Maximum Within-block Distance

Similar documents

Introduction to Scheduling Theory

Applied Algorithm Design Lecture 5

Nimble Algorithms for Cloud Computing. Ravi Kannan, Santosh Vempala and David Woodruff

Chapter Load Balancing. Approximation Algorithms. Load Balancing. Load Balancing on 2 Machines. Load Balancing: Greedy Scheduling

Part 2: Community Detection

! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. !-approximation algorithm.

! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. #-approximation algorithm.

Approximated Distributed Minimum Vertex Cover Algorithms for Bounded Degree Graphs

COMBINATORIAL PROPERTIES OF THE HIGMAN-SIMS GRAPH. 1. Introduction

Random graphs with a given degree sequence

Social Media Mining. Network Measures

DATA ANALYSIS II. Matrix Algorithms

On the independence number of graphs with maximum degree 3

Machine Learning Methods for Causal Effects. Susan Athey, Stanford University Guido Imbens, Stanford University

Data Mining: Algorithms and Applications Matrix Math Review

Clustering Artificial Intelligence Henry Lin. Organizing data into clusters such that there is

On the effect of forwarding table size on SDN network utilization

Stationary random graphs on Z with prescribed iid degrees and finite mean connections

Least Squares Estimation

Private Approximation of Clustering and Vertex Cover

B490 Mining the Big Data. 2 Clustering

Cluster Analysis: Advanced Concepts

Scheduling Shop Scheduling. Tim Nieberg

Protein Protein Interaction Networks

Exponential time algorithms for graph coloring

Markov random fields and Gibbs measures

Fairness in Routing and Load Balancing

Approximation Algorithms

ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN. Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015

Small Maximal Independent Sets and Faster Exact Graph Coloring

Social Media Mining. Data Mining Essentials

School Timetabling in Theory and Practice

Cluster Analysis. Isabel M. Rodrigues. Lisboa, Instituto Superior Técnico

Improved Algorithms for Data Migration

How to Design the Shortest Path Between Two Keyword Algorithms

Distributed Computing over Communication Networks: Maximal Independent Set

Ph.D. Thesis. Judit Nagy-György. Supervisor: Péter Hajnal Associate Professor

THE SCHEDULING OF MAINTENANCE SERVICE

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory

Chapter 13 Introduction to Nonlinear Regression( 非線性迴歸 )

Local outlier detection in data forensics: data mining approach to flag unusual schools

An Introduction to Basic Statistics and Probability

11. APPROXIMATION ALGORITHMS

5.1 Bipartite Matching

Practical Graph Mining with R. 5. Link Analysis

Triangle deletion. Ernie Croot. February 3, 2010

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014

Unsupervised learning: Clustering

Machine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague.

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

Why High-Order Polynomials Should Not be Used in Regression Discontinuity Designs

Nonparametric adaptive age replacement with a one-cycle criterion

Misunderstandings between experimentalists and observationalists about causal inference

The positive minimum degree game on sparse graphs

An Introduction to Machine Learning

1 Approximating Set Cover

Guessing Game: NP-Complete?

How to assess the risk of a large portfolio? How to estimate a large covariance matrix?

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Single machine parallel batch scheduling with unbounded capacity

JUST-IN-TIME SCHEDULING WITH PERIODIC TIME SLOTS. Received December May 12, 2003; revised February 5, 2004

Econometrics Simple Linear Regression

STATISTICS AND DATA ANALYSIS IN GEOLOGY, 3rd ed. Clarificationof zonationprocedure described onpp

Key Concept. Density Curve

Probability Theory. Florian Herzog. A random variable is neither random nor variable. Gian-Carlo Rota, M.I.T..

STAT 830 Convergence in Distribution

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Basics of Statistical Machine Learning

Definition Given a graph G on n vertices, we define the following quantities:

Statistical Machine Learning

TU e. Advanced Algorithms: experimentation project. The problem: load balancing with bounded look-ahead. Input: integer m 2: number of machines

1 The Line vs Point Test

Large induced subgraphs with all degrees odd

THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok

A simpler and better derandomization of an approximation algorithm for Single Source Rent-or-Buy

CS 598CSC: Combinatorial Optimization Lecture date: 2/4/2010

Finding and counting given length cycles

Mining Social Network Graphs

Visualization of textual data: unfolding the Kohonen maps.

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

CMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma

(67902) Topics in Theory and Complexity Nov 2, Lecture 7

Mining Social-Network Graphs

Algebra Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard

How To Solve The Cluster Algorithm

CMPSCI611: Approximating MAX-CUT Lecture 20

136 CHAPTER 4. INDUCTION, GRAPHS AND TREES

Transcription:

Improving Experiments by Optimal Blocking: Minimizing the Maximum Within-block Distance Michael J. Higgins Jasjeet Sekhon April 12, 2014 EGAP XI

A New Blocking Method A new blocking method with nice theoretical properties Blocking: create strata and then randomize within strata Review some results on blocking, post-stratification, and model adjustment of experiments Some analytical benefits for blocking, but the main one is transparency and minimizing fishing

A New Blocking Method The method minimizes the Maximum Within-Block Distance Ensures good covariate balance by design It is fast: polynomial time Works for any number of treatments and any minimum number of observations per block Better balance than greedy methods Can estimate conditional variances

Covariate imbalance in randomized experiments PROBLEM: In finite samples, there is a probability of bad covariate balance between treatment groups Bad imbalance on important covariates: Imprecise estimates of treatment effects Conditional bias

Some theoretical results about blocking Blocking cannot hurt the precision of the estimator: if no worse than random matching if sample from an infinite super population Post-Stratification and regression adjustment can decrease the precision of the estimator Blocking may increase the estimated variance. But this is specific to the estimator used (degrees of freedom). e.g., randomization inference solves the problem.

Adjustment and covariate imbalance Regression adjustment [Freedman, 2008, Lin, 2012] Post-stratification [Miratrix, Sekhon, and Yu, 2013]: Group similar units together after after randomization SATE/PATE results good; ex post problems arise Data mining concerns Re-randomization [Morgan and Rubin, 2012]: Repeat randomly assigning treatments until covariate balance is acceptable LESSON: design the randomization to build in adjustment

Some Current blocking approaches Matched-pairs blocking: Pair most-similar units together. For each pair, randomly assign one unit to treatment, one to control [Imai, 2008] Optimal Multivariate Matching Before Randomization [Greevy, Lu, Silber, and Rosenbaum, 2004] Optimal-greedy blocking [Moore, 2012]

Matched-Pairs No efficient way to extend approach to more than two treatment categories Fixed block sizes (2 units): design may pair units from different clusters Cannot estimate conditional variances [Imbens, 2011] Difficulty with treatment effect heterogeneity Worse problems with some tests e.g., rank sum

Blocking by minimizing the Maximum Within-Block Distance (MWBD) Experiment with n units and r treatment categories Select a threshold t r for a minimum number of units to be contained in a block Block units so that each block contains at least t units, and so that the maximum distance between any two units within a block the MWBD is minimized Threshold t : Allows designs with multiple treatment categories, multiple replications of treatments within a block

A simple example: Threshold t = 2. Distance = Mahalanobis distance. Heights and Weights for 12 Subjects Heights 59 60 61 62 63 64 120 130 140 150 160 170 Weights

A simple example: Threshold t = 2. Distance = Mahalanobis distance. Matched Pair Blocking Heights 59 60 61 62 63 64 120 130 140 150 160 170 Weights

A simple example: Threshold t = 2. Distance = Mahalanobis distance. Minimize Maximum Distance Heights 59 60 61 62 63 64 120 130 140 150 160 170 Weights

A simple example: Threshold t = 2. Distance = Mahalanobis distance. Matched Pair Blocking Minimize Maximum Distance Heights 59 60 61 62 63 64 Maximum Within Block Distance Heights 59 60 61 62 63 64 Maximum Within Block Distance 120 130 140 150 160 170 Weights 120 130 140 150 160 170 Weights

and approximately optimal blocking For all blockings that contain at least t units: Let λ denote the smallest MWBD achievable by such a blocking any blocking that meets this bound is called an optimal blocking Finding an optimal blocking is an NP-hard problem feasible to find in small experiments, may not be feasible in large experiments [Hochbaum and Shmoys, 1986] We now show that finding a blocking with MWBD 4λ is possible in polynomial time

Viewing experimental units as a graph Use an idea from Paul Rosenbaum [1989]: Matching problems can be viewed as graph theory partitioning problems Experimental units are vertices in a graph An edge is drawn between two units if they can be placed in the same block Edge-weights are some measure of distance between pretreatment covariates (e.g. Mahalanobis distance)

Viewing experimental units as a graph: In pictures Distance = Mahalanobis distance. Heights and Weights for 12 Subjects Heights 59 60 61 62 63 64 120 130 140 150 160 170 Weights

Viewing experimental units as a graph: In pictures Distance = Mahalanobis distance. Units as a graph

Viewing experimental units as a graph: In pictures Distance = Mahalanobis distance. Units as a graph

Viewing experimental units as a graph: In pictures Distance = Mahalanobis distance. Units as a graph 5.24 4.86 7.08 0.2

Notation: A graph G is defined by its vertex set V and its edge set E: G = (V, E) Vertices in V denoted by {i}; n units n vertices in V Edges in E are denoted by (i, j) A complete graph has an edge (i, j) E between every two distinct vertices {i}, {j} V ; n(n 1) 2 edges overall The weight of edge (i, j) E is denoted by w ij : at most n(n 1) 2 distinct values of w ij

Note about edge weights If our concern is covariate balance, natural choices for edge weights measure distance between block covariates e.g., Mahalanobis, L 1, L 2, distances Our method only requires weights to satisfy the triangle inequality: for any distinct vertices {i}, {j}, {k}, w ik + w kj w ij

A simple example: Threshold t = 2. Distance = Mahalanobis distance. Minimize Maximum Distance Heights 59 60 61 62 63 64 Maximum Within Block Distance 120 130 140 150 160 170 Weights

as a graph partitioning problem A partition of V is a division of V into disjoint blocks of vertices (V 1, V 2,..., V l ) Blocking of units Partition of a graph: Two units are in the same block of the blocking if and only if their corresponding units are in the same block of the partition problems are optimal partitioning problems: we want to find a partition (V1, V 2,..., V l ) with V j t that minimizes the maximum within-block edge weight

Bottleneck subgraphs Bottleneck subgraphs helpful for solving partitioning problems Define the bottleneck subgraph for maximum weight of w as the graph that has (i, j) E w if and only if w ij w At most n(n 1) 2 different edge weights w ij At most n(n 1) 2 different bottleneck subgraphs All points within a block of our approximately optimal blocking are connected by some path of edges in a bottleneck subgraph; used to show approximate optimality

Bottleneck subgraph: In pictures Complete graph Units as a graph 5.24 4.86 7.08 0.2

Bottleneck subgraph: In pictures Bottleneck subgraph of weight 5 Bottleneck graph: Weight = 5

Bottleneck subgraph: In pictures Bottleneck subgraph of weight 3 Bottleneck graph: Weight = 3

Bottleneck subgraph: In pictures Bottleneck subgraph of weight 3 Bottleneck graph: Weight = 3 A Path of 3 edges from Point A to Point B B

Approximate algorithm outline: Find the bottleneck subgraph of appropriate weight can use k-nearest neighbor graph Select block centers that are just far enough apart Grow from these block centers to obtain an approximately optimal partition and thus, an approximately optimal blocking Approach closely follows Hochbaum and Shmoys [1986]

Algorithm step-by-step: Find bottleneck graph Find smallest weight threshold λ such that each vertex in the corresponding bottleneck subgraph is connected to at least t 1 edges. Can show that λ λ, where λ is the smallest MWBD possible. Bottleneck subgraph can be constructed in polynomial time. t = 2 Bottleneck graph: Weight = 0.24

Algorithm step-by-step: Find block centers Find a set of vertices block centers such that: There is no path of two edges or less connecting any of the vertices in the set. For any vertex not in the set, there is a path of two edges or less that connects that vertex to one in the set. Any set will do, but some choices of centers are better. Bottleneck graph: Weight = 0.24

Algorithm step-by-step: Grow from block centers Form blocks comprised of a block center plus any vertices connected to that center by a single edge. The way our block centers were chosen (no path of two edges connects two block centers), these blocks will not overlap. At this point, these blocks contain at least t units (by edge connection criterion). Bottleneck graph: Weight = 0.24

Algorithm step-by-step: Assign all unassigned vertices For each unassigned vertex, find the closest block center. Add that vertex to the center s corresponding block. The way our block centers were chosen, all unassigned vertices are at most a path of two edges away from a block center. Bottleneck graph: Weight = 0.24

Our blocking Our approximate algorithm came up with the following blocking: Approximately optimal blocking

A simple example: Threshold t = 2. Dissimilarity = Mahalanobis distance. Minimize Maximum Distance Heights 59 60 61 62 63 64 Maximum Within Block Distance 120 130 140 150 160 170 Weights

Sketch of proof of approximate optimality Algorithm is guaranteed to obtain a blocking with MWBD 4λ, though does much better than that in practice.

Sketch of proof of approximate optimality Algorithm is guaranteed to obtain a blocking with MWBD 4λ, though does much better than that in practice. Sketch of proof: Each vertex is at most a path of two edges away from a block center = In the worst case: two vertices {i}, {j} in the same block can be connected by a path of four edges in the bottleneck subgraph (two from vertex {i} to the block center, two from the block center to vertex {j}).

Sketch of proof cont d Each vertex is at most a path of two edges away from a block center = In the worst case: two vertices {i}, {j} in the same block can be connected by a path of four edges in the bottleneck subgraph (two from vertex {i} to the block center, two from the block center to vertex {j}). In worst case: (i, k 1 ), (k 1, k 2 ), (k 2, k 3 ), (k 3, j) is a path of four edges connecting {i} to {j}. Each edge has weight at most λ = The corresponding edge weights satisfy: w ik1 + w k1 k 2 + w k2 k 3 + w k3 j 4λ 4λ.

Sketch of proof cont d Since edge weights satisfy the triangle inequality: w ik + w kj w ij it follows that w ij w ik1 + w k1 k 2 + w k2 k 3 + w k3 j 4λ 4λ.

Sketch of proof cont d Since edge weights satisfy the triangle inequality: it follows that w ik + w kj w ij w ij w ik1 + w k1 k 2 + w k2 k 3 + w k3 j 4λ 4λ. That is, every edge joining two vertices within the same block has weight 4λ. The maximum within-block distance of the approximately optimal blocking is 4λ. QED

Some final remarks about algorithm: Algorithm does not contain any inherently random components. Quick, local changes to the approximately optimal blocking may improve the blocking. (e.g., divide large blocks into smaller blocks, swap units between blocks)

Neyman-Rubin potential outcomes model The Neyman-Rubin potential outcomes framework assumes the following model for response [Splawa-Neyman, Dabrowska, and Speed, 1990, Rubin, 1974]: Y kc = y kc1 T kc1 + y kc2 T kc2 +... + y kcr T kcr. Y kc : Observed response of kth unit in block c. y kct : Potential outcome of the unit under treatment t. T kct : Treatment indicators. T kct = 1 if the unit receives treatment t, T kct = 0 otherwise.

Parameters of interest and estimators Parameters of interest: Sample average treatment effect of treatment s relative to treatment t (SATE st ): SATE st = b n c c=1 k=1 y kcs y kct n Two unbiased estimators of SATE st are the difference-in-means estimator and the the Horvitz-Thompson estimator. ˆδ st,diff ˆδ st,ht b c=1 b c=1 n c n n c n n c k=1 n c k=1 ( ykcs T kcs y ) kctt kct, #T cs #T ct ( ykcs T kcs n c /r y ) kctt kct. n c /r Assume complete randomization of treatment, r divides n c.

Variance of estimators = Var(ˆδ st,diff ) = Var(ˆδ st,ht ) b nc 2 ( r 1 n 2 n c 1 (σ2 cs + σct) 2 + 2 γ ) cst n c 1 c=1 µ cs = 1 n c n c k=1 y kcs σcs 2 = 1 n c (y kcs µ cs ) 2 n c k=1 γ cst = 1 n c (y kcs µ cs )(y kct µ ct ) n c k=1

Variance of estimators = Var(ˆδ st,diff ) = Var(ˆδ st,ht ) b nc 2 ( r 1 n 2 n c 1 (σ2 cs + σct) 2 + 2 γ ) cst n c 1 c=1 Note: σ 2 cs and σ 2 ct are estimable, γ cst not directly estimable. Conservative estimate: Var = b c=1 nc 2 ( ) 2(r 1) n 2 n c 1 (ˆσ2 cs + ˆσ ct) 2 Small differences for more general treatment assignments.

When does blocking help? Blocking vs. completely randomized treatment assignment (no blocking): which estimates of SATE st have lower variance? Blocking helps if and only if: b c=1 n 2 c [( (r 1)(σ 2 s + σt 2 ) ) + 2γ st n 2 c (n 1) ( (r 1)(σ 2 cs + σct) 2 )] + 2γ cst n 2 (n c 1) 0 Intuitive to make σ 2 cs, σ 2 ct small w.r.t. σ 2 s, σ 2 t, but other blocking designs may also improve treatment effect estimates.

Can blocking hurt? When blocking is completely randomized: [ b ( (r 1)(σ E nc 2 2 cs + σ 2 ) ] ct) + 2γ cst n 2 (n c=1 c 1) b ( (r 1)(σ = nc 2 2 s + σt 2 ) ) + 2γ st n 2 c (n 1) c=1 Blocked variance = Completely randomized variance Any improvement to completely random blocking Reduced variance in treatment effect estimates.

Future Work Apply graph partitioning techniques to other statistical problems: Clustering alternative to k-means. Apply to matching problems. Other problems in nonparametric statistics. Improve theoretic results of algorithm.

Bibliography I David A. Freedman. On regression adjustments in experiments with several treatments. The annals of applied statistics, 2(1): 176 196, 2008. Robert Greevy, Bo Lu, Jeffrey H. Silber, and Paul Rosenbaum. Optimal multivariate matching before randomization. Biostatistics, 5(4):263 275, 2004. D.S. Hochbaum and D.B. Shmoys. A unified approach to approximation algorithms for bottleneck problems. Journal of the ACM (JACM), 33(3):533 550, 1986. K. Imai. Variance identification and efficiency analysis in randomized experiments under the matched-pair design. Statistics in medicine, 27(24):4857 4873, 2008.

Bibliography II Guido W. Imbens. Experimental design for unit and cluster randomized trials. Working Paper, 2011. Winston Lin. Agnostic notes on regression adjustments to experimental data: Reexamining freedman s critique. Annals of Applied Statistics, 2012. Luke W. Miratrix, Jasjeet S. Sekhon, and Bin Yu. Adjusting treatment effect estimates by post-stratification in randomized experiments. Journal of the Royal Statistical Society, Series B, 75(2):369 396, 2013. Ryan T Moore. Multivariate continuous blocking to improve political science experiments. Political Analysis, 20(4):460 479, 2012.

Bibliography III Kari Lock Morgan and Donald B Rubin. Rerandomization to improve covariate balance in experiments. Annals of Statistics, 40(2):1263 1282, 2012. P.R. Rosenbaum. Optimal matching for observational studies. Journal of the American Statistical Association, 84(408): 1024 1032, 1989. Donald B Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology; Journal of Educational Psychology, 66(5):688, 1974. Jerzy Splawa-Neyman, DM Dabrowska, and TP Speed. On the application of probability theory to agricultural experiments. essay on principles. section 9. Statistical Science, 5(4):465 472, 1990.