
Personalizing Treatments using Machine Learning

Paper Session

Sunday, Jan. 3, 2021 10:00 AM - 12:00 PM (EST)

Hosted By: American Economic Association

Chair: Gregory Lewis, Microsoft Research

Targeting for Long-term outcomes

Jeremy Yang

Massachusetts Institute of Technology

Dean Eckles

Massachusetts Institute of Technology

Paramveer Dhillon

University of Michigan

Sinan Aral

Massachusetts Institute of Technology

Abstract

Decision-makers often want to target interventions (e.g., marketing campaigns) so as to maximize an outcome that is observed only in the long term. This typically requires delaying decisions until the outcome is observed or relying on simple short-term proxies for the long-term outcome. Here we build on the statistical surrogacy and off-policy learning literature to impute the missing long-term outcomes and then approximate the optimal targeting policy on the imputed outcomes via a doubly-robust approach. We apply our approach in large-scale proactive churn management experiments at The Boston Globe by targeting optimal discounts to its digital subscribers to maximize their long-term revenue. We first show that conditions for validity of average treatment effect estimation with imputed outcomes are also sufficient for valid policy evaluation and optimization; furthermore, these conditions can be somewhat relaxed for policy optimization. We then validate this approach empirically by comparing it with a policy learned on the ground-truth long-term outcomes and show that they are statistically indistinguishable. Our approach also outperforms a policy learned on short-term proxies for the long-term outcome. In a second field experiment, we implement the optimal targeting policy with additional randomized exploration, which allows us to update the optimal policy for each new cohort of customers to account for potential non-stationarity. Over three years, our approach had a net-positive revenue impact in the range of $4-5 million compared to The Boston Globe’s current policies.
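The pipeline described in this abstract (impute the long-term outcome from short-term surrogates, then score candidate policies with doubly-robust estimates) can be illustrated with a minimal sketch. This is not the authors' implementation: the simulated data, the linear models, the known 0.5 treatment propensity, and all variable names are assumptions made for illustration, and cross-fitting is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear(features, target):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

def predict_linear(coef, features):
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef

# Step 1 (assumed setup): historical data where both the short-term
# surrogates S and the long-term outcome Y are observed.
n_hist = 2000
S_hist = rng.normal(size=(n_hist, 2))
Y_hist = 1.0 + S_hist @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=n_hist)
surrogate_coef = fit_linear(S_hist, Y_hist)

# Step 2 (assumed setup): a randomized experiment with covariate X, binary
# treatment W, and surrogates S only. Treatment helps when X > 0.
n = 4000
X = rng.normal(size=n)
W = rng.integers(0, 2, size=n)
propensity = 0.5  # known by design in this simulated experiment
S = np.column_stack([W * X + rng.normal(scale=0.5, size=n), rng.normal(size=n)])
Y_imputed = predict_linear(surrogate_coef, S)  # imputed long-term outcome

# Step 3: per-arm outcome models on the imputed outcome, then
# doubly-robust scores (outcome model plus inverse-propensity correction).
mu = np.zeros((n, 2))
for w in (0, 1):
    arm_coef = fit_linear(X[W == w], Y_imputed[W == w])
    mu[:, w] = predict_linear(arm_coef, X)

gamma = np.zeros((n, 2))
for w in (0, 1):
    gamma[:, w] = mu[:, w] + (W == w) / propensity * (Y_imputed - mu[:, w])

def policy_value(assignment):
    """Doubly-robust estimate of the mean imputed outcome under a policy."""
    return gamma[np.arange(n), assignment].mean()

targeted = (mu[:, 1] > mu[:, 0]).astype(int)  # treat where the model predicts a gain
value_targeted = policy_value(targeted)
value_treat_all = policy_value(np.ones(n, dtype=int))
value_treat_none = policy_value(np.zeros(n, dtype=int))
print(value_targeted, value_treat_all, value_treat_none)
```

In this simulation the targeted policy should score above both uniform policies, because treatment raises the imputed outcome only for units with a positive covariate.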

Dynamically Optimal Treatment Allocation Using Reinforcement Learning

Karun Adusumilli

University of Pennsylvania

Friedrich Geiecke

London School of Economics

Claudio Schilter

University of Zurich


Abstract

Devising guidance on how to assign individuals to treatment is an important goal of empirical research. In practice, individuals often arrive sequentially, and the planner faces various constraints such as limited budget/capacity, or borrowing constraints, or the need to place people in a queue. For instance, a governmental body may receive a budget outlay at the beginning of a year, and it may need to decide how best to allocate resources within the year to individuals who arrive sequentially. In this and other examples involving inter-temporal tradeoffs, previous work on devising optimal policy rules in a static context is either not applicable, or is sub-optimal. Here we show how one can use offline observational data to estimate an optimal policy rule that maximizes ex-ante expected welfare in this dynamic context. We allow the class of policy rules to be restricted for computational, legal or incentive compatibility reasons. The problem is equivalent to one of optimal control under a constrained policy class, and we exploit recent developments in Reinforcement Learning (RL) to propose an algorithm to solve this. The algorithm is easily implementable with speedups achieved through multiple RL agents learning in parallel processes. We also characterize the statistical regret from using our estimated policy rule by casting the evolution of the value function under each policy in a Partial Differential Equation (PDE) form and using the theory of viscosity solutions to PDEs. We find that the policy regret decays at an n^{-1/2} rate in most examples; this is the same rate as in the static case.
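The inter-temporal tradeoff described in this abstract can be seen in a toy simulation. The sketch below is not the paper's RL algorithm: it is a plain grid search over a restricted class of threshold rules, evaluated by simulation, under assumed values (a budget of 100 treatments, 1000 sequential arrivals, treatment benefits drawn uniformly from [0, 1]). All names and parameters are hypothetical.

```python
import numpy as np

def simulate_welfare(threshold, budget=100, n_arrivals=1000, n_sims=200, seed=0):
    """Average total benefit when each arrival is treated if its benefit
    is at least `threshold` and budget remains."""
    rng = np.random.default_rng(seed)  # same seed for every threshold: common random numbers
    total = 0.0
    for _ in range(n_sims):
        benefits = rng.uniform(0, 1, size=n_arrivals)
        eligible = benefits[benefits >= threshold]  # boolean mask keeps arrival order
        total += eligible[:budget].sum()            # treat until the budget runs out
    return total / n_sims

# Restricted policy class: one fixed benefit threshold for the whole period.
thresholds = np.round(np.arange(0.0, 1.0, 0.05), 2)
welfare = {float(c): simulate_welfare(c) for c in thresholds}
best_threshold = max(welfare, key=welfare.get)
print(best_threshold, welfare[best_threshold], welfare[0.0])
```

A threshold of 0 treats arrivals first-come, first-served and exhausts the budget early, while a higher threshold holds budget back for later high-benefit arrivals. A rule that ignores the remaining budget and time would miss this tradeoff.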

Personalizing Treatments For Habit Formation: Learning Optimal Treatment Rules From a Multi-Arm Experiment

Rahul Ladhania

University of Pennsylvania; University of Michigan (Effective July 2020)

Jann Spiess

Stanford University

Katherine Milkman

University of Pennsylvania

Sendhil Mullainathan

University of Chicago

Lyle Ungar

University of Pennsylvania


Abstract

Across social science and health policy settings, there has been a strong interest in heterogeneity in treatment effects, that is, in identifying for each subgroup of a population which intervention among a candidate set is most effective. In this paper, we learn the optimal treatment assignment rule in an experimental setting with a large number of discrete treatment arms. We propose an 'honest' recursive partitioning tree and forest-based approach in a multiple discrete treatment arm setting to learn and validate the individualized assignment rules. We apply this method using data from a mega randomized controlled trial conducted in collaboration with a national gym chain, with over 50 behavioral interventions promoting the formation of lasting exercise habits. We compare our approach to regression-based Q-learning and classification-based sequential outcome weighted learning approaches.
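'Honest' estimation in partitioning methods generally means that one subsample chooses the partition and a separate subsample estimates outcomes within it. A minimal sketch of that idea follows, using a single split and three arms. It is not the authors' tree or forest procedure: the simulated data, the one-covariate stump, the candidate thresholds, and all names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed simulated experiment: arm 0 is a control, arm 1 helps when x < 0,
# arm 2 helps when x >= 0. Arms are assigned uniformly at random.
n, n_arms = 6000, 3
x = rng.uniform(-1, 1, size=n)
arm = rng.integers(0, n_arms, size=n)
effect = np.where((arm == 1) & (x < 0), 1.0, 0.0) + np.where((arm == 2) & (x >= 0), 1.0, 0.0)
y = effect + rng.normal(scale=1.0, size=n)

# Honest sample split: one half chooses the partition, the other half
# estimates arm means within each leaf and picks the assigned arm.
half = n // 2
split_idx = np.arange(half)
est_idx = np.arange(half, n)

def arm_means(idx):
    """Mean outcome per arm among the units in idx."""
    return np.array([y[idx][arm[idx] == a].mean() for a in range(n_arms)])

def split_value(threshold, idx):
    """Average outcome if each leaf received its best-looking arm."""
    left = idx[x[idx] < threshold]
    right = idx[x[idx] >= threshold]
    return (len(left) * arm_means(left).max() + len(right) * arm_means(right).max()) / len(idx)

candidates = np.linspace(-0.8, 0.8, 17)
best_threshold = max(candidates, key=lambda t: split_value(t, split_idx))

# Leaf-level arm choice uses only the held-out estimation half.
left_est = est_idx[x[est_idx] < best_threshold]
right_est = est_idx[x[est_idx] >= best_threshold]
rule = {"left": int(arm_means(left_est).argmax()), "right": int(arm_means(right_est).argmax())}
print(best_threshold, rule)
```

Keeping the estimation half out of the split search means the leaf-level arm means are not inflated by the search over thresholds and arms.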

Discussant(s)

Jeffrey McCullough

University of Michigan

Jann Spiess

Stanford University

Mert Demirer

Massachusetts Institute of Technology

JEL Classifications

C1 - Econometric and Statistical Methods and Methodology: General
C5 - Econometric Modeling
