Personalizing Treatments using Machine Learning
Sunday, Jan. 3, 2021 10:00 AM - 12:00 PM (EST)
- Chair: Gregory Lewis, Microsoft Research
Dynamically Optimal Treatment Allocation Using Reinforcement Learning
Abstract: Devising guidance on how to assign individuals to treatment is an important goal of empirical research. In practice, individuals often arrive sequentially, and the planner faces various constraints, such as a limited budget or capacity, borrowing constraints, or the need to place people in a queue. For instance, a governmental body may receive a budget outlay at the beginning of a year and must then decide how best to allocate resources during the year to individuals who arrive sequentially. In this and other examples involving inter-temporal tradeoffs, previous work on devising optimal policy rules in a static context is either inapplicable or suboptimal. Here we show how one can use offline observational data to estimate an optimal policy rule that maximizes ex-ante expected welfare in this dynamic context. We allow the class of policy rules to be restricted for computational, legal, or incentive-compatibility reasons. The problem is equivalent to one of optimal control under a constrained policy class, and we exploit recent developments in Reinforcement Learning (RL) to propose an algorithm that solves it. The algorithm is easy to implement, with speedups achieved through multiple RL agents learning in parallel processes. We also characterize the statistical regret from using our estimated policy rule by casting the evolution of the value function under each policy as a Partial Differential Equation (PDE) and using the theory of viscosity solutions to PDEs. We find that the policy regret decays at an n^{-1/2} rate in most examples, the same rate as in the static case.
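To make the inter-temporal tradeoff concrete, here is a toy sketch, not the authors' RL algorithm. It assumes the policy class is restricted to a single hypothetical gain threshold, that `gains` are individual treatment effects already estimated from offline data, that each treatment costs one unit of budget, and that a plain grid search over simulated arrival sequences stands in for the reinforcement-learning step. The function names `simulate_welfare` and `best_threshold` are illustrative only.

```python
import numpy as np


def simulate_welfare(threshold, gains, budget, cost=1.0):
    """Welfare from treating arrivals (in order) whose estimated gain exceeds
    `threshold`, as long as budget remains. Hypothetical policy class."""
    remaining = budget
    welfare = 0.0
    for g in gains:
        if remaining >= cost and g > threshold:
            welfare += g
            remaining -= cost
    return welfare


def best_threshold(gain_paths, budget, grid):
    """Grid search (a stand-in for RL) for the threshold that maximizes
    average welfare across simulated arrival sequences."""
    values = [
        np.mean([simulate_welfare(t, path, budget) for path in gain_paths])
        for t in grid
    ]
    i = int(np.argmax(values))
    return grid[i], float(values[i])
```

With arrivals `[0.1, 0.9, 0.2, 0.8]` and a budget of 2, a myopic rule (threshold 0) spends the budget on the first two arrivals for a welfare of 1.0, whereas a threshold of 0.5 saves budget for later, higher-gain arrivals and attains 1.7. A static rule that ignores the remaining budget cannot capture this tradeoff.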
Personalizing Treatments For Habit Formation: Learning Optimal Treatment Rules From a Multi-Arm Experiment
Abstract: Across social science and health policy settings, there has been strong interest in heterogeneity in treatment effects: identifying, for subgroups of a population, which intervention in a candidate set is most effective. In this paper, we learn the optimal treatment assignment rule in an experimental setting with a large number of discrete treatment arms. We propose an 'honest' recursive partitioning tree and forest-based approach for settings with multiple discrete treatment arms to learn and validate individualized assignment rules. We apply this method to data from a large-scale randomized controlled trial conducted in collaboration with a national gym chain, covering over 50 behavioral interventions that promote the formation of lasting exercise habits. We compare our approach to regression-based Q-learning and classification-based sequential outcome weighted learning approaches.
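The 'honesty' idea can be illustrated with a minimal sketch, assuming a single covariate `x`, a one-split tree (a stump), and arm means as the effect estimates. This is not the authors' tree or forest procedure: one half of the sample chooses the split, and the other, held-out half re-estimates each arm's mean within each leaf to pick the arm assigned there, so the same data are not used both to find the partition and to evaluate it. The names `leaf_arm_means` and `honest_stump_rule` are hypothetical.

```python
import numpy as np


def leaf_arm_means(y, arm, n_arms):
    """Mean outcome of each arm within one leaf; -inf for arms with no data."""
    means = np.full(n_arms, -np.inf)
    for a in range(n_arms):
        ya = y[arm == a]
        if ya.size > 0:
            means[a] = ya.mean()
    return means


def honest_stump_rule(x, arm, y, n_arms, seed=0):
    """Learn a one-split assignment rule on covariate x with sample splitting.

    Returns (threshold, arm for x < threshold, arm for x >= threshold).
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    build, est = idx[: len(y) // 2], idx[len(y) // 2 :]

    # 1) Building half: pick the split that maximizes welfare when each
    #    leaf receives its best-looking arm.
    xb, ab, yb = x[build], arm[build], y[build]
    best_t, best_score = None, -np.inf
    for t in np.quantile(xb, np.linspace(0.1, 0.9, 17)):
        score = 0.0
        for mask in (xb < t, xb >= t):
            score += mask.sum() * leaf_arm_means(yb[mask], ab[mask], n_arms).max()
        if score > best_score:
            best_t, best_score = t, score

    # 2) Estimation half: re-estimate arm means in each leaf on held-out
    #    data and assign the arm with the highest mean.
    xe, ae, ye = x[est], arm[est], y[est]
    left = xe < best_t
    arm_left = int(np.argmax(leaf_arm_means(ye[left], ae[left], n_arms)))
    arm_right = int(np.argmax(leaf_arm_means(ye[~left], ae[~left], n_arms)))
    return float(best_t), arm_left, arm_right
```

A forest version would repeat this over many random sample splits and deeper trees and aggregate the per-arm estimates. The stump keeps the sketch short while showing where the two halves of the sample are used.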
- C1 - Econometric and Statistical Methods and Methodology: General
- C5 - Econometric Modeling