Reinforcement Learning in Economics and Econometrics

Paper Session

Monday, Jan. 4, 2021 10:00 AM - 12:00 PM (EST)

Hosted By: Econometric Society
  • Chair: Aureo de Paula, University College London

Temporal-Difference Estimation of Dynamic Discrete Choice Models

Karun Adusumilli, University of Pennsylvania
Dita Eckardt, Institute for Fiscal Studies

Abstract

We propose a new algorithm to estimate the structural parameters in dynamic discrete choice models. The algorithm is based on the conditional choice probability approach, but uses the idea of Temporal-Difference learning from the Reinforcement Learning literature to estimate the different terms in the value functions. Because these terms are estimated with functional approximations using basis functions, our approach naturally allows for continuous state spaces. Furthermore, it does not require specification of transition probabilities, and even estimation of choice probabilities can be avoided using a recursive procedure. Computationally, our algorithm only requires solving a low-dimensional linear equation. For the estimation of dynamic games, our procedure does not require integrating over the actions of other players, which further heightens the computational advantage. We show that our estimator is consistent, and efficient under discrete state spaces. In settings with continuous states, we propose easy-to-implement locally robust corrections in order to achieve parametric rates of convergence. Preliminary Monte Carlo simulations confirm the workings of our algorithm.
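As a rough illustration of why a temporal-difference step can reduce to "a low-dimensional linear equation", the sketch below implements generic least-squares temporal-difference (LSTD) evaluation with a linear basis. It is not the authors' estimator: the function names, the one-hot basis, the two-state example, and the discount factor 0.9 are all assumptions made for illustration. The point is only that the weights on the basis functions solve a k-by-k linear system built from observed transitions, with no transition probabilities specified.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def lstd(transitions, features, beta):
    """Generic LSTD: approximate V(x) = E[sum_t beta^t payoff_t | x_0 = x]
    by features(x) . omega, using only observed (x, payoff, x_next) triples.

    omega solves  [sum phi(x) (phi(x) - beta * phi(x_next))'] omega = sum phi(x) * payoff.
    """
    k = len(features(transitions[0][0]))
    A = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for x, payoff, x_next in transitions:
        phi, phi_next = features(x), features(x_next)
        for i in range(k):
            b[i] += phi[i] * payoff
            for j in range(k):
                A[i][j] += phi[i] * (phi[j] - beta * phi_next[j])
    return solve_linear(A, b)


# Hypothetical example: two states that alternate deterministically;
# state 0 pays 1, state 1 pays 0. One-hot features make the fit exact.
def one_hot(x):
    return [1.0 if x == s else 0.0 for s in (0, 1)]

transitions = [(0, 1.0, 1), (1, 0.0, 0)]
omega = lstd(transitions, one_hot, beta=0.9)
print(omega)
```

In this toy chain the exact values satisfy V(0) = 1 + 0.9 V(1) and V(1) = 0.9 V(0), so the fitted weights can be checked by hand. With a continuous state, `features` would return a small vector of basis functions instead of one-hot indicators, and the linear system would keep the same dimension.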

Hiring as Exploration

Danielle Li, Massachusetts Institute of Technology
Lindsey Raymond, Massachusetts Institute of Technology
Peter Bergman, Columbia University

Abstract

This paper views hiring as a contextual bandit problem: to find the best workers over time, firms must balance "exploitation" (selecting from groups with proven track records) with "exploration" (selecting from under-represented groups to learn about quality). Yet modern hiring algorithms, based on "supervised learning" approaches, are designed solely for exploitation. Instead, we build a resume screening algorithm that values exploration by evaluating candidates according to their statistical upside potential. Using data from professional services recruiting within a Fortune 500 firm, we show that this approach improves the quality (as measured by eventual hiring rates) of candidates selected for an interview, while also increasing demographic diversity, relative to the firm's existing practices. The same is not true for traditional algorithms based on supervised learning, which improve hiring rates but select far fewer Black and Hispanic applicants. In an extension, we show that exploration-based algorithms are also able to learn more effectively about simulated changes in applicant hiring potential over time. Together, our results highlight the importance of incorporating exploration in developing decision-making algorithms that are potentially both more efficient and equitable.
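The contrast between scoring on a track record alone and scoring on "upside potential" can be sketched with a textbook upper-confidence-bound (UCB) rule. This is a simplified illustration, not the screening algorithm from the paper: the group labels, the counts, the `bonus_scale` parameter, and the specific bonus formula (the standard UCB1 term) are assumptions.

```python
import math


def ucb_score(successes, trials, total_trials, bonus_scale=1.0):
    """Estimated success rate plus an exploration bonus that shrinks as
    more outcomes are observed for this group (standard UCB1 bonus)."""
    if trials == 0:
        return float("inf")  # never-observed groups are tried first
    mean = successes / trials
    bonus = bonus_scale * math.sqrt(2.0 * math.log(total_trials) / trials)
    return mean + bonus


# Hypothetical interview histories: group -> (hires, interviews).
history = {"well_observed": (30, 100), "rarely_observed": (2, 10)}
total = sum(trials for _, trials in history.values())

# Exploitation only: rank on the observed hiring rate.
greedy_pick = max(history, key=lambda g: history[g][0] / history[g][1])

# Exploration-aware: rank on the upper confidence bound.
ucb_pick = max(history, key=lambda g: ucb_score(*history[g], total))

print(greedy_pick, ucb_pick)
```

In this toy history the rarely observed group has the lower observed rate but a much wider confidence interval, so the two rules select different groups. A contextual version would replace the per-group counts with a model of outcomes given applicant covariates.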

A Computational Framework for Analyzing Dynamic Procurement Auctions: The Market Impact of Information Sharing

Ariel Pakes, Harvard University
John Asker, University of California-Los Angeles
Chaim Fershtman, Tel Aviv University
Jihye Jeon, Boston University

Abstract

This paper develops a computational framework to analyze dynamic auctions and uses it to investigate the impact of information sharing among bidders. We show that allowing for the dynamics implicit in many auction environments enables the emergence of equilibrium states that can only be reached when firms are responding to dynamic incentives. The impact of information sharing depends on the extent of dynamics and provides support for the claim that information sharing, even of strategically important data, need not be welfare reducing. Our methodological contribution is to show how to adapt the experience-based equilibrium concept to a dynamic auction environment and to provide an implementable boundary-consistency condition that mitigates the extent of multiple equilibria.

Adaptive Treatment Assignment in Experiments For Policy Choice

Maximilian Kasy, Oxford University
Anja Sautmann, World Bank

Abstract

Standard experimental designs are geared toward point estimation and hypothesis testing, while bandit algorithms are geared toward in-sample outcomes. Here, we instead consider treatment assignment in an experiment with several waves for choosing the best among a set of possible policies (treatments) at the end of the experiment. We propose a computationally tractable assignment algorithm that we call “exploration sampling,” where assignment probabilities in each wave are an increasing concave function of the posterior probabilities that each treatment is optimal. We prove an asymptotic optimality result for this algorithm and demonstrate improvements in welfare in calibrated simulations over both non-adaptive designs and bandit algorithms. An application to selecting between six different recruitment strategies for an agricultural extension service in India demonstrates practical feasibility.
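
The assignment rule described above can be sketched in two steps: estimate the posterior probability that each treatment is optimal, then pass those probabilities through an increasing concave function and normalize. The abstract does not state the function, so the square root below is only a placeholder with the stated properties. The binary outcomes, the uniform Beta(1, 1) priors, the function names, and the counts are likewise assumptions for illustration.

```python
import math
import random


def prob_optimal(successes, failures, draws=10000, seed=0):
    """Monte Carlo estimate of the posterior probability that each
    treatment has the highest success rate, assuming independent
    Beta(1 + successes, 1 + failures) posteriors."""
    rng = random.Random(seed)
    k = len(successes)
    wins = [0] * k
    for _ in range(draws):
        theta = [rng.betavariate(1 + successes[d], 1 + failures[d])
                 for d in range(k)]
        wins[theta.index(max(theta))] += 1
    return [w / draws for w in wins]


def assignment_shares(p, transform=math.sqrt):
    """Map posterior optimality probabilities to next-wave assignment
    shares via an increasing concave transform, then normalize."""
    weights = [transform(pd) for pd in p]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("all transformed probabilities are zero")
    return [w / total for w in weights]


# Hypothetical outcomes after one wave with three treatments.
p = prob_optimal(successes=[8, 2, 1], failures=[2, 8, 9])
shares = assignment_shares(p)
print(p, shares)

# Deterministic illustration of the flattening effect.
flat = assignment_shares([0.8, 0.1, 0.1])
print(flat)
```

Because the transform is concave, the leading treatment receives a smaller share than its posterior probability of being optimal, and the remaining share goes to its competitors. That shift is what separates this kind of rule from assigning in proportion to the posterior probabilities themselves.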
JEL Classifications
  • C1 - Econometric and Statistical Methods and Methodology: General
  • C9 - Design of Experiments