Machine Learning and High Dimensional Methods

Paper Session

Saturday, Jan. 4, 2020 2:30 PM - 4:30 PM (PST)

Marriott Marquis, Presidio 1 - 2
Hosted By: Econometric Society
  • Chair: Whitney Newey, Massachusetts Institute of Technology

Double Debiased Machine Learning Nonparametric Inference with Continuous Treatments

Ying-Ying Lee, University of California-Irvine
Kyle Colangelo, University of California-Irvine

Abstract

We propose a nonparametric inference method for causal effects of continuous treatment variables, under unconfoundedness and in the presence of high-dimensional or nonparametric nuisance parameters. Our simple kernel-based double debiased machine learning (DML) estimators for the average dose-response function (or the average structural function) and the partial effects are asymptotically normal with nonparametric convergence rates. The nuisance estimators for the conditional expectation function and the conditional density can be nonparametric kernel or series estimators or ML methods. Using a doubly robust influence function and cross-fitting, we give tractable primitive conditions under which the nuisance estimators do not affect the first-order large sample distribution of the DML estimators. We implement various ML methods in Monte Carlo simulations and an empirical application on a job training program evaluation to support the theoretical results and demonstrate the usefulness of our DML estimator in practice.
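The cross-fitted doubly robust score described in the abstract can be sketched as follows. This is not the paper's code: the nuisance estimators below are deliberately simple OLS and Gaussian-density stand-ins (the paper allows kernel, series, or generic ML methods), and the bandwidth and fold choices are arbitrary illustrations.

```python
import numpy as np

def dml_dose_response(Y, T, X, t, h=0.5, n_folds=2, seed=0):
    """Kernel-based DML estimate of the average dose-response beta(t) at
    treatment level t under unconfoundedness, with cross-fitting.
    Illustrative nuisance stand-ins:
      gamma(t, x) ~ E[Y | T = t, X = x]   via linear regression
      f(t | x)    ~ density of T | X = x  via a Gaussian model T|X ~ N(x'a, s2)
    """
    rng = np.random.default_rng(seed)
    n = len(Y)
    folds = rng.permutation(n) % n_folds
    psi = np.empty(n)
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        # outcome regression gamma, fit on the training folds
        D = np.column_stack([np.ones(tr.sum()), T[tr], X[tr]])
        b = np.linalg.lstsq(D, Y[tr], rcond=None)[0]
        # conditional density model for T given X, fit on the training folds
        Z = np.column_stack([np.ones(tr.sum()), X[tr]])
        a = np.linalg.lstsq(Z, T[tr], rcond=None)[0]
        s2 = np.mean((T[tr] - Z @ a) ** 2)
        # doubly robust score evaluated on the held-out fold
        m = te.sum()
        mu = np.column_stack([np.ones(m), X[te]]) @ a
        f_tx = np.exp(-((t - mu) ** 2) / (2 * s2)) / np.sqrt(2 * np.pi * s2)
        g_t = np.column_stack([np.ones(m), np.full(m, t), X[te]]) @ b
        g_T = np.column_stack([np.ones(m), T[te], X[te]]) @ b
        kern = np.exp(-0.5 * ((T[te] - t) / h) ** 2) / (h * np.sqrt(2 * np.pi))
        psi[te] = g_t + kern / f_tx * (Y[te] - g_T)
    return psi.mean()
```

Because the nuisances are fit on folds other than the one where the score is evaluated, first-stage estimation error enters the average only through the bias-correction term, which is what makes the plug-in insensitive to it.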

Structural Estimation of Dynamic Equilibrium Models with Unstructured Data

Jesus Fernandez-Villaverde, University of Pennsylvania
Stephen Hansen, Imperial College London

Abstract

In this paper, we show how the estimation of structural DSGE models with unstructured data can be accomplished by merging standard state-space techniques with Latent Dirichlet Allocation (LDA) in an augmented state-space representation. The posterior distribution of parameters from the resulting representation can be sampled through the use of Markov Chain Monte Carlo algorithms, and it is readily amenable to massive parallelization.
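The "standard state-space techniques plus MCMC" half of this recipe can be sketched in isolation (the LDA augmentation is omitted here). Assuming a scalar linear-Gaussian state-space model, a random-walk Metropolis sampler over the persistence parameter uses the Kalman-filter likelihood; all names and tuning constants are illustrative.

```python
import numpy as np

def kalman_loglik(y, rho, q=1.0, r=1.0):
    """Log-likelihood, via the Kalman filter, of the scalar model
         x_t = rho * x_{t-1} + w_t,  w_t ~ N(0, q)
         y_t = x_t + v_t,            v_t ~ N(0, r)."""
    m, P, ll = 0.0, 10.0, 0.0            # diffuse-ish initial state
    for yt in y:
        m_pred, P_pred = rho * m, rho ** 2 * P + q
        S = P_pred + r                   # innovation variance
        ll += -0.5 * (np.log(2 * np.pi * S) + (yt - m_pred) ** 2 / S)
        K = P_pred / S                   # Kalman gain
        m = m_pred + K * (yt - m_pred)
        P = (1 - K) * P_pred
    return ll

def metropolis_rho(y, n_iter=2000, step=0.1, seed=0):
    """Random-walk Metropolis draws of rho under a flat prior on (-1, 1)."""
    rng = np.random.default_rng(seed)
    rho, ll = 0.0, kalman_loglik(y, 0.0)
    draws = np.empty(n_iter)
    for i in range(n_iter):
        prop = rho + step * rng.standard_normal()
        if abs(prop) < 1.0:
            ll_prop = kalman_loglik(y, prop)
            if np.log(rng.random()) < ll_prop - ll:  # MH accept/reject
                rho, ll = prop, ll_prop
        draws[i] = rho
    return draws
```

In the paper's setting the state vector additionally carries the LDA topic structure, but the sampling logic is the same: each proposal is scored by the likelihood of the (augmented) state-space representation.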

Demand Analysis with Many Prices

Victor Chernozhukov, Massachusetts Institute of Technology
Jerry Hausman, Massachusetts Institute of Technology
Whitney Newey, Massachusetts Institute of Technology

Abstract

From its inception, demand estimation has faced the problem of "many prices." While some aggregation across goods is always necessary, the problem of many prices remains even after aggregation. Economic theory shows that often the policy question of interest depends on only one, or a very few, price effects. For example, estimation of consumer surplus typically depends only on the own price effect since all other prices are held constant. Another common feature of data is that cross-price effects tend to be small. This paper uses Lasso to mitigate the curse of dimensionality in estimating the average expenditure share from cross-section data when cross-price effects are small. We estimate bounds on consumer surplus (BCS) using a novel double/debiased Lasso method. These bounds allow for multidimensional, nonseparable heterogeneity and solve the "zeros problem" of demand by including zeros in the estimation. We also use a control function to correct for endogeneity of total expenditure. As an additional contribution we use panel data to control for endogeneity of prices as well as expenditure. We average ridge regression individual slope estimators and bias correct for the regularization. We give inference theory when the number of time series observations is larger than the number of parameters, including primitive regularity conditions. We compare these methods in estimating the welfare effects of a tax on soda using scanner data. We find panel elasticities are substantially smaller than the cross section estimates, strongly suggesting that prices are endogenous.
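The double/debiased Lasso idea for a single price effect amid many controls can be sketched as follows. This is a generic partialling-out illustration, not the paper's estimator (which handles expenditure shares, zeros, and nonseparable heterogeneity); the coordinate-descent Lasso and simulated design are assumptions of the sketch.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Lasso via cyclic coordinate descent for
       (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]   # partial residual excluding j
            z = X[:, j] @ r
            b[j] = np.sign(z) * max(abs(z) - lam * n, 0.0) / col_ss[j]
    return b

def double_lasso_effect(y, d, W, lam=0.1):
    """Double/debiased Lasso for the coefficient on a single regressor d
    (e.g., the own price): partial out the many controls W from y and from
    d with Lasso, then run OLS of residual on residual, a Neyman-orthogonal
    moment that is first-order insensitive to the Lasso shrinkage bias."""
    ry = y - W @ lasso_cd(W, y, lam)
    rd = d - W @ lasso_cd(W, d, lam)
    return (rd @ ry) / (rd @ rd)
```

The residual-on-residual step is what "debiases": regularization error in either first-stage Lasso enters the final moment only as a product of two small terms.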

Machine Learning for Dynamic Discrete Choice

Vira Semenova, Harvard University

Abstract

Dynamic discrete choice models often discretize the state vector and restrict its dimension in order to achieve valid inference. I propose a novel two-stage estimator for the set-identified structural parameter that incorporates a high-dimensional state space into the dynamic model of imperfect competition. In the first stage, I estimate the state variable's law of motion and the equilibrium policy function using machine learning tools. In the second stage, I plug the first-stage estimates into a moment inequality and solve for the structural parameter. The moment function is presented as the sum of two components, where the first one expresses the equilibrium assumption and the second one is a bias correction term that makes the sum insensitive (i.e., Neyman-orthogonal) to first-stage bias. The proposed estimator uniformly converges at the root-N rate and I use it to construct confidence regions. The results developed here can be used to incorporate a high-dimensional state space into classic dynamic discrete choice models, for example, those considered in Rust (1987), Bajari et al. (2007), and Scott (2013).
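The two-stage structure (first-stage nuisance estimate, then a grid search over parameters satisfying a moment inequality) can be illustrated with a deliberately simple toy, not the paper's model: an equilibrium condition says expected payoffs theta*x - c are nonnegative, c is a nuisance predicted from an observed state in the first stage, and the second stage collects the theta values passing the sample inequality. All variable names and the OLS first stage are assumptions of the sketch.

```python
import numpy as np

def identified_set(theta_grid, x, c_hat, slack=0.0):
    """Second stage of a toy two-step moment-inequality estimator: keep
    every theta on the grid whose sample moment satisfies
        mean(theta * x - c_hat) >= -slack,
    where c_hat is a first-stage estimate of the nuisance quantity."""
    return np.array([th for th in theta_grid
                     if np.mean(th * x - c_hat) >= -slack])

# toy data: equilibrium requires expected payoff theta*x - c >= 0
rng = np.random.default_rng(0)
n = 1000
z = rng.standard_normal(n)                        # observed state
c = 2.0 + 0.5 * z + 0.1 * rng.standard_normal(n)  # nuisance (cost)
x = np.abs(rng.standard_normal(n)) + 1.0          # positive payoff scale

# first stage: predict c from the state (OLS as a stand-in for ML tools)
Z = np.column_stack([np.ones(n), z])
c_hat = Z @ np.linalg.lstsq(Z, c, rcond=None)[0]

# second stage: grid search for the set of admissible theta
S = identified_set(np.linspace(0.0, 4.0, 81), x, c_hat)
```

Since the moment is increasing in theta here, the estimated set is a half-line cut at roughly mean(c_hat)/mean(x); the paper's bias-correction term, omitted in this toy, is what protects such boundaries from first-stage error.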

How Is Machine Learning Useful for Macroeconomic Forecasting?

Philippe Goulet Coulombe, University of Pennsylvania
Maxime Leroux, University of Quebec-Montreal
Dalibor Stevanovic, University of Quebec-Montreal
Stephane Surprenant, University of Quebec-Montreal

Abstract

We move beyond "Is Machine Learning Useful for Macroeconomic Forecasting?" by adding the "how." The current forecasting literature has focused on matching specific variables and horizons with a particularly successful algorithm. To the contrary, we study a wide range of horizons and variables and learn about the usefulness of the underlying features driving ML gains over standard macroeconometric methods. We distinguish four so-called features (nonlinearities, regularization, cross-validation and alternative loss function) and study their behavior in both data-rich and data-poor environments. To do so, we carefully design a series of experiments that make it easy to identify the “treatment” effects of interest. The fixed-effects regression setup prompts us to use a novel visualization technique for forecasting results that conveys all the relevant information in a digestible format. We conclude that (i) more data and non-linearities are very useful for real variables at long horizons, (ii) the standard factor model remains the best regularization, (iii) cross-validations are not all made equal (but K-fold is as good as BIC) and (iv) one should stick with the standard L2 loss.
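Finding (iii), that K-fold cross-validation and BIC behave comparably as model-selection devices, can be illustrated with a small sketch that is not from the paper: choosing the lag order of an autoregression by each criterion. The design and all names are assumptions of the illustration.

```python
import numpy as np

def lag_matrix(y, p):
    """Regressors [y_{t-1}, ..., y_{t-p}] and targets y_t for t = p..T-1."""
    X = np.column_stack([y[p - 1 - j: len(y) - 1 - j] for j in range(p)])
    return X, y[p:]

def select_ar_order(y, pmax=8, K=5):
    """Pick the AR order by BIC and by interleaved K-fold CV.
    (Effective samples differ slightly across p; fine for a sketch.)"""
    bic, cv = {}, {}
    for p in range(1, pmax + 1):
        X, t = lag_matrix(y, p)
        X1 = np.column_stack([np.ones(len(t)), X])
        b = np.linalg.lstsq(X1, t, rcond=None)[0]
        n = len(t)
        rss = ((t - X1 @ b) ** 2).sum()
        bic[p] = n * np.log(rss / n) + (p + 1) * np.log(n)
        folds = np.arange(n) % K           # interleaved fold assignment
        err = 0.0
        for k in range(K):
            tr, te = folds != k, folds == k
            bk = np.linalg.lstsq(X1[tr], t[tr], rcond=None)[0]
            err += ((t[te] - X1[te] @ bk) ** 2).sum()
        cv[p] = err / n
    return min(bic, key=bic.get), min(cv, key=cv.get)
```

On data with a clear low-order structure, both criteria tend to land on (or near) the true order; the paper's point is that such agreement is not guaranteed across all flavors of cross-validation.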

Dynamically Optimal Treatment Allocation Using Reinforcement Learning

Karun Adusumilli, University of Pennsylvania
Friedrich Geiecke, London School of Economics
Claudio Schilter, London School of Economics

Abstract

Devising guidance on how to assign individuals to treatment is an important goal of empirical research. In practice individuals often arrive sequentially, and the planner faces various constraints such as limited budget/capacity, or borrowing constraints, or the need to place people in a queue. For instance, a governmental body may receive a budget outlay at the beginning of a year, and it may need to decide how best to allocate resources within the year to individuals who arrive sequentially. In this and other examples involving inter-temporal trade-offs, previous work on devising optimal policy rules in a static context is either not applicable or sub-optimal. Here we show how one can use offline observational data to estimate an optimal policy rule that maximizes ex-ante expected welfare in this dynamic context. We allow the class of policy rules to be restricted for computational, legal or incentive compatibility reasons. The problem is equivalent to one of optimal control under a constrained policy class, and we exploit recent developments in Reinforcement Learning (RL) to propose an algorithm to solve this. The algorithm is easily implementable and computationally efficient, with speedups achieved through multiple RL agents learning in parallel processes. We also characterize the statistical regret from using our estimated policy rule. To do this, we show that a Partial Differential Equation (PDE) characterizes the evolution of the value function under each policy. The data enables us to obtain a sample version of the PDE that provides estimates of these value functions. The estimated policy rule is the one with the maximal estimated value function. Using the theory of viscosity solutions to PDEs we show that the policy regret decays at a n^{-1/2} rate in most examples; this is the same rate as that obtained in the static case.
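The core idea, using RL on replayed offline data to learn a budget-constrained sequential allocation rule, can be illustrated very loosely with a tabular Q-learning toy. The paper's actual method handles continuous states, constrained policy classes, and PDE-based value estimation; everything below (state layout, arrival types, tuning constants) is an assumption of the sketch.

```python
import numpy as np

def q_learning_budget(effects, arrivals, B, episodes=3000, alpha=0.1,
                      eps=0.1, seed=0):
    """Tabular Q-learning for allocating at most B treatments over a
    sequence of T arrivals.
    State: (time t, remaining budget b, type x of the current arrival).
    Action: 1 = treat (reward effects[x], costs one budget unit), 0 = pass.
    `arrivals` (episodes x T array of types) plays the role of offline
    observational data on arrival sequences."""
    rng = np.random.default_rng(seed)
    T = arrivals.shape[1]
    Q = np.zeros((T, B + 1, 2, 2))
    for _ in range(episodes):
        xs = arrivals[rng.integers(len(arrivals))]   # replay one sequence
        b = B
        for t in range(T):
            x = xs[t]
            # epsilon-greedy action choice
            a = rng.integers(2) if rng.random() < eps else int(Q[t, b, x].argmax())
            if b == 0:
                a = 0                                # budget constraint binds
            r = effects[x] if a == 1 else 0.0
            b_next = b - a
            target = r if t == T - 1 else r + Q[t + 1, b_next, xs[t + 1]].max()
            Q[t, b, x, a] += alpha * (target - Q[t, b, x, a])
            b = b_next
    return Q
```

With an ample budget the learned rule treats everyone with a positive effect; with a scarce budget the value function trades off treating a low-effect arrival now against saving the slot for a high-effect arrival later, which is exactly the inter-temporal trade-off the static rules miss.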
JEL Classifications
  • C1 - Econometric and Statistical Methods and Methodology: General
  • C2 - Single Equation Models; Single Variables