ML-Enabled Econometrics with Unstructured Data

Paper Session

Sunday, Jan. 5, 2025 10:15 AM - 12:15 PM (PST)

Hilton San Francisco Union Square, Union Square 17 and 18

Hosted By: American Economic Association

Chair: Szymon Sacher, Stanford University

Debiasing Machine-Learning- or AI-Generated Regressors in Partial Linear Models

Jingwen Zhang

University of Washington

Wendao Xue

University of Washington

Yifan Yu

University of Texas-Austin

Yong Tan

University of Washington

Abstract

Researchers are increasingly using machine learning (ML) and artificial intelligence (AI) to predict feature variables, which they then use as regressors in subsequent econometric models. However, because ML/AI predictions are imperfect, these generated regressors inevitably contain measurement error. Using such regressors directly in subsequent econometric models can bias estimates and ultimately lead to inaccurate conclusions. In light of this, we examine the problem of debiasing ML/AI-generated regressors in partial linear regression models. We propose estimators that use Two-Stage Least Squares (TSLS) and the Generalized Method of Moments (GMM) within the Double Machine Learning (DML) framework. We establish the consistency and asymptotic normality of our estimators. Moreover, we conduct extensive Monte Carlo simulations and empirical applications showing that our estimators outperform competing methods. Our work advances causal inference by addressing measurement error problems arising from ML/AI-generated regressors in partial linear models, and hence has practical implications for designing experimental systems and overcoming ML/AI bias.
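To make the measurement-error problem concrete, here is a small Python simulation of a single generated regressor with classical (mean-zero, independent) prediction error. It is not the authors' DML-based TSLS/GMM estimator: it drops the nonparametric part of the partial linear model and assumes, hypothetically, that two independent ML predictions of the same variable are available, so that one can serve as an instrument for the other. The names `simulate`, `x_hat1`, and `x_hat2` are illustrative only.

```python
import random

def cov(x, y):
    """Sample covariance of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def simulate(n=20000, beta=2.0, noise_sd=1.0, seed=0):
    """Simulate y = beta * x + e, where x is unobserved and two
    hypothetical ML predictions x_hat1, x_hat2 each carry independent
    classical error with standard deviation noise_sd."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [beta * xi + rng.gauss(0, 1) for xi in x]
    x_hat1 = [xi + rng.gauss(0, noise_sd) for xi in x]
    x_hat2 = [xi + rng.gauss(0, noise_sd) for xi in x]
    return y, x_hat1, x_hat2

y, x_hat1, x_hat2 = simulate()

# Naive OLS slope: plugs the generated regressor in as if it were data.
beta_ols = cov(x_hat1, y) / cov(x_hat1, x_hat1)

# Just-identified IV (two-stage least squares) slope:
# instrument x_hat1 with the independent second prediction x_hat2.
beta_iv = cov(x_hat2, y) / cov(x_hat2, x_hat1)

print(f"naive OLS: {beta_ols:.3f}, IV: {beta_iv:.3f}")
```

With equal signal and noise variances, the naive slope converges to about half the true value of 2.0, while the IV slope approximately recovers it, because the second prediction's error is uncorrelated with the first's.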

Inference for Regression with Variables Generated by AI or Machine Learning

Laura Battaglia

Oxford University

Timothy Christensen

University College London

Stephen Hansen

University College London

Szymon Sacher

Stanford University

Abstract

It has become common practice for researchers to use AI-powered information retrieval algorithms or other machine learning methods to estimate variables of economic interest, then use these estimates as covariates in a regression model. We show both theoretically and empirically that naively treating AI- and ML-generated variables as "data" leads to biased estimates and invalid inference. We propose two methods to correct bias and perform valid inference: (i) an explicit bias correction with bias-corrected confidence intervals, and (ii) joint maximum likelihood estimation of the regression model and the variables of interest. Through several applications, we demonstrate that the common approach generates substantial bias, while both corrections perform well.
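The idea of an explicit bias correction can be illustrated, under strong simplifying assumptions, with the textbook reliability-ratio correction for classical measurement error. This is a swapped-in technique, not the paper's correction: it assumes a continuous proxy `x_hat = x + u` with independent error, and a hypothetical hand-labeled validation subsample in which the true `x` is observed.

```python
import random

def var(v):
    """Sample variance of a list."""
    n = len(v)
    m = sum(v) / n
    return sum((a - m) ** 2 for a in v) / (n - 1)

def cov(x, y):
    """Sample covariance of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

rng = random.Random(1)
n, n_val, beta = 20000, 500, 1.5

# True variable (unobserved in the main sample) and outcome.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [beta * xi + rng.gauss(0, 1) for xi in x]
# ML-generated proxy with classical error of standard deviation 0.8.
x_hat = [xi + rng.gauss(0, 0.8) for xi in x]

# Naive slope: treats x_hat as if it were the true data.
beta_naive = cov(x_hat, y) / var(x_hat)

# Hypothetical validation subsample: the first n_val units are hand-labeled,
# so the prediction error u = x_hat - x is observed there.
sigma_u2 = var([x_hat[i] - x[i] for i in range(n_val)])

# Reliability ratio var(x) / var(x_hat), using var(x) = var(x_hat) - var(u).
reliability = (var(x_hat) - sigma_u2) / var(x_hat)
beta_corrected = beta_naive / reliability

print(f"naive: {beta_naive:.3f}, corrected: {beta_corrected:.3f}")
```

The naive slope converges to `beta` times the reliability ratio (here about 1.5 / 1.64), so dividing by an estimate of that ratio removes the attenuation. The sketch reports only point estimates and ignores the sampling error in `sigma_u2`, which a valid confidence interval would need to reflect.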

Demand Estimation with Text and Image Data

Giovanni Compiani

University of Chicago

Ilya Morozov

Northwestern University

Stephen Seiler

Imperial College London

Abstract

We propose a demand estimation method that allows researchers to estimate substitution patterns from unstructured image and text data. We first employ a series of machine learning models to measure product similarity from products' images and textual descriptions. We then estimate a nested logit model with product-pair specific nesting parameters that depend on the image and text similarities between products. Our framework does not require collecting product attributes for each category and can capture product similarity along dimensions that are hard to account for with observed attributes. We apply our method to a dataset describing the behavior of Amazon shoppers across several categories and show that incorporating texts and images in demand estimation helps us recover a flexible cross-price elasticity matrix.
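One common way to measure product similarity from images or text is to compare embedding vectors produced by pretrained encoders. The Python sketch below assumes cosine similarity and uses tiny hand-made, hypothetical embeddings (real encoders output hundreds of dimensions). The abstract does not specify which models or similarity measure the authors use, nor how similarities map into the pair-specific nesting parameters, so this covers only the generic similarity step.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similarity_matrix(embeddings):
    """Pairwise similarities for every ordered pair of distinct products."""
    names = list(embeddings)
    return {
        (i, j): cosine_similarity(embeddings[i], embeddings[j])
        for i in names
        for j in names
        if i != j
    }

# Hypothetical 3-dimensional embeddings for three products.
embeddings = {
    "shampoo_a": [0.9, 0.1, 0.0],
    "shampoo_b": [0.8, 0.2, 0.1],
    "toothpaste": [0.1, 0.0, 0.95],
}
sims = similarity_matrix(embeddings)

for pair, s in sims.items():
    print(pair, round(s, 3))
```

In this toy example the two shampoos score about 0.98, while a shampoo and the toothpaste score about 0.10, which is the kind of pairwise signal a demand model could use to let closer products be stronger substitutes.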

JEL Classifications

C1 - Econometric and Statistical Methods and Methodology: General
C5 - Econometric Modeling
