ML-Enabled Econometrics with Unstructured Data
Paper Session
Sunday, Jan. 5, 2025 10:15 AM - 12:15 PM (PST)
- Chair: Szymon Sacher, Stanford University
Inference for Regression with Variables Generated by AI or Machine Learning
Abstract
It has become common practice for researchers to use AI-powered information retrieval algorithms or other machine learning methods to estimate variables of economic interest, then use these estimates as covariates in a regression model. We show both theoretically and empirically that naively treating AI- and ML-generated variables as "data" leads to biased estimates and invalid inference. We propose two methods to correct bias and perform valid inference: (i) an explicit bias correction with bias-corrected confidence intervals, and (ii) joint maximum likelihood estimation of the regression model and the variables of interest. Through several applications, we demonstrate that the common approach generates substantial bias, while both corrections perform well.Demand Estimation with Text and Image Data
Abstract
We propose a demand estimation method that allows researchers to estimate substitution patterns from unstructured image and text data. We first employ a series of machine learning models to measure product similarity from products' images and textual descriptions. We then estimate a nested logit model with product-pair specific nesting parameters that depend on the image and text similarities between products. Our framework does not require collecting product attributes for each category and can capture product similarity along dimensions that are hard to account for with observed attributes. We apply our method to a dataset describing the behavior of Amazon shoppers across several categories and show that incorporating texts and images in demand estimation helps us recover a flexible cross-price elasticity matrix.JEL Classifications
- C1 - Econometric and Statistical Methods and Methodology: General
- C5 - Econometric Modeling