
Economic Applications of Machine Learning

Paper Session

Friday, Jan. 5, 2018 8:00 AM - 10:00 AM

Marriott Philadelphia Downtown, Liberty Ballroom Salon A
Hosted By: American Economic Association
  • Chair: Daniel Björkegren, Brown University

A Large Scale Model of Travel Time and User Choice Behavior

Susan Athey, Stanford University
Robert Donnelly, Stanford University

Abstract

This paper uses a novel dataset (based on mobile phone location data) to analyze the impact of travel time on consumers’ choices about lunch restaurants. We estimate a large scale structural model of user choice behavior, where the consumer considers visiting physical locations within a reasonable driving distance, incorporating estimates of the travel time required to visit them. The model includes rich heterogeneity, including user-specific preferences for latent restaurant characteristics, as well as heterogeneity in sensitivity to travel time. We highlight heterogeneity in users' willingness to travel to different restaurants, and we compare the actual and counterfactual impact of restaurants that opened and closed in our data. We show that restaurants that closed may have had suboptimal fit between category and location, relative to restaurants that opened. We make use of computational techniques from the machine learning literature to estimate the model; we use a hierarchical Bayesian approach estimated using mean-field variational inference, and rely on stochastic gradient descent for optimization.

Behavior Revealed in Mobile Phone Usage Predicts Loan Repayment

Daniel Björkegren, Brown University
Darrell Grissen, Entrepreneurial Finance Lab

Abstract

Many households in developing countries lack formal financial histories, making it difficult for banks to allocate capital, and for potential borrowers to obtain loans. However, many unbanked households have mobile phones, and even prepaid phones generate rich data about their behavior. This project shows that behavioral signatures in mobile phone data predict default with accuracy approaching that of credit scoring methods that rely on financial histories. The method is demonstrated using call records matched to loan outcomes for a sample of borrowers in a Caribbean country. Individuals in the highest quartile of risk by our measure are 5.5 times more likely to default than those in the lowest quartile. We obtain this performance despite the fact that our sample is poor and uses phones infrequently. We outline several ways our method could be practically implemented.
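
The quartile comparison reported above can be illustrated with a minimal sketch. The function name `quartile_default_ratio`, the rank-based quartile split, and the toy scores and outcomes are assumptions made for illustration only; they are not the authors' code, data, or risk measure.

```python
def quartile_default_ratio(scores, defaulted):
    """Default rate in the highest-risk quartile divided by the rate in the lowest.

    scores:    predicted risk per borrower (higher means riskier); hypothetical input
    defaulted: 1 if the borrower defaulted, 0 otherwise
    """
    # Rank borrowers from lowest to highest predicted risk.
    ranked = sorted(zip(scores, defaulted), key=lambda pair: pair[0])
    k = len(ranked) // 4  # number of borrowers in each quartile
    if k == 0:
        raise ValueError("need at least four borrowers")
    lowest = [d for _, d in ranked[:k]]
    highest = [d for _, d in ranked[-k:]]
    low_rate = sum(lowest) / k
    high_rate = sum(highest) / k
    if low_rate == 0:
        raise ValueError("lowest quartile has no defaults; ratio is undefined")
    return high_rate / low_rate


# Synthetic toy data, invented purely to exercise the function.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
toy_defaulted = [0, 1, 0, 0, 0, 1, 1, 1]
ratio = quartile_default_ratio(toy_scores, toy_defaulted)
print(ratio)
```

On this toy input the lowest quartile defaults at a rate of 0.5 and the highest at 1.0, so the ratio is 2.0. A ratio of 5.5 would correspond to the figure quoted in the abstract.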

Estimating Poverty and Wealth From Mobile Phone Data

Joshua Blumenstock, University of California-Berkeley
Gabriel Cadamuro, University of Washington
Robert On, University of California-Berkeley

Abstract

Accurate estimates of population demographics are a critical input to social and economic research. Here, we show that it is possible to predict the wealth of an individual based on the analysis of their history of mobile phone calls, and that phone-based predictions for millions of citizens can be aggregated into accurate national statistics. The approach is first demonstrated on a sample of 856 phone survey respondents in Rwanda, and separately validated through 1,234 face-to-face interviews in Afghanistan. In resource-constrained environments where censuses and household surveys are rare, this creates an option for gathering timely information on population statistics at a tiny fraction of the cost of traditional methods.

Forecasting Economic Activity With Yelp Data

Edward Glaeser, Harvard University
Hyunjin Kim, Harvard Business School
Michael Luca, Harvard Business School

Abstract

Measuring and forecasting economic activity is a central component of policymaking and policy research. Statistics released by government agencies such as the Bureau of Labor Statistics and Census Bureau have been the backbone of much of this work, providing insight about a wide set of policy questions. While valuable, these sources have important limitations: they are published at low frequency, with reporting lags that can stretch to two years, and they lack consistent data at granular levels of analysis such as cities or neighborhoods. While more granular data can be made available to researchers, there is often an additional waiting period of one to two years to receive this access.

These factors impose practical limitations on the data’s ability to shed light on real-time trends and policy. Pairing user-generated data on local business activity from Yelp with government data sources including the Quarterly Census of Employment and Wages (QCEW) and housing price data, we examine the potential and limitations of using Yelp data to improve the measurement of real-time economic activity, as well as economic forecasts. We investigate the ways in which Yelp data can provide a useful complement to QCEW, by making reliable predictions about local business patterns well before the release of official statistics. However, the ability to make meaningful predictions lies not only in data gathering but also in data cleaning and model selection, and we explore these decisions as well. Lastly, we expand this analysis by forecasting other economic outcomes, such as housing prices at the local level.

Discussant(s)

Michael Luca, Harvard Business School
Marshall Burke, Stanford University
Greg Lewis, Microsoft Research
Shane Greenstein, Harvard Business School

JEL Classifications
  • C1 - Econometric and Statistical Methods and Methodology: General
  • O3 - Innovation; Research and Development; Technological Change; Intellectual Property Rights