Big Data in Finance

Paper Session

Sunday, Jan. 9, 2022 12:15 PM - 2:15 PM (EST)

Hosted By: Econometric Society
  • Chair: Mao Ye, University of Illinois-Urbana-Champaign

Risk Factors That Matter: Textual Analysis of Risk Disclosures for the Cross-Section of Returns

Alejandro Lopez-Lira
University of Florida


I exploit unsupervised machine learning and natural language processing techniques to elicit the risk factors that firms themselves identify in their annual reports. I quantify each firm's exposure to each identified risk and construct factor-mimicking portfolios that proxy for each undiversifiable source of risk. The portfolios are priced in the cross-section and contain information above and beyond the commonly used multi-factor representations. A model that uses only firm-identified risk factors (FIRFs) performs at least as well as traditional factor models, despite using no information from past prices or returns.
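
As a rough illustration of the portfolio step, the sketch below computes one period's return on a long-short portfolio that buys the firms most exposed to a given risk and sells the least exposed ones. This is a generic sort-based construction assumed for illustration, not necessarily the paper's procedure; the exposure scores (for instance, the share of a firm's risk-disclosure text devoted to one topic), the cutoff fraction, and the function name `long_short_return` are hypothetical.

```python
from statistics import mean


def long_short_return(exposure, next_return, frac=0.3):
    """Equal-weighted high-minus-low return for a single period.

    exposure:    dict firm -> exposure score to one risk (hypothetical input)
    next_return: dict firm -> return over the following period
    frac:        fraction of firms placed in each of the two legs
    """
    firms = sorted(exposure, key=exposure.get)  # ascending by exposure
    k = max(1, int(len(firms) * frac))
    if 2 * k > len(firms):
        raise ValueError("frac too large: the long and short legs would overlap")
    low, high = firms[:k], firms[-k:]
    # Long the most exposed firms, short the least exposed firms.
    return mean(next_return[f] for f in high) - mean(next_return[f] for f in low)


exposure = {"A": 0.1, "B": 0.2, "C": 0.5, "D": 0.8, "E": 0.9}
next_return = {"A": 0.01, "B": 0.00, "C": 0.02, "D": 0.03, "E": 0.05}
print(long_short_return(exposure, next_return, frac=0.4))  # about 0.035
```

Repeating this every period yields a return series for each risk, which can then be tested against standard factor models.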

The Changing Economics of Knowledge Production

Simona Abis
Columbia University
Laura Veldkamp
Columbia University


Big data technologies change the way in which data and human labor combine to create knowledge. Is this a modest technological advance or a data revolution? Using hiring and wage data from the investment management sector, we estimate firms' data stocks and the shape of their knowledge production functions. Knowing how much production functions have changed informs us about the likely long-run changes in output, in factor shares, and in the distribution of income due to the new big data technologies. Our results suggest that the labor share of income in knowledge work may fall from 29% to 21%. The change associated with big data technologies is two-thirds of the magnitude of the change brought on by the industrial revolution.

Explainable Machine Learning Models of Consumer Credit Risk

Andrew Lo
Massachusetts Institute of Technology


In this paper, we build machine learning (ML) models that forecast home equity credit risk for individuals using a real-world dataset, and we demonstrate methods for explaining their outputs that make the models more accessible. We analyze explainability from the perspective of several stakeholders (loan companies, regulators, loan applicants, and data scientists), incorporating each group's different requirements for explanations. Regulation requires loan companies to disclose up to four factors that adversely affected a rejected credit applicant. For loan companies, we generate interpretable explanations for every prediction. For regulators, we perform a stress-test case study under two extreme scenarios (first, no information about the applicant; second, a randomly drawn extreme sample) and explain the model's behavior in each. For loan applicants, we generate diverse counterfactuals that show the steps they could take to reverse the model's classification; to make these counterfactuals practically useful, we supplement them with domain knowledge. Finally, for data scientists, we generate one or two simple rules that accurately explain 70-72% of the dataset. Our work can help accelerate the adoption of ML techniques in domains that would benefit from interpreting their predictions.
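
To make the four-factor disclosure requirement concrete, the sketch below selects up to four adverse factors for a simple linear credit score, measuring each feature's contribution against a reference applicant. The paper works with more general ML models and explanation methods, so this is only an assumed, simplified stand-in; the feature names, weights, and baseline values are hypothetical.

```python
def adverse_action_reasons(weights, applicant, baseline, max_reasons=4):
    """Return up to `max_reasons` features that lowered the applicant's score.

    Assumes a linear score in which higher means more creditworthy, so feature f
    contributes weights[f] * (applicant[f] - baseline[f]) relative to the baseline.
    """
    contributions = {f: w * (applicant[f] - baseline[f]) for f, w in weights.items()}
    # Most negative contributions first; keep only features that actually hurt.
    ranked = sorted(contributions.items(), key=lambda item: item[1])
    return [f for f, c in ranked if c < 0][:max_reasons]


# Hypothetical model weights, reference applicant, and rejected applicant.
weights = {"utilization": -2.0, "delinquencies": -1.5, "history_years": 0.4,
           "inquiries": -0.5, "income_k": 0.02}
baseline = {"utilization": 0.3, "delinquencies": 0, "history_years": 10,
            "inquiries": 1, "income_k": 60}
applicant = {"utilization": 0.9, "delinquencies": 2, "history_years": 4,
             "inquiries": 3, "income_k": 80}
print(adverse_action_reasons(weights, applicant, baseline))
# ['delinquencies', 'history_years', 'utilization', 'inquiries']
```

Here the applicant's higher income raises the score, so it is never reported as an adverse factor.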

True Liquidity and Fundamental Prices: United States Tick Size Pilot

Rohit Allena
Emory University
Tarun Chordia
Emory University


We develop a big-data methodology to estimate true stock prices and liquidity that explicitly accounts for rounding to the minimum tick size. We apply our method to evaluate the tick size pilot (TSP), which increased the tick size for randomly chosen stocks. While the TSP increases market-maker profits, it does not improve liquidity. This is consistent with theoretical models but contrasts with existing empirical studies. Rounding-adjusted true liquidity, unlike existing liquidity measures, captures the TSP-induced trading restrictions and the decreased inventory holdings of market-makers, validating our methodology and the accuracy of our measures. It is important to account for rounding.
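
The sketch below shows, in a deliberately simplified setting, why rounding matters: if quotes must sit on the tick grid, a narrow true spread is reported as at least one tick, so a larger tick mechanically widens the measured spread. It assumes that the observed bid is the true bid rounded down and the observed ask is the true ask rounded up; the true quotes and the function names `to_tick` and `quoted_spread` are hypothetical, and this is not the paper's estimator.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR


def to_tick(price, tick, rounding):
    """Round `price` to a multiple of `tick` using a Decimal rounding mode."""
    return (price / tick).to_integral_value(rounding=rounding) * tick


def quoted_spread(true_bid, true_ask, tick):
    """Observed spread when the bid is rounded down and the ask rounded up."""
    bid = to_tick(true_bid, tick, ROUND_FLOOR)
    ask = to_tick(true_ask, tick, ROUND_CEILING)
    return ask - bid


# Hypothetical true quotes with a true spread of 0.006.
true_bid, true_ask = Decimal("10.012"), Decimal("10.018")
print(quoted_spread(true_bid, true_ask, Decimal("0.01")))  # 0.01
print(quoted_spread(true_bid, true_ask, Decimal("0.05")))  # 0.05
```

Using `Decimal` rather than floats keeps the rounded prices exactly on the tick grid.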

Discussant(s)

Asaf Manela
Washington University in St. Louis
Avi Goldfarb
University of Toronto
Ansgar Walther
Imperial College London
Joel Hasbrouck
New York University
JEL Classifications
  • G0 - General
  • C0 - General