Auditing and Regulating AI Systems

Paper Session

Friday, Jan. 7, 2022 10:00 AM - 12:00 PM (EST)

Hosted By: American Economic Association
  • Chair: Adair Morse, University of California-Berkeley

Unpacking the Black Box: Regulating Algorithmic Lending

Laura Blattner, Stanford University
Scott Nelson, University of Chicago
Jann Spiess, Stanford University

Abstract

We characterize optimal financial regulation in a world where lending decisions are made by complex credit scoring models but regulators are limited in the amount of information they can learn about these models. We show that limiting lenders to algorithms simple enough to be fully transparent is inefficient as long as the bias induced by misalignment between social preferences and lenders' preferences is small relative to the uncertainty about the true state of the world. Ex-post algorithmic audits can improve welfare, but the gains depend on the design of the audit tools. Tools that minimize overall information loss, the aim of most commercial 'explainer' software, are generally inefficient because they explain the average behavior of the model rather than the sources of mis-prediction that matter for welfare-relevant outcomes. Targeted tools that focus on the source of incentive misalignment, e.g., risk preferences or racial disparities, can provide first-best solutions. We provide empirical support for our theoretical findings using a large-scale credit bureau data set.
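
To make the distinction concrete, below is a minimal simulation sketch (hypothetical data and a hand-written score, not the paper's model or its credit bureau data). A generic explainer-style summary of the score's average sensitivity to a feature says nothing about who is mis-predicted, while a targeted audit of prediction errors by group surfaces the welfare-relevant gap.

# A minimal numpy sketch with hypothetical simulated credit data: an "average
# behavior" explanation vs. a targeted audit of group-level mis-prediction.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

group = rng.integers(0, 2, n)                     # protected attribute (hypothetical)
income = rng.normal(50, 10, n)                    # observed feature, in $000s

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# True default probability depends on income AND group
p_true = sigmoid(-1.0 - 0.08 * (income - 50) + 0.8 * group)
default = rng.binomial(1, p_true)

# "Black-box" lender score: pools the group effect, so it is roughly calibrated
# on average but systematically off within each group
score = sigmoid(-1.0 - 0.08 * (income - 50) + 0.8 * group.mean())

# Generic explainer: average sensitivity of the score to income
# (describes average behavior; silent about sources of mis-prediction)
avg_effect = (sigmoid(-1.0 - 0.08 * (income + 1 - 50) + 0.8 * group.mean()) - score).mean()
print(f"average effect of +$1k income on score: {avg_effect:+.4f}")

# Targeted audit: mean prediction error by group
for g in (0, 1):
    err = (score[group == g] - default[group == g]).mean()
    print(f"group {g}: mean(score - default) = {err:+.3f}")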

Characterizing Fairness Over the Set of Good Models Under Selective Labels

Amanda Coston, Carnegie Mellon University
Ashesh Rambachan, Harvard University
Alexandra Chouldechova, Carnegie Mellon University

Abstract

Algorithmic risk assessments are increasingly used to make and inform decisions in a wide variety of high-stakes settings. In practice, there is often a multitude of predictive models that deliver similar overall performance, an empirical phenomenon commonly known as the “Rashomon Effect.” While many competing models may perform similarly overall, they can behave differently across subgroups and therefore have drastically different predictive fairness properties. In this paper, we develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance, or “the set of good models.” We provide tractable algorithms to compute the range of attainable group-level predictive disparities and the disparity-minimizing model over the set of good models. We extend our framework to address the empirically relevant challenge of selectively labeled data in the setting where the selection decision and outcome are unconfounded given the observed data features. We illustrate our methods in two empirical applications. In a real-world credit-scoring task, we build a model with lower predictive disparities than the benchmark model and demonstrate the benefits of properly accounting for the selective labels problem. In a recidivism risk prediction task, we audit an existing risk score and find that it generates larger predictive disparities than any model in the set of good models.
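
As a minimal sketch of the idea (simulated data and a one-parameter model family; the paper's algorithms handle richer model classes and the selective labels problem, which this ignores), one can enumerate candidate models, keep those within a small tolerance of the best overall accuracy, and report the range of group-level disparities attained across that set of good models.

# Hypothetical simulated data and a one-parameter family of linear classifiers
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

group = rng.integers(0, 2, n)                     # protected attribute
x1 = rng.normal(0, 1, n) + 0.3 * group
x2 = rng.normal(0, 1, n) - 0.3 * group
y = (0.6 * x1 + 0.4 * x2 + rng.normal(0, 1, n) > 0).astype(int)

def acc_and_disparity(w):
    # Classifier: predict 1 if w*x1 + (1-w)*x2 > 0; disparity is the group
    # difference in positive prediction rates
    pred = (w * x1 + (1 - w) * x2 > 0).astype(int)
    return (pred == y).mean(), pred[group == 1].mean() - pred[group == 0].mean()

weights = np.linspace(0, 1, 101)
acc, disp = np.array([acc_and_disparity(w) for w in weights]).T

eps = 0.01                                        # tolerance defining "good" models
good = acc >= acc.max() - eps                     # the set of good models

print(f"best accuracy: {acc.max():.3f}")
print(f"disparity range over good models: [{disp[good].min():.3f}, {disp[good].max():.3f}]")
print(f"disparity-minimizing good model: w = {weights[good][np.abs(disp[good]).argmin()]:.2f}")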

Mitigating Bias in Algorithmic Hiring: Evaluating Claims and Practices

Manish Raghavan, Cornell University
Solon Barocas, Microsoft Research
Jon Kleinberg, Cornell University
Karen Levy, Cornell University

Abstract

There has been rapidly growing interest in the use of algorithms in hiring, especially as a means to address or mitigate bias. Yet, to date, little is known about how these methods are used in practice. How are algorithmic assessments built, validated, and examined for bias? In this work, we document and analyze the claims and practices of companies offering algorithms for employment assessment. In particular, we identify vendors of algorithmic pre-employment assessments (i.e., algorithms to screen candidates), document what they have disclosed about their development and validation procedures, and evaluate their practices, focusing particularly on efforts to detect and mitigate bias. Our analysis considers both technical and legal perspectives. Technically, we consider the various choices vendors make regarding data collection and prediction targets, and explore the risks and trade-offs that these choices pose. We also discuss how algorithmic de-biasing techniques interface with, and create challenges for, antidiscrimination law.
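
The analysis here is largely qualitative, but one concrete check long used for pre-employment assessments under U.S. antidiscrimination practice is the "four-fifths rule" from the Uniform Guidelines on Employee Selection Procedures: a selection rate for any group below 80 percent of the highest group's rate is treated as evidence of adverse impact. A minimal sketch with hypothetical screening outcomes:

# Hypothetical pass/fail decisions from an algorithmic pre-employment screen
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
group = rng.choice(["A", "B"], size=n, p=[0.7, 0.3])
passed = rng.binomial(1, np.where(group == "A", 0.30, 0.22))

rates = {g: passed[group == g].mean() for g in ("A", "B")}
impact_ratio = min(rates.values()) / max(rates.values())

print("selection rates:", {g: round(r, 3) for g, r in rates.items()})
print(f"adverse impact ratio: {impact_ratio:.2f} -> "
      f"{'flags' if impact_ratio < 0.8 else 'passes'} the four-fifths rule")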

Value Alignment, Multi-Tasking, and Surrogate Outcomes

Rediet Abebe, Harvard University
Maximilian Kasy, University of Oxford

Abstract

An important concern about the social impact of AI is the question of value alignment. Algorithms maximizing a mis-specified objective might hurt welfare. Algorithms maximizing ad clicks might, for example, generate filter bubbles, political polarization, and clickbait; algorithms maximizing the matching success of job seekers might systematically place workers in jobs for which they are over-qualified; and so on.
In this paper we study value alignment from the perspective of observability of welfare. Our analysis draws on connections to the literature on multi-tasking in contract theory, and the literature on surrogate outcomes in biostatistics. We develop tools to diagnose when value alignment causes problems, and to provide performance guarantees in the presence of mis-specified rewards. We provide, in particular, regret bounds for Markov Decision Problems (as in Reinforcement Learning), comparing optimal reward designs and surrogate-based rewards, as well as general reward functions.
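
As a toy illustration of this comparison (a randomly generated tabular MDP with a hypothetical noisy surrogate reward, not the paper's setting or its bounds), one can optimize a policy against the surrogate, evaluate it under the true reward, and measure the welfare lost relative to the truly optimal policy.

# Regret from optimizing a mis-specified surrogate reward in a small tabular MDP
import numpy as np

rng = np.random.default_rng(1)
S, A, gamma = 5, 3, 0.95

P = rng.dirichlet(np.ones(S), size=(S, A))        # transition kernel P[s, a, s']
r_true = rng.uniform(0, 1, size=(S, A))           # welfare-relevant reward
r_surr = r_true + 0.5 * rng.normal(size=(S, A))   # noisy, mis-specified surrogate

def greedy_policy(P, r, gamma, iters=2000):
    # Value iteration, then the greedy policy for the converged values
    V = np.zeros(S)
    for _ in range(iters):
        V = (r + gamma * P @ V).max(axis=1)
    return (r + gamma * P @ V).argmax(axis=1)

def evaluate(P, r, policy, gamma):
    # Exact evaluation of `policy` under reward r
    P_pi = P[np.arange(S), policy]
    r_pi = r[np.arange(S), policy]
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

pi_true = greedy_policy(P, r_true, gamma)         # optimal for the true objective
pi_surr = greedy_policy(P, r_surr, gamma)         # optimal for the surrogate

regret = evaluate(P, r_true, pi_true, gamma) - evaluate(P, r_true, pi_surr, gamma)
print("per-state regret from optimizing the surrogate:", regret.round(3))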

Discussant(s)

Paul Goldsmith-Pinkham, Yale University
Peter Hull, University of Chicago
Talia Gillis, Columbia University
Vitor Hadad, Stanford University

JEL Classifications
  • G0 - General
  • C1 - Econometric and Statistical Methods and Methodology: General