Measuring and Mitigating Racial Disparities in LLMs: Evidence from a Mortgage Underwriting Experiment
Abstract
We evaluate LLM responses to a mortgage underwriting task using real loan application data. Experimentally manipulated race is signaled explicitly or through borrower
name/location proxies. Multiple generations of LLMs recommend more denials and
higher interest rates for Black applicants than otherwise-identical white applicants,
with larger disparities for riskier loans. Simple prompt engineering can cost-effectively
mitigate these patterns. Race-blind recommendations correlate strongly with real
lender decisions and predict delinquency, but LLMs incorporate racial signals when
available despite similar delinquency rates across groups. Our findings show potential
costs of adopting this new technology in financial settings and raise important questions
for regulators.