OOD-Aware Fairness
Selective classification for toxicity detection with fairness constraints.
🎯The Problem
Out-of-distribution (OOD) signals can become inverted under domain shift, which leads to unfair selective classification.
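The summary does not say how inversion is diagnosed. One plausible check, sketched here as an assumption, is to measure how well the OOD score separates shifted samples from in-distribution ones using AUROC: a value below 0.5 would mean the score ranks in-distribution inputs as more "OOD" than the shifted ones. The function names `ood_auroc` and `is_inverted` are hypothetical.

```python
def ood_auroc(id_scores, ood_scores):
    """AUROC of an OOD score, computed as the probability that a random
    OOD sample scores higher than a random in-distribution (ID) sample.
    Ties count as half a win. Assumes higher score = "more OOD"."""
    wins = 0.0
    for o in ood_scores:
        for i in id_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(ood_scores) * len(id_scores))


def is_inverted(id_scores, ood_scores):
    """Assumed diagnosis rule: the signal is inverted if AUROC < 0.5."""
    return ood_auroc(id_scores, ood_scores) < 0.5
```

A score that behaves as intended gives an AUROC near 1.0; an inverted one falls below 0.5, in which case abstaining on high OOD scores would reject the wrong inputs.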
💡The Solution
The approach combines three parts: an OOD Detectability Metric (ODM) that estimates how reliable the OOD signal is, an adaptive abstention policy, and fairness-constrained threshold selection (false positive rate (FPR) gap ≤ ε and abstention gap ≤ δ).
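The two constraints can be sketched as a simple search over candidate thresholds. This is a minimal illustration, not the project's implementation. It assumes that the model abstains when confidence falls below a threshold `tau`, that FPR is measured on accepted samples only, that a "gap" is the max-minus-min difference across groups, and that the preferred feasible threshold is the lowest one (highest coverage). All function names and the sample layout are hypothetical.

```python
def group_rates(samples, tau):
    """Per-group (FPR on accepted samples, abstention rate) at threshold tau.

    Each sample is a tuple (confidence, predicted_toxic, is_toxic, group).
    A sample is abstained on when confidence < tau.
    """
    stats = {}
    for conf, pred, label, group in samples:
        s = stats.setdefault(group, {"n": 0, "abstain": 0, "neg": 0, "fp": 0})
        s["n"] += 1
        if conf < tau:
            s["abstain"] += 1
        elif label == 0:
            # Accepted non-toxic sample: counts toward the FPR denominator.
            s["neg"] += 1
            s["fp"] += int(pred == 1)
    rates = {}
    for g, s in stats.items():
        fpr = s["fp"] / s["neg"] if s["neg"] else 0.0
        rates[g] = (fpr, s["abstain"] / s["n"])
    return rates


def gap(values):
    """Largest difference between any two groups."""
    values = list(values)
    return max(values) - min(values)


def select_threshold(samples, candidates, eps, delta):
    """Return the lowest candidate threshold with FPR gap <= eps and
    abstention gap <= delta, or None if no candidate is feasible."""
    for tau in sorted(candidates):
        rates = group_rates(samples, tau)
        fpr_gap = gap(r[0] for r in rates.values())
        abstain_gap = gap(r[1] for r in rates.values())
        if fpr_gap <= eps and abstain_gap <= delta:
            return tau
    return None


samples = [
    (0.9, 1, 1, "a"), (0.8, 0, 0, "a"), (0.6, 0, 0, "a"), (0.3, 0, 0, "a"),
    (0.9, 0, 0, "b"), (0.7, 1, 1, "b"), (0.65, 0, 0, "b"), (0.4, 1, 0, "b"),
]
chosen = select_threshold(samples, [0.0, 0.5, 0.75], eps=0.1, delta=0.1)
```

In this toy data, accepting everything leaves group "b" with a false positive that group "a" does not have, so the search moves to the next threshold, where both groups abstain at the same rate and the FPR gap closes.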
✨Key Highlights
- OOD score inversion diagnosis
- 86% reduction in FPR gap
- Balanced abstention across groups
📊Results & Impact
AURC (area under the risk-coverage curve): 0.048 (lowest)
FPR Gap: 0.04
Abstention Gap: 0.05
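For readers unfamiliar with AURC, the sketch below shows the common empirical definition: rank predictions by confidence, compute the error rate among the top k most confident predictions for every k, and average those rates. Whether the reported figure was computed exactly this way is an assumption; the function name `aurc` is hypothetical.

```python
def aurc(confidences, errors):
    """Empirical area under the risk-coverage curve.

    confidences: one confidence score per prediction (higher = more sure).
    errors: 1 if the prediction was wrong, 0 if it was right.
    Lower is better: errors should concentrate among low-confidence samples.
    """
    # Most confident predictions first.
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    cumulative_errors = 0
    total_risk = 0.0
    for k, i in enumerate(order, start=1):
        cumulative_errors += errors[i]
        # Selective risk when only the top-k predictions are accepted.
        total_risk += cumulative_errors / k
    return total_risk / len(order)
```

With four predictions, a single error on the least confident one gives a much lower AURC than the same error on the most confident one, which is the behavior a selective classifier is rewarded for.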