Relative Scoring for Full-Spectrum Out-of-Distribution Detection
Authors
Meshram, Joyesh Vinod; Gerginov, Mihail Grigorov; Stavrev, Aleksandar Nikolaev
Term
4th term
Education
Publication year
2026
Submitted on
2026-06-12
Abstract
This thesis studies full-spectrum out-of-distribution (OOD) detection, in which a post-hoc detector must accept covariate-shifted in-distribution (cs-ID) inputs while rejecting semantically novel ones (near- and far-OOD). We ask what changes when moving from the standard protocol, which accepts only clean ID, to full-spectrum evaluation, and whether relative scoring (judging a sample against its predicted class or a background reference rather than on an absolute scale) improves the trade-off at the boundary between cs-ID and near-OOD. Building on OpenOOD, we evaluate 20 post-hoc methods spanning output-, distance-, activation-, and gradient-based families on CIFAR-100, ImageNet-200, and ImageNet-1K under the standard protocol and, for ImageNet, also under the full-spectrum protocol, using a caching pipeline for controlled comparisons. A four-part diagnosis shows that all families lose performance under full-spectrum evaluation and that the loss concentrates at the cs-ID/near-OOD boundary: near-OOD becomes harder than far-OOD, thresholds tuned on clean ID admit too much near-OOD, many methods score cs-ID as more OOD than near-OOD (the reverse of what is required), and simple class conditioning does not fix this. A score-overlap analysis further reveals a fundamental trade-off between stability under covariate shift and separation of near-OOD. Guided by this diagnosis, we propose relative scoring designs: the Relative Temperature Sensitivity Score (RTSS), a logit-space Mahalanobis family with background-relative variants (RLSM, RLSM++), and Relative k-Nearest Neighbor (RKNN++), which compare a sample both to its predicted class and to the training distribution. Across ImageNet full-spectrum settings, these methods improve near-OOD AUROC: RTSS by about four points with little change under the standard protocol, RLSM by about nine points over LSM++, and RKNN++ by about ten points by partly correcting a failure in which absolute distances reject valid cs-ID before near-OOD, though at some cost under the standard protocol.
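The failure that relative kNN scoring targets can be reproduced on toy data. The sketch below assumes one possible reading of "relative": the k-th nearest-neighbour distance to training features of the predicted class minus the k-th nearest-neighbour distance to all training features, using Euclidean distance on raw features. The function names and the difference form are hypothetical illustrations, not the thesis's RKNN++ definition, which the abstract does not give. On the toy points, the absolute kNN distance ranks a shifted cs-ID point as more OOD than a point lying between two classes, while the relative score reverses that order.

```python
import math


def kth_nn_distance(x, bank, k):
    """Euclidean distance from x to its k-th nearest neighbour in `bank`."""
    dists = sorted(math.dist(x, b) for b in bank)
    return dists[min(k, len(dists)) - 1]


def relative_knn_score(x, predicted_class, train_feats, train_labels, k=3):
    """Hypothetical relative kNN OOD score (higher = more OOD).

    Subtracts the k-th NN distance over the whole training set from the k-th NN
    distance within the predicted class. A sample that is far from all training
    data (as covariate shift can cause) but whose neighbours all belong to its
    predicted class scores near zero; a sample whose nearest neighbours come
    from several classes scores higher.
    """
    class_bank = [f for f, y in zip(train_feats, train_labels) if y == predicted_class]
    d_class = kth_nn_distance(x, class_bank, k)
    d_all = kth_nn_distance(x, train_feats, k)
    return d_class - d_all


# Toy 2-D features: class 0 near (0, 0), class 1 near (10, 0).
train_feats = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]
cs_id = (-6.0, 0.5)    # shifted away from class 0, but still closest to class 0
near_ood = (5.5, 0.5)  # lies between the two classes

for name, x in [("cs-ID", cs_id), ("near-OOD", near_ood)]:
    absolute = kth_nn_distance(x, train_feats, k=3)
    # Assume the classifier predicts class 0 for both points.
    relative = relative_knn_score(x, 0, train_feats, train_labels, k=3)
    print(f"{name}: absolute={absolute:.2f} relative={relative:.2f}")
# Absolute: cs-ID ~7.02 > near-OOD ~4.53 (wrong order).
# Relative: cs-ID 0.00 < near-OOD ~0.99 (desired order).
```

In practice, kNN-based detectors usually operate on L2-normalized penultimate-layer features, and the predicted class would come from the classifier rather than being fixed by hand as in this toy.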
On ImageNet-1K full-spectrum, RTSS attains the highest near-OOD AUROC among our methods (surpassed only by SCALE), and RKNN++ achieves the highest overall performance among our methods. The gains are targeted rather than complete: strong activation- and gradient-based baselines remain better on far-OOD, and the cs-ID and near-OOD score distributions still overlap heavily. We conclude that full-spectrum failures concentrate at a specific boundary, between cs-ID and near-OOD, that can be diagnosed with simple score and feature statistics, and that relative scoring is a promising design direction, although the collapse between cs-ID and near-OOD remains an open problem.
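The simple score statistics behind this kind of boundary diagnosis can be sketched in a few lines. The sketch assumes OOD scores where higher means more OOD and shows two checks: the acceptance rate of cs-ID and near-OOD at a threshold that accepts 95% of clean ID (a common operating point in OOD evaluation), and AUROC computed with cs-ID as the class to accept and near-OOD as the class to reject, where a value below 0.5 means cs-ID is scored as more OOD than near-OOD. The function names, the 95% level, and the toy scores are illustrative assumptions, not the thesis's evaluation pipeline.

```python
import math


def auroc(id_scores, ood_scores):
    """AUROC for flagging OOD when a higher score means 'more OOD'.

    Mann-Whitney form: the probability that a random OOD sample outscores a
    random ID sample, with ties counted as half. O(n*m), fine for small checks.
    """
    wins = 0.0
    for o in ood_scores:
        for i in id_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(ood_scores) * len(id_scores))


def threshold_at_id_acceptance(clean_id_scores, accept_rate=0.95):
    """Smallest observed score t such that `score <= t` accepts at least `accept_rate` of clean ID."""
    s = sorted(clean_id_scores)
    idx = max(0, math.ceil(accept_rate * len(s)) - 1)
    return s[idx]


def acceptance_rate(scores, threshold):
    """Fraction of samples accepted as ID (score <= threshold)."""
    return sum(x <= threshold for x in scores) / len(scores)


# Toy scores (higher = more OOD), mimicking a detector tuned on clean ID only.
clean_id = [0.1 * i for i in range(1, 21)]  # 0.1 .. 2.0
cs_id = [3.0, 4.0, 5.0]                     # covariate shift inflates the scores
near_ood = [1.0, 2.0, 3.0]                  # partly overlaps clean ID

t = threshold_at_id_acceptance(clean_id)
print(acceptance_rate(clean_id, t))  # 0.95 by construction
print(acceptance_rate(cs_id, t))     # 0.0: every cs-ID sample is rejected
print(acceptance_rate(near_ood, t))  # one third of near-OOD is let through
print(auroc(cs_id, near_ood))        # below 0.5: cs-ID looks more OOD than near-OOD
```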
[This abstract has been generated with the help of AI directly from the project full text]
