General bounds on the area under the receiver operating characteristic curve and other performance measures when only a single sensitivity and specificity point is known

Roger M. Stein

Need to know

We prove several theorems on the upper and lower bounds of the area under the ROC curve (AUC) when only a single (TP, TN) point is known.
We introduce the notion of well-behaved and pathological ROC curves, and demonstrate that when ROC curves are well-behaved, the bounds may be made tighter.
We provide extensions that enable the results to be used to calculate bounds on other measures of interest such as Cohen’s d and the binary correlation between actual and predicted values.
These results are useful both for facilitating other proofs and theorems of interest, and for making decisions about which of two or more classifiers should be used to minimize potential risk or maximize potential benefit.

Abstract

Receiver operating characteristic (ROC) curves are often used to quantify the performance of predictive models used in diagnosis, risk stratification and rating systems. The ROC area under the curve (AUC) summarizes the ROC in a single statistic, which also provides a probabilistic interpretation that is isomorphic to the Mann–Whitney–Wilcoxon test. In many settings, such as those involving diagnostic tests for diseases or antibodies, information about the ROC is not reported; instead the true positive (TP) and true negative (TN) rates are reported for a single threshold value. We demonstrate how to calculate the upper and lower bounds for the ROC AUC, given a single (TP, TN) pair. We use simple geometric arguments only, and we present two examples of real-world applications from medicine and finance, involving Covid-19 diagnosis and credit card fraud detection, respectively. In addition, we introduce formally the notion of “pathological” ROC curves and “well-behaved” ROC curves. In the case of well-behaved ROC curves, the bounds on the AUC may be made tighter. In certain special cases involving pathological ROC curves that result from what we term “George Costanza” classifiers, we may transform predictions to obtain well-behaved ROC curves with higher AUC than the original decision process. Our results also enable the calculation of other quantities of interest, such as Cohen’s d or the Pearson correlation between a diagnostic outcome and an actual outcome. These results facilitate the direct comparison of reported performance when model or diagnostic performance is reported for only a single score threshold.
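
To illustrate the kind of geometric argument the abstract refers to, the sketch below computes AUC bounds from a single (TP, TN) pair. It is not taken from the paper's theorems. It assumes only that an ROC curve is nondecreasing and runs from (0, 0) to (1, 1) through the point (1 − TN, TP). It also treats "well-behaved" as meaning concave, which is an assumption made here and may differ from the paper's formal definition. The function name auc_bounds and its parameters are hypothetical.

```python
def auc_bounds(tp_rate, tn_rate, well_behaved=False):
    """Return (lower, upper) bounds on ROC AUC from one (TP, TN) point.

    Illustrative sketch only: assumes a nondecreasing ROC curve through
    (1 - tn_rate, tp_rate); well_behaved=True additionally assumes the
    curve is concave (an assumption of this sketch, not the paper's text).
    """
    if not (0.0 <= tp_rate <= 1.0 and 0.0 <= tn_rate <= 1.0):
        raise ValueError("rates must lie in [0, 1]")

    fp_rate = 1.0 - tn_rate  # x-coordinate of the known ROC point

    # Nondecreasing curve: height is at least tp_rate to the right of the
    # point, so the area is at least the rectangle tp_rate * (1 - fp_rate).
    lower = tp_rate * tn_rate

    # Height is at most tp_rate to the left of the point and at most 1 to
    # the right, so only the top-left rectangle can be missing.
    upper = 1.0 - fp_rate * (1.0 - tp_rate)

    if well_behaved:
        if tp_rate + tn_rate < 1.0:
            raise ValueError("a concave ROC curve cannot pass below the diagonal")
        # A concave curve lies on or above the chords (0,0)-(point)-(1,1);
        # the area under those two chords is (tp_rate + tn_rate) / 2.
        lower = max(lower, (tp_rate + tn_rate) / 2.0)
        # The upper bound is left at its general value in this sketch.

    return lower, upper


if __name__ == "__main__":
    print(auc_bounds(0.9, 0.8))
    print(auc_bounds(0.9, 0.8, well_behaved=True))
```

For a test with TP = 0.9 and TN = 0.8, this gives general bounds of about 0.72 and 0.98, and the lower bound rises to about 0.85 under the concavity assumption, which is consistent with the statement that well-behaved curves admit tighter bounds.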
