Journal of Risk Model Validation

General bounds on the area under the receiver operating characteristic curve and other performance measures when only a single sensitivity and specificity point is known

Roger M. Stein

  • We prove a number of theorems on the upper and lower bounds of the area under the ROC curve (AUC) when only a single (TP, TN) point is known.
  • We introduce the notion of well-behaved and pathological ROC curves, and demonstrate that when ROC curves are well-behaved, the bounds may be made tighter.
  • We provide extensions that enable the results to be used to calculate bounds on other measures of interest such as Cohen’s d and the binary correlation between actual and predicted values.
  • These results are useful both for facilitating other proofs and theorems of interest, and for making decisions about which of two or more classifiers should be used to minimize potential risk or maximize potential benefit.
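The bullet points above concern bounds on the AUC from a single operating point. The following is a minimal Python sketch of one such geometric argument, made under our own assumptions rather than taken from the paper's theorems. The ROC curve is assumed to be monotone nondecreasing, to run from (0, 0) to (1, 1), and to pass through (FPR, TPR) = (1 − TN, TP). The smallest area then comes from a staircase that stays at 0 until FPR and at TP afterwards, giving TP × TN. The largest comes from a staircase that sits at TP from the start and jumps to 1 just after FPR, giving 1 − (1 − TN)(1 − TP). The optional `concave` flag is a hypothetical stand-in for "well-behaved" (the paper's own definition may differ): a concave curve lies on or above the two chords through the point, which raises the lower bound to (TP + TN)/2. The upper bound is not tightened in that case. The names `auc_bounds` and `AUCBounds` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AUCBounds:
    lower: float  # smallest AUC consistent with the stated assumptions
    upper: float  # largest AUC consistent with the stated assumptions


def auc_bounds(tp: float, tn: float, concave: bool = False) -> AUCBounds:
    """Bounds on the ROC AUC given one (TP, TN) operating point (hypothetical helper).

    Assumes the ROC curve is monotone nondecreasing, runs from (0, 0) to (1, 1)
    and passes through (1 - tn, tp). If concave=True, the curve is further
    assumed concave, so it lies on or above the chords (0, 0)-(fpr, tp) and
    (fpr, tp)-(1, 1); only the lower bound is tightened in that case.
    """
    if not (0.0 <= tp <= 1.0 and 0.0 <= tn <= 1.0):
        raise ValueError("tp and tn must be rates in [0, 1]")
    fpr = 1.0 - tn
    # Lowest staircase: height 0 until fpr, then height tp up to the right edge.
    lower = tp * (1.0 - fpr)
    # Highest staircase: height tp from the left edge until fpr, then height 1.
    upper = tp * fpr + (1.0 - fpr)
    if concave:
        # A concave curve from (0, 0) to (1, 1) cannot pass below the diagonal.
        if tp < fpr:
            raise ValueError("a concave ROC curve cannot pass below the diagonal")
        # Trapezoid area under the chords through (0, 0), (fpr, tp), (1, 1),
        # which simplifies to (tp + tn) / 2.
        lower = 0.5 * fpr * tp + 0.5 * (1.0 - fpr) * (tp + 1.0)
    return AUCBounds(lower=lower, upper=upper)


# Example usage with an illustrative operating point (not data from the paper).
b = auc_bounds(tp=0.8, tn=0.9)
print(round(b.lower, 4), round(b.upper, 4))  # 0.72 0.98
print(round(auc_bounds(0.8, 0.9, concave=True).lower, 4))  # 0.85
```

The staircase construction is used because, with only one point known, any monotone curve through it is admissible, so the extremes are the curves that hug the axes on either side of that point.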

Receiver operating characteristic (ROC) curves are often used to quantify the performance of predictive models used in diagnosis, risk stratification and rating systems. The ROC area under the curve (AUC) summarizes the ROC in a single statistic, which also provides a probabilistic interpretation that is isomorphic to the Mann–Whitney–Wilcoxon test. In many settings, such as those involving diagnostic tests for diseases or antibodies, the full ROC is not reported; instead, only the true positive (TP) and true negative (TN) rates at a single threshold value are reported. We demonstrate how to calculate the upper and lower bounds for the ROC AUC given a single (TP, TN) pair. We use only simple geometric arguments, and we present two real-world applications, from medicine and finance, involving Covid-19 diagnosis and credit card fraud detection, respectively. In addition, we formally introduce the notions of "pathological" and "well-behaved" ROC curves. For well-behaved ROC curves, the bounds on the AUC may be made tighter. In certain special cases involving pathological ROC curves that result from what we term "George Costanza" classifiers, we may transform predictions to obtain well-behaved ROC curves with higher AUC than the original decision process. Our results also enable the calculation of other quantities of interest, such as Cohen's d or the Pearson correlation between a diagnostic outcome and an actual outcome. These results facilitate the direct comparison of reported performance when model or diagnostic performance is reported for only a single score threshold.
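The probabilistic reading of the AUC mentioned in the abstract can be made concrete. The AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half; this is the Mann–Whitney U statistic divided by the number of positive/negative pairs. The sketch below computes it by brute force over all pairs. It also converts an AUC to Cohen's d using one common textbook relation, AUC = Φ(d/√2). That relation assumes normally distributed scores with equal variance in both groups. The assumption is ours, and the paper's own route to Cohen's d may differ. The function names `auc_mann_whitney` and `cohens_d_from_auc` are hypothetical.

```python
from collections.abc import Sequence
from statistics import NormalDist


def auc_mann_whitney(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Empirical AUC as P(score_pos > score_neg) + 0.5 * P(tie).

    Compares every positive/negative pair directly (O(n * m)), which is fine
    for small illustrative samples but not for large data sets.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for s_pos in pos_scores:
        for s_neg in neg_scores:
            if s_pos > s_neg:
                wins += 1.0  # positive ranked above negative
            elif s_pos == s_neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


def cohens_d_from_auc(auc: float) -> float:
    """Cohen's d implied by an AUC under an assumed equal-variance binormal model.

    Inverts AUC = Phi(d / sqrt(2)), i.e. d = sqrt(2) * Phi^{-1}(AUC).
    """
    if not 0.0 < auc < 1.0:
        raise ValueError("auc must lie strictly between 0 and 1")
    return 2 ** 0.5 * NormalDist().inv_cdf(auc)


# Example usage with illustrative scores (not data from the paper).
pos = [0.9, 0.8, 0.7, 0.4]
neg = [0.6, 0.5, 0.4, 0.2]
auc = auc_mann_whitney(pos, neg)
print(auc)  # 0.84375
print(round(cohens_d_from_auc(auc), 3))
```

Under this model, d increases with the AUC. An interval of AUC values therefore maps to an interval of d values by converting each endpoint. An AUC of exactly 0 or 1 corresponds to an infinite d and is rejected here. Negating every score turns an AUC of A into 1 − A, which shows in the simplest case how reversing a consistently wrong classifier's predictions can improve it. Whether this matches the paper's "George Costanza" transformation exactly is an assumption.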
