## Deep ROC Analysis

for binary classifiers & diagnostic tests

`pip install deeproc` | **PyPI webpage** | **GitHub webpage**

The area under the ROC curve (AUC) measures performance over the whole ROC curve, considering every possible decision threshold; this is too general, because it includes thresholds that would never be used.
Accuracy, F1 score, sensitivity and specificity measure performance at a single decision threshold (a point on the ROC curve); this is too specific, because it ignores the surrounding information.
**Deep ROC analysis (paper) (presentation) (code) [1]** permits in-depth analysis of classifier performance in groups of predicted risk or probability that span the ROC curve.
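The core idea can be sketched as follows. This is an illustration only, not the `deeproc` API: it uses scikit-learn's `roc_curve` on toy data, and the group boundaries and variable names are our own assumptions.

```python
# Sketch of deep ROC analysis' core idea: split the ROC curve into groups of
# predicted risk (threshold ranges) and inspect performance within each group.
# Not the deeproc API -- group boundaries and data here are illustrative.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 500)                                     # toy binary labels
y_score = np.clip(0.3 * y_true + rng.normal(0.4, 0.25, 500), 0, 1)   # toy risk scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Hypothetical groups of predicted risk spanning the ROC curve
groups = [(0.0, 1 / 3), (1 / 3, 2 / 3), (2 / 3, 1.0)]
for lo, hi in groups:
    in_group = (thresholds >= lo) & (thresholds < hi)  # ROC points in this risk range
    if in_group.any():
        print(f"risk [{lo:.2f}, {hi:.2f}): "
              f"mean sens={tpr[in_group].mean():.3f}, "
              f"mean spec={(1 - fpr[in_group]).mean():.3f}")
```

A classifier that looks best by whole-curve AUC may not be the one with the best measures inside the group you care about.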
Previous attempts to represent AUC by the __partial AUC__, the __standardized partial AUC__ or the __two-way AUC__ were flawed.

With deep ROC analysis we can validate that a classifier performs well in the groups that matter most--e.g., patients at greatest risk, patients at medium risk who are challenging to classify, or the range of plausible decision thresholds. We may select a classifier differently based on group measures than on whole-area or single-point measures.

### More information

Measuring classifier or test performance in a group, i.e., a range of thresholds, can account for the fact that each patient has different costs and risks--and that there are different priorities in different clinical settings (family practice, emergency, disease clinics). In contrast, measures at a single threshold are only optimal for a prototype or average patient. This gives us the opportunity to select and use classifiers better.

Also, in general, a classifier or test performs differently for patients at different levels of predicted risk.

Our group measures use familiar concepts such as AUC, sensitivity, specificity, positive predictive value (PPV), which includes population prevalence [2], and negative predictive value (NPV). The first three are the normalized versions of **our concordant partial AUC (AUCn_{i} or cpAUCn_{i}) [3]**, the partial AUC (avg Sens_{i}) [4], and the horizontal partial AUC (avg Spec_{i}) [3].
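A numerical sketch of these group measures, following the definitions in [3] and [4]: the partial AUC integrates sensitivity over an FPR range, the horizontal partial AUC integrates specificity over the matching TPR range, and the concordant partial AUC combines the two. The function name and grid-based integration here are illustrative, not the `deeproc` API.

```python
# Illustrative computation of avg Sens_i, avg Spec_i and cpAUCn_i for one
# group (an FPR range) of an empirical ROC curve; not the deeproc API.
import numpy as np
from sklearn.metrics import roc_curve

def _trapezoid(y, x):
    """Trapezoidal-rule integral of y over x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def group_measures(fpr, tpr, x_lo, x_hi):
    """Average sensitivity (from the partial AUC [4]), average specificity
    (from the horizontal partial AUC [3]) and the normalized concordant
    partial AUC cpAUCn [3] for the FPR range [x_lo, x_hi]."""
    xs = np.linspace(x_lo, x_hi, 2001)
    ys = np.interp(xs, fpr, tpr)            # sensitivity across the FPR range
    dx = x_hi - x_lo
    pAUC = _trapezoid(ys, xs)               # vertical partial area [4]
    avg_sens = pAUC / dx
    y_lo, y_hi = ys[0], ys[-1]              # matching TPR (vertical) range
    dy = y_hi - y_lo
    yg = np.linspace(y_lo, y_hi, 2001)
    xg = np.interp(yg, tpr, fpr)            # FPR as a function of TPR
    pAUCx = _trapezoid(1.0 - xg, yg)        # horizontal partial area [3]
    avg_spec = pAUCx / dy
    cpAUCn = (pAUC + pAUCx) / (dx + dy)     # normalized concordant partial AUC
    return avg_sens, avg_spec, cpAUCn

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, 1000)
y_score = np.clip(0.35 * y_true + rng.normal(0.4, 0.25, 1000), 0, 1)
fpr, tpr, _ = roc_curve(y_true, y_score)
print(group_measures(fpr, tpr, 0.0, 1.0))   # over the whole curve, cpAUCn equals AUC
```

Over the full curve the FPR and TPR ranges both equal 1, so cpAUCn reduces to the ordinary AUC.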

We also provide a **new interpretation of AUC and AUCn_{i} as balanced average accuracy [1]** for individuals instead of pairs of individuals. It is a weighted average that balances average sensitivity and average specificity [3] according to their proportional contribution in the range of interest. That is, the vertical range and horizontal range for part of an ROC curve may differ, thus contributing different amounts to AUCn_{i}. Our interpretation explains how AUC and AUCn_{i} exactly measure errors in decision-making, i.e., false positives and false negatives.
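As a sketch of this weighted average, following the definitions in [1] and [3] (with Δx and Δy the horizontal and vertical extents of group i, and pAUC_{i} and pAUCx_{i} the vertical and horizontal partial areas):

```math
\mathrm{AUCn}_i = \frac{\Delta x \,\overline{\mathrm{Sens}}_i + \Delta y \,\overline{\mathrm{Spec}}_i}{\Delta x + \Delta y},
\qquad
\overline{\mathrm{Sens}}_i = \frac{\mathrm{pAUC}_i}{\Delta x},
\quad
\overline{\mathrm{Spec}}_i = \frac{\mathrm{pAUCx}_i}{\Delta y}
```

Over the whole curve, Δx = Δy = 1 and the expression reduces to (avg Sens + avg Spec)/2, i.e., balanced average accuracy.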

### References

[1] Carrington AM, Manuel DG, Fieguth PW, Ramsay T, Osmani V, Wernly B, Bennett C, Hawken S, McInnes M, Magwood O, Sheikh Y, Holzinger A. Deep ROC analysis and AUC as balanced average accuracy, for improved classifier selection, audit and explanation. IEEE Transactions on Pattern Analysis and Machine Intelligence, Early Access, January 25, 2022. __doi:10.1109/TPAMI.2022.3145392__

[2] Altman DG, Bland JM. Diagnostic tests 2: predictive values. BMJ, 309 (July 1994), 16104.

[3] Carrington AM, Fieguth PW, Qazi H, Holzinger A, Chen HH, Mayr F and Manuel DG. A new concordant partial AUC and partial c statistic for imbalanced data in the evaluation of machine learning algorithms, BMC Medical Informatics and Decision Making 20, 4 (2020) __doi:10.1186/s12911-019-1014-6__

[4] McClish DK. Analyzing a portion of the ROC curve. Medical Decision Making, pp. 190–195, 1989.