Diagnostic performance studies: interpretation of ROC analysis and cut-offs

Arredondo Montero, Javier; Martín-Calvo, Nerea

doi:10.1016/j.cireng.2022.11.011

Diagnosis is a fundamental part of medical practice. A large body of scientific literature has been published on the implementation of new diagnostic tools, so it is essential for every clinician to know the methods used to evaluate the validity of these tools and to determine their usefulness.

1. Diagnostic performance studies: these studies evaluate the ability of a clinical tool (for example, a radiological test or a biomarker) to diagnose a certain pathology. They include:
- A group of patients affected by the pathology.
- A group of control patients who generally have the same symptoms as the patient group. For example, in studies evaluating the performance of a specific test for the diagnosis of acute appendicitis, the control group usually consists of patients with acute abdominal pain in whom the diagnosis of acute appendicitis has been ruled out.
- A diagnostic tool to be evaluated: for example, a new serum biomarker.
- A "gold standard" or reference test, which is the best existing diagnostic test for that pathology. For example, in acute appendicitis, this would be the histopathological study of the surgical specimen of the appendix. In this type of study, the gold standard serves as the reference against which the new tool is compared.
2. Description of diagnostic performance: the evaluation of a diagnostic test should describe the following indicators:
- Sensitivity (Se): the probability that the test will correctly classify (as positive) an individual affected by the pathology (ill).
- Specificity (Sp): the probability that the test will correctly classify (as negative) a control subject (healthy).
- Positive predictive value (PPV): the probability that a subject has the pathology in question (ill) if the test result was positive.
- Negative predictive value (NPV): the probability that a subject does not have the pathology in question (healthy) if the test result was negative.
- Receiver Operating Characteristic (ROC) analysis: this analysis establishes the diagnostic capacity of a binary classification system as its discrimination threshold is varied [1,2]. The result of a ROC analysis is expressed as an area under the curve (AUC), which reflects the discriminatory capacity of the diagnostic tool, meaning its ability to classify healthy subjects as healthy and ill subjects as ill. The AUC is usually reported as a number with its confidence interval and is usually accompanied by a graph (Fig. 1).
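
As a minimal illustration of the first four indicators in this list, the following Python sketch computes Se, Sp, PPV and NPV from the four cells of a 2 × 2 table that compares the test under evaluation with the gold standard. The function name and the counts are hypothetical and are not taken from any study.

```python
def diagnostic_indicators(tp, fp, fn, tn):
    """Return Se, Sp, PPV and NPV from the cells of a 2 x 2 table.

    tp: ill subjects with a positive test (true positives)
    fp: healthy subjects with a positive test (false positives)
    fn: ill subjects with a negative test (false negatives)
    tn: healthy subjects with a negative test (true negatives)
    """
    return {
        "Se": tp / (tp + fn),   # positives among the ill
        "Sp": tn / (tn + fp),   # negatives among the healthy
        "PPV": tp / (tp + fp),  # ill among the positives
        "NPV": tn / (tn + fn),  # healthy among the negatives
    }


# Hypothetical counts: 100 ill and 100 healthy subjects.
result = diagnostic_indicators(tp=80, fp=10, fn=20, tn=90)
for name, value in result.items():
    print(f"{name} = {value:.3f}")
```

Se and Sp are computed within the columns of the table (ill, healthy), whereas PPV and NPV are computed within its rows (test positive, test negative).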

Fig. 1. Example of real ROC curves with different diagnostic performances. The x-axis represents 1 − Specificity and the y-axis represents Sensitivity. The straight line that divides the graph into 2 equal halves represents an AUC value of 0.5. (A) 0.97, excellent discrimination. (B) 0.83, moderate discrimination. (C) 0.65, poor discrimination. (D) 0.52, practically no discrimination (like a coin toss).
The AUC is commonly interpreted as follows:
- AUC = 1: perfect discrimination
- AUC ≥ 0.9: excellent discrimination
- 0.7 ≤ AUC < 0.9: moderate discrimination
- 0.5 < AUC < 0.7: poor discrimination

A test with an AUC of 0.5 is considered not useful for diagnosis, since its probability of correctly classifying patients is the same as that of a coin toss. An AUC value of less than 0.5 indicates that the diagnostic criterion should be inverted.
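
The following Python sketch shows one way to obtain the AUC of a continuous marker and to label it with the categories listed above. It relies on a standard equivalence: the area under the empirical ROC curve equals the proportion of (ill, healthy) pairs in which the ill subject has the higher value, with ties counted as one half. The sketch assumes that higher values point towards disease; the function names and the data are hypothetical.

```python
def auc(values_ill, values_healthy):
    """AUC as the share of (ill, healthy) pairs ranked correctly.

    Assumes that higher marker values indicate disease. Tied pairs
    count as half a correctly ranked pair.
    """
    correctly_ranked = 0.0
    for x in values_ill:
        for y in values_healthy:
            if x > y:
                correctly_ranked += 1.0
            elif x == y:
                correctly_ranked += 0.5
    return correctly_ranked / (len(values_ill) * len(values_healthy))


def interpret_auc(value):
    """Map an AUC value to the categories used in the text."""
    if value == 1:
        return "perfect discrimination"
    if value >= 0.9:
        return "excellent discrimination"
    if value >= 0.7:
        return "moderate discrimination"
    if value > 0.5:
        return "poor discrimination"
    if value == 0.5:
        return "not useful (like a coin toss)"
    return "invert the diagnostic criterion"


# Hypothetical marker values, for illustration only.
ill = [12, 15, 9, 20, 17]
healthy = [5, 8, 10, 7, 11]

example_auc = auc(ill, healthy)
print(f"AUC = {example_auc:.2f}: {interpret_auc(example_auc)}")
```

The sketch returns only a point estimate; the confidence interval that usually accompanies a published AUC requires an additional method (for example, bootstrapping) that is not shown here.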

3. The importance of disease prevalence: Se and Sp are intrinsic characteristics of a diagnostic test, but the predictive values depend on the prevalence of the disease in the population in which the test is applied. The same diagnostic tool will have a greater PPV if it is applied to a population where the prevalence of the disease is high than if it is applied to a population where the prevalence is low. Conversely, the NPV decreases as the prevalence of the disease increases. Diagnostic performance studies can therefore report different PPV and NPV for the same diagnostic tool, depending on the prevalence of the disease in each population studied [3].
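
The dependence of the predictive values on prevalence can be illustrated with Bayes' theorem. The Python sketch below keeps Se and Sp fixed and recalculates PPV and NPV for several prevalences. The function name, the Se and Sp values and the prevalences are hypothetical, chosen only to show the direction of the effect.

```python
def predictive_values(se, sp, prevalence):
    """Return (PPV, NPV) for a test with fixed Se and Sp at a given prevalence."""
    true_pos = se * prevalence               # ill and test positive
    false_pos = (1 - sp) * (1 - prevalence)  # healthy and test positive
    true_neg = sp * (1 - prevalence)         # healthy and test negative
    false_neg = (1 - se) * prevalence        # ill and test negative
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test with Se = 0.90 and Sp = 0.90 in three populations.
for prevalence in (0.05, 0.20, 0.50):
    ppv, npv = predictive_values(se=0.90, sp=0.90, prevalence=prevalence)
    print(f"prevalence = {prevalence:.2f}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

With these assumed values, the PPV rises and the NPV falls as the prevalence increases, although the test itself is unchanged.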
4. The cut-off point: when a diagnostic tool yields a continuous variable, the cut-off point is the value chosen to classify subjects as healthy or ill. The cut-off point with the best discriminatory capacity is the one closest to the upper left corner of the ROC curve. There are different methods for its calculation [4,5]:
- Cut-off point with the greatest Youden Index (Sensitivity + Specificity − 1)
- Cut-off point with the lowest value for the formula: (1 − Sensitivity)² + (1 − Specificity)²

Once a cut-off point has been chosen, the Se, Sp, and predictive values of the diagnostic tool can be calculated from a 2 × 2 table that compares the results of the test with those of the reference standard. The cut-off point with the best discriminatory capacity is usually chosen, but it may also be of interest to evaluate other cut-off points, such as the most sensitive (in severe pathologies) or the most specific (in pathologies whose treatment entails important adverse effects).
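
The two criteria for choosing a cut-off point can be sketched in Python as follows. Every observed value is tried as a candidate cut-off, Se and Sp are computed for each, and the candidates with the greatest Youden Index and with the smallest squared distance to the upper left corner are returned. The sketch assumes that a value greater than or equal to the cut-off counts as a positive test; the function names and the data are hypothetical.

```python
def se_sp_at_cutoff(values_ill, values_healthy, cutoff):
    """Se and Sp when a value >= cutoff is classified as positive."""
    se = sum(v >= cutoff for v in values_ill) / len(values_ill)
    sp = sum(v < cutoff for v in values_healthy) / len(values_healthy)
    return se, sp


def best_cutoffs(values_ill, values_healthy):
    """Return the best (cutoff, Se, Sp) under each of the two criteria."""
    candidates = sorted(set(values_ill) | set(values_healthy))
    rows = [(c, *se_sp_at_cutoff(values_ill, values_healthy, c))
            for c in candidates]
    # Greatest Youden Index: Se + Sp - 1
    by_youden = max(rows, key=lambda r: r[1] + r[2] - 1)
    # Smallest squared distance to the corner: (1 - Se)^2 + (1 - Sp)^2
    by_distance = min(rows, key=lambda r: (1 - r[1]) ** 2 + (1 - r[2]) ** 2)
    return by_youden, by_distance


# Hypothetical marker values, for illustration only.
ill = [12, 15, 9, 20, 17]
healthy = [5, 8, 10, 7, 11]

by_youden, by_distance = best_cutoffs(ill, healthy)
print("Youden Index:       cut-off = %s, Se = %.2f, Sp = %.2f" % by_youden)
print("Distance to corner: cut-off = %s, Se = %.2f, Sp = %.2f" % by_distance)
```

In this small data set both criteria select the same cut-off, but they can disagree in other samples. When several candidates tie, the sketch simply keeps the lowest of them.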

Strengths and limitations: ROC analysis makes it possible to evaluate the discriminatory capacity of a diagnostic tool and to compare it with others in a simple and graphic manner. Its limitations include the need for a gold standard against which the new tool can be compared and the difficulty of generalizing the results (cut-off points, sensitivity and specificity) to other populations (Fig. 1).

Contributions of the authors

All the authors have made substantial contributions, both directly and intellectually, to the manuscript and have approved the final version for publication.

Original paper

All of the authors of this manuscript declare that this is an original contribution that has not been previously published.

Conflict of interest

The authors have no conflicts of interest to declare, nor any external sources of funding.

References

[1] R. Aggarwal, P. Ranganathan. Understanding diagnostic tests - Part 3: receiver operating characteristic curves. Perspect Clin Res., 9 (2018), pp. 145-148. http://dx.doi.org/10.4103/picr.PICR_87_18

[2] K.H. Zou, A.J. O’Malley, L. Mauri. Receiver-operating characteristic analysis for evaluating diagnostic tests and predictive models. Circulation., 115 (2007), pp. 654-657. http://dx.doi.org/10.1161/CIRCULATIONAHA.105.594929

[3] J. Arredondo Montero, C. Bardají Pascual, G. Antona, M. Bronte Anaut, N. López-Andrés, N. Martín-Calvo. Diagnostic performance of calprotectin and APPY-1 test in pediatric acute appendicitis: a systematic review and a meta-analysis. Eur J Trauma Emerg Surg., (2022). http://dx.doi.org/10.1007/s00068-022-02000-2

[4] W.J. Youden. Index for rating diagnostic tests. Cancer., 3 (1950), pp. 32-35. http://dx.doi.org/10.1186/1471-2407-3-32

[5] J. Arredondo Montero, G. Antona, A. Rivero Marcotegui, C. Bardají Pascual, M. Bronte Anaut, R. Ros Briones, et al. Discriminatory capacity of serum interleukin-6 between complicated and uncomplicated acute appendicitis in children: a prospective validation study. World J Pediatr., 18 (2022), pp. 810-817. http://dx.doi.org/10.1007/s12519-022-00598-2
