This study compared the accuracy of the Simplified Acute Physiology Score 3 with that of Acute Physiology and Chronic Health Evaluation II at predicting hospital mortality in patients from a transplant intensive care unit.
METHOD: A total of 501 patients were enrolled in the study (152 liver transplants, 271 kidney transplants, 54 lung transplants, 24 kidney-pancreas transplants) between May 2006 and January 2007. The Simplified Acute Physiology Score 3 was calculated using the global equation and the equation customized for South America, and the Acute Physiology and Chronic Health Evaluation II score was also calculated; the scores were calculated within 24 hours of admission. A receiver-operating characteristic curve was generated, and the area under the curve was calculated to assess how well the Simplified Acute Physiology Score 3 and Acute Physiology and Chronic Health Evaluation II scores identified the patients at the greatest risk of death. The Hosmer-Lemeshow goodness-of-fit test was used to assess calibration; statistically significant results indicated a difference in performance over deciles. The standardized mortality ratio was used to estimate the overall model performance.
RESULTS: The ability of both scores to predict hospital mortality was poor in the liver and renal transplant groups and average in the lung transplant group (area under the receiver-operating characteristic curve = 0.696 for Simplified Acute Physiology Score 3 and 0.670 for Acute Physiology and Chronic Health Evaluation II). The calibration of both scores was poor, even after customizing the Simplified Acute Physiology Score 3 score for South America.
CONCLUSIONS: The low predictive accuracy of the Simplified Acute Physiology Score 3 and Acute Physiology and Chronic Health Evaluation II scores does not warrant the use of these scores in critically ill transplant patients.
Scoring systems, which predict patient outcomes in intensive care units, have been studied for more than three decades. The performance of prognostic models encompasses two objective measurements: discrimination (the model’s ability to classify survivors and non-survivors) and calibration (agreement between the observed and expected numbers of survivors and non-survivors) (1-8).
Acute Physiology and Chronic Health Evaluation (APACHE) scores and Simplified Acute Physiology Score (SAPS) models were previously validated as severity scoring systems. However, several groups were not well represented in the databases that were used to validate these prognostic systems, including transplant patients.
The validation of APACHE II included only 47 kidney transplant patients, without any lung or liver transplant recipients (9). The validation of APACHE III, a further update, involved 17,440 patients but included only 40 liver transplant patients; lung and kidney transplant patients were absent (10). The validation of SAPS 3 included 172 transplant patients: 90 liver, 55 kidney, eight lung, five kidney-pancreas, three pancreas and 11 cardiac transplant patients (11,12). For the validation of APACHE IV, 224 kidney and 139 liver transplant patients were enrolled, but no lung transplant patients were included (13).
SAPS 3 evaluates the data obtained during the first hour of intensive care unit (ICU) admission, which is useful for assessing a patient’s condition upon arrival and eliminating the factors attributable to the level of the care unit. With these adjustments, there have been improvements in the prediction of early and late mortality, as well as increased score sensitivity.
We chose to validate the APACHE II scoring system instead of the APACHE IV because the former has been extensively used to assess severity of illness in transplant patients (14-16). However, to our knowledge, there have been no studies applying SAPS 3 specifically to transplant patients or comparing the predictive accuracy of the two scores in this patient group.
The objective of the present study was to compare the accuracy of SAPS 3 with that of APACHE II in predicting in-hospital mortality in patients from a transplant ICU.
MATERIALS AND METHODS
Study design
This was a prospective clinical cohort study conducted in an 11-bed transplant ICU in southern Brazil (Dom Vicente Scherer Hospital, Complexo Hospitalar da Santa Casa de Porto Alegre). All of the patients receiving cadaver donor organs and admitted to the ICU between May 2006 and January 2007 were enrolled.
Patients
Patients aged ≥18 years were included in the study during the early postoperative period (<24 hours) after their first transplant. Retransplant and living-donor transplant patients were excluded.
Ethics
This study was approved by the Research Ethics Committee of the institution and was conducted in accordance with the provisions of the Declaration of Helsinki. All of the participants or their family members signed informed consent forms. The data were collected and recorded by a single author.
Assessments
The first ICU admission for each patient was used to predict hospital mortality. The APACHE II score was calculated in the first 24 hours after ICU admission, and the SAPS 3 score was calculated within 1 hour of admission (11,12).
The adjusted probability of death, according to the diagnostic category of APACHE II, was also calculated (9).
The probability of in-hospital death for SAPS 3 was calculated according to the general equation and according to the equation customized for South America. The customized equation has been reported to be more accurate in predicting the mortality of oncological patients admitted to ICUs in Brazil (11,12,17,18).
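The conversion from a SAPS 3 score to a predicted probability of death can be sketched as follows. This is a minimal illustration and not the authors' code: the coefficients are those published with the SAPS 3 admission model (references 11,12), are not reproduced in this article, and should be checked against the original publication before use. The function and dictionary names are hypothetical.

```python
import math

# (intercept, shift, slope) for logit = intercept + ln(score + shift) * slope.
# Assumption: values taken from the published SAPS 3 model, not from this article.
SAPS3_EQUATIONS = {
    "global": (-32.6659, 20.5958, 7.3068),
    "central_south_america": (-64.5990, 71.0599, 13.2322),
}


def saps3_death_probability(score, equation="global"):
    """Predicted probability of in-hospital death for a SAPS 3 score."""
    intercept, shift, slope = SAPS3_EQUATIONS[equation]
    logit = intercept + math.log(score + shift) * slope
    # Logistic transformation of the logit into a probability.
    return 1.0 / (1.0 + math.exp(-logit))
```

For a score of 21.8, this sketch gives a probability of roughly 0.5% with the global equation and roughly 1% with the Central and South America equation.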
Statistical analysis
The data were analyzed using the Statistical Package for the Social Sciences (SPSS), version 13.0 for Windows (SPSS Inc., Chicago, IL, USA). Receiver-operating characteristic (ROC) curves were generated for the SAPS 3 and APACHE II scores to assess the accuracy of these models in predicting in-hospital mortality for each type of transplant (19).
To assess discrimination (capacity to classify survivors and non-survivors), sensitivity and specificity were calculated for the different SAPS 3 and APACHE II scores by plotting the ROC curve and calculating the respective area under the curve.
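The area under the ROC curve has an equivalent pairwise interpretation that is easy to compute directly. The sketch below is illustrative only (the study used SPSS); the function name and the example scores are hypothetical and are not study data.

```python
def auroc(nonsurvivor_scores, survivor_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Equals the probability that a randomly chosen non-survivor has a higher
    score than a randomly chosen survivor; ties count as one half.
    """
    if not nonsurvivor_scores or not survivor_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for dead in nonsurvivor_scores:
        for alive in survivor_scores:
            if dead > alive:
                wins += 1.0
            elif dead == alive:
                wins += 0.5
    return wins / (len(nonsurvivor_scores) * len(survivor_scores))


# Hypothetical scores, for illustration only (not study data).
example_auc = auroc([22, 30, 35], [18, 22, 25, 31])
```

A value of 0.5 corresponds to no discrimination and a value of 1.0 to perfect separation of survivors and non-survivors.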
The best discriminating value was defined as the score that maximized the product of sensitivity and specificity. This value was used as the cutoff point, and 95% confidence intervals (95% CIs) were computed for true- and false-positive rates and for the correct classification rate of the outcomes.
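The cutoff rule described above (the largest product of sensitivity and specificity) can be sketched as follows. Two points are assumptions rather than details stated in the article: candidate cutoffs are taken as midpoints between consecutive distinct scores, and a patient is classified as high risk when the score is at or above the cutoff. The function name and data are hypothetical.

```python
def best_cutoff(scores, died):
    """Return (cutoff, sensitivity, specificity) maximizing sens * spec."""
    n_dead = sum(1 for d in died if d)
    n_alive = len(died) - n_dead
    if n_dead == 0 or n_alive == 0:
        raise ValueError("need at least one survivor and one non-survivor")
    values = sorted(set(scores))
    if len(values) < 2:
        raise ValueError("scores must contain at least two distinct values")
    # Candidate cutoffs: midpoints between consecutive distinct scores.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for cutoff in candidates:
        true_pos = sum(1 for s, d in zip(scores, died) if d and s >= cutoff)
        true_neg = sum(1 for s, d in zip(scores, died) if not d and s < cutoff)
        sensitivity = true_pos / n_dead
        specificity = true_neg / n_alive
        product = sensitivity * specificity
        if best is None or product > best[0]:
            best = (product, cutoff, sensitivity, specificity)
    return best[1:]


# Hypothetical data, for illustration only (1 = died, 0 = survived).
cutoff, sens, spec = best_cutoff(
    [10, 12, 14, 16, 18, 20, 22, 24],
    [0, 0, 0, 0, 1, 0, 1, 1],
)
```

Maximizing the product penalizes cutoffs that achieve high sensitivity only at the cost of very low specificity, or the reverse.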
The Hosmer-Lemeshow goodness-of-fit test was used to assess calibration (the agreement between the observed and expected numbers of survivors and non-survivors with regard to the probability of death) (20). In this analysis, p>0.05 was considered to indicate good model fit.
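A minimal sketch of the Hosmer-Lemeshow C statistic is given below, assuming equal-sized risk groups formed after sorting patients by predicted probability and predicted probabilities strictly between 0 and 1. It is not the SPSS routine used in the study; the function names are hypothetical, and the p-value helper uses the closed form of the chi-squared survival function that holds only for even degrees of freedom.

```python
import math


def hosmer_lemeshow_c(probabilities, died, groups=10):
    """Return (chi2, df) for the Hosmer-Lemeshow C statistic."""
    # Sort by predicted risk only, keeping the original order within ties.
    pairs = sorted(zip(probabilities, died), key=lambda pair: pair[0])
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(d for _, d in chunk)   # observed deaths
        expected = sum(p for p, _ in chunk)   # expected deaths
        # Contribution of deaths plus contribution of survivors.
        chi2 += (observed - expected) ** 2 / expected
        chi2 += ((size - observed) - (size - expected)) ** 2 / (size - expected)
    return chi2, groups - 2


def chi2_sf_even_df(x, df):
    """P(chi-squared > x) for an even number of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(df // 2)
    )
```

With ten groups the statistic is referred to a chi-squared distribution with 8 degrees of freedom; a small p-value indicates that observed and expected deaths differ across the risk groups.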
The standardized mortality ratio (SMR) was calculated by dividing the observed mortality rate by the predicted mortality rate. An SMR equal to 1.0 indicated that the number of observed deaths equaled that of the expected number of deaths; an SMR>1.0 indicated the occurrence of a greater number of deaths than expected. Calibration and discrimination were assessed (as previously described) for the original models.
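The SMR and a confidence interval around it can be sketched as follows. The article does not state how its intervals were obtained; the exact Poisson interval for the observed number of deaths is used here as an assumption, and the function names are hypothetical.

```python
import math


def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean)."""
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return total


def _solve_decreasing(func, target, low, high):
    """Bisection for func(mean) = target, where func decreases with mean."""
    for _ in range(200):
        mid = (low + high) / 2
        if func(mid) > target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def smr_with_ci(observed, expected, alpha=0.05):
    """Return (SMR, lower, upper) using exact Poisson limits for the deaths."""
    ceiling = observed + 10 * math.sqrt(observed + 1) + 10  # search upper bound
    # Upper limit: the mean at which P(X <= observed) falls to alpha / 2.
    upper = _solve_decreasing(
        lambda m: poisson_cdf(observed, m), alpha / 2, 0.0, ceiling
    )
    # Lower limit: the mean at which P(X >= observed) equals alpha / 2.
    if observed == 0:
        lower = 0.0
    else:
        lower = _solve_decreasing(
            lambda m: poisson_cdf(observed - 1, m), 1 - alpha / 2, 0.0, ceiling
        )
    return observed / expected, lower / expected, upper / expected


# Kidney row of Table 2: 7 observed deaths; 271 patients x 0.76% predicted.
kidney_smr = smr_with_ci(observed=7, expected=271 * 0.0076)
```

Applied to the kidney transplant row of Table 2 for the SAPS 3 global equation, this sketch gives an SMR of about 3.40 with limits of about 1.37 and 7.00, close to the tabulated 3.42 (1.37-7.04). An interval that excludes 1.0 indicates a statistically significant difference between observed and predicted mortality.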
RESULTS
The study included 501 postoperative transplant patients admitted to the ICU between May 2006 and January 2007. The mean patient age was 46±2 years; extubation occurred within 24 hours of admission (institutional protocol); the median ICU stay was three days (25th, 50th, and 75th percentiles of two, three, and five days, respectively), and the median hospital stay was 21 days (25th, 50th, and 75th percentiles of 14, 21, and 35 days, respectively).
The sample included 152 (30%) liver transplant patients (116 male, 36 female), 271 (54.1%) kidney transplant patients (168 male, 103 female), 54 (10.7%) lung transplant patients (29 male, 25 female), and 24 (4.7%) kidney-pancreas transplant patients (15 male, 9 female).
The mean APACHE II and SAPS 3 scores among survivors and non-survivors are shown in Table 1.
Table 1. Survivors and non-survivors according to APACHE II and SAPS 3 scores: mean values (±SD) and hospital mortality obtained in the different transplant patient groups.
Transplant type | APACHE II, survivors: mean±SD (n) | APACHE II, non-survivors: mean±SD (n) | p-value | SAPS 3, survivors: mean±SD (n) | SAPS 3, non-survivors: mean±SD (n) | p-value |
---|---|---|---|---|---|---|
Liver | 15.8±5.2 (n = 134) | 19.5±6.1 (n = 18) | p<0.07 | 41.2±8.3 (n = 134) | 47.3±14.9 (n = 18) | p<0.107 |
Kidney | 17.1±3.5 (n = 264) | 19.86±8.59 (n = 7) | p<0.42 | 21.8±6.20 (n = 264) | 21.71±7.71 (n = 7) | p<0.958 |
Lung | 16.02±3.70 (n = 41) | 21.77±5.77 (n = 13) | p<0.04 | 27.46±5.90 (n = 41) | 35.08±8.07 (n = 13) | p<0.01 |
Kidney-pancreas | 15.70±4.41 (n = 24) | 24±0.91 (n = 1) | p<0.79 | 25.09±7.26 (n = 24) | 22±1.51 (n = 1) | p<0.682 |
SD = standard deviation; APACHE II = Acute Physiology and Chronic Health Evaluation II; SAPS 3 = Simplified Acute Physiology Score 3.
APACHE II significantly overestimated mortality for the transplant patients as a whole. An analysis by type of transplant showed that the score underestimated mortality in lung transplant patients, in whom the observed rate was 52% greater than the predicted rate, although this difference was not statistically significant. In the other groups, the score overestimated mortality, and this overestimation was statistically significant only in renal transplant patients (Table 2).
Table 2. Relationship between observed and predicted mortality rates within each group according to the SAPS 3 global equation, South America customized equation and APACHE II.
Transplant Type | n | Observed Mortality n (%) | APACHE II Predicted Mortality n (%) | SMR estimate (95% CI) | SAPS 3 Global Equation Predicted Mortality n (%) | SMR estimate (95% CI) | SAPS 3 South America Equation Predicted Mortality n (%) | SMR estimate (95% CI) |
---|---|---|---|---|---|---|---|---|
Liver | 152 | 18 (11.8) | 22 (14.69) | 0.81 | 16 (10.69) | 1.11 (0.66-1.75) | 16 (10.62) | 1.12 (0.66-1.76) |
Kidney | 271 | 7 (2.6) | 40 (14.77) | 0.17 | 2 (0.76) | 3.42 (1.37-7.04) | 3 (0.97) | 2.67 (1.07-5.50) |
Lung | 54 | 13 (24.1) | 9 (15.87) | 1.52 | 1 (2.61) | 9.22 (4.90-15.77) | 2 (2.90) | 8.31 (4.42-14.21) |
Kidney-Pancreas | 24 | 1 (4.2) | 3 (12.88) | 0.32 | 1 (1.63) | 2.56 (0.06-14.28) | 1 (1.71) | 2.44 (0.06-13.57) |
TOTAL | 501 | 39 (7.8) | 74 (14.77) | 0.53 | 20 (4.01) | 1.94 (1.38-2.64) | 21 (4.14) | 1.88 (1.34-2.56) |
The SAPS 3 global equation underestimated the mortality rate in all types of transplants, and the underestimation was statistically significant for renal and pulmonary transplants (Table 2). The same poor performance was observed after customizing for South America: mortality was underestimated in all of the groups, and significantly so in the same two groups (Table 2).
For score calibration, the Hosmer-Lemeshow (HL-H and HL-C) tests were applied to all transplant groups; the tests showed differences in performance and overestimation across the deciles for both scores (Table 3).
Table 3. The Hosmer-Lemeshow (HL-H and HL-C) tests applied to all transplant groups.
Score | H-L C-statistic (DF = 8): χ² | H-L C-statistic: p-value | H-L H-statistic (DF = 6): χ² | H-L H-statistic: p-value | AUROC (95% CI) | AUROC p-value |
---|---|---|---|---|---|---|
SAPS 3 | 155.57 | <0.001 | 59.41 | <0.001 | 0.696 (0.607-0.786) | 0.670 |
APACHE II | 43.72 | <0.001 | 44.37 | <0.001 | 0.670 (0.579-0.762) | 0.670 |
AUROC = area under the receiver-operating characteristic curve; DF = degrees of freedom; HL-H and HL-C = Hosmer-Lemeshow tests; χ² = chi-squared; APACHE II = Acute Physiology and Chronic Health Evaluation II; SAPS 3 = Simplified Acute Physiology Score 3.
For score discrimination, an ROC curve analysis was used (Figure 1). The discrimination of both scores for mortality was poor in the liver and renal transplant groups and average in the lung transplant group. The kidney-pancreas sample was too small for analysis.
Figure 1. ROC curves for APACHE II and SAPS 3. a) All types of transplantations: SAPS 3 AUROC 0.696 (95% CI 0.607-0.786), APACHE II AUROC 0.670 (95% CI 0.579-0.762), P = 0.670∗. b) Liver transplantations: SAPS 3 AUROC 0.612 (95% CI 0.450-0.733).
ROC curve analysis indicated a cutoff point above which the discrimination to predict mortality increased significantly for each group. For the liver transplant patients, SAPS 3≥40 had sensitivity of 0.6 and specificity of 0.46 in predicting mortality, whereas APACHE II≥16.5 had sensitivity of 0.67 and specificity of 0.63. In the lung transplant group, SAPS 3≥31.5 had sensitivity of 0.69 and specificity of 0.73 in predicting mortality, whereas the sensitivity and specificity of APACHE II≥17.5 were 0.69 and 0.76, respectively.
DISCUSSION
In the present study, no differences were observed between SAPS 3 and APACHE II scores regarding their ability to predict hospital mortality in the transplant groups analyzed, even after customizing for South America. A study by Silva Junior et al. showed accuracy in predicting mortality with the South America-customized SAPS 3 equation in two surgical ICUs in Brazil (21). Their results did not match those of this study, most likely because transplant patients present a different surgical profile from that of general ICU patients; therefore, the two patient cohorts should be studied separately.
In terms of hospital mortality, the discrimination of both scores was poor for the liver and kidney transplant groups and average for the lung transplant group. Both score calibrations were poor for all of the groups.
These results are consistent with the literature, in which some studies have previously reported that SAPS models and APACHE scores are likely to overestimate mortality in high-risk patients and underestimate mortality in low-risk patients (2,8).
Liver transplant group
The mortality of our liver transplant patients was overestimated by APACHE II. Several studies that used the APACHE system in liver transplants have shown opposite results. Spanier et al. (23) used APACHE II to assess 102 patients with end-stage liver disease awaiting transplantation; the authors reported low accuracy of this score for predicting the risk of death. Bein et al. (24) and Sawyer et al. (25) reported a correlation between mortality and APACHE II scores, but predicted mortality was not calculated in either study (25).
Angus et al. (26) used APACHE II in a group of liver transplant patients and concluded that the score was a good predictor of long-term (1-year) mortality after liver transplantation. An attempt to calibrate APACHE II for postoperative liver transplant patients (25-28) resulted in the overestimation of mortality, although to a lesser degree when compared with the original model. Arabi et al. (27) used the recalibration of APACHE II proposed by Angus et al. (26), and the results were the same as those obtained in the study by the latter group.
Keegan et al. (30) evaluated the performance of APACHE III in liver transplant recipients and demonstrated poor performance.
Recently, Park et al. (29) applied APACHE II and APACHE III scores in urgent liver transplant patients with acute liver failure. There was weak mortality prediction in this patient group, with mean scores (survivors 22.50±5.89 and non-survivors 18.19±5.89) and areas under the ROC curve (0.713 and 0.737 with p<0.05) that were similar to those found in our study.
No studies have reported results using SAPS 3 scores in liver transplant patients.
Kidney transplant group
Mortality was also overestimated by APACHE II in our kidney transplant patients. Past evidence has indicated that APACHE II was not adequate for evaluating mortality in this group (25,33).
No previous studies reporting the use of SAPS 3 in kidney transplant patients were available for a comparison with our findings.
We believe that some admission variables applied to SAPS 3 scores, such as creatinine, mechanical ventilation and the need for kidney repositioning, are not predictors of death (36), which may explain the overestimation of mortality in this group.
Lung transplant group
Slightly better performance was observed for APACHE II and SAPS 3 in lung transplant patients. However, these results cannot be compared with those of previous studies because there have been no reports in the literature regarding the application of APACHE II disease severity scores in this specific group. In validation studies focused on mortality scores, only SAPS 3 was used with this patient group (in small numbers) (11,12).
Some important advantages of this study included the short data collection period, which prevented possible biases resulting from medical and technological improvements during the study period, and the fact that all of the transplant patients came from a single service that used standardized clinical and surgical methods and techniques.
The present results should be interpreted carefully. Our study had a number of limitations. The patients were cared for at a single center and thus might not be representative of other ICUs in other medical centers. While this fact reflects the referral practice of our institution, it might limit the generalization of our findings.
However, the validation of SAPS 3 in a group of patients at a single center might more accurately reflect the model performance, without the confounding influence of differing standards of care at different institutions.
In addition, the low number of deaths limited the power of the calibration analysis.
We believe that the low accuracy of SAPS 3 and APACHE II in transplant patients stems from these systems underscoring physiologic aspects, such as creatinine level and liver function, which are usually (and by definition) poor in transplant patients before they receive new organs and can persist into the immediate postoperative period (28). However, because these variables tended to improve rapidly following transplantation, the scores overestimated mortality in the end.
Our results clearly demonstrate the need for a severity of illness classification system that specifically focuses on transplant patients, as well as on patient subgroups receiving different types of transplants.
Although this transplant sample was sufficiently large to assess illness severity scores, it was still too small for the design of a new predictive model. In addition, the results described here refer to one single ICU located in one specific country and region.
We recommend using different scores that consider the particular characteristics of each transplant type that contribute to post-transplantation mortality, such as the MELD score variables for liver transplant patients (27-30), which account for creatinine, prothrombin time (INR) and bilirubin. For example, SAPS 3 does not consider prothrombin time, a liver function marker that is an important variable. Studies that used MELD to predict post-liver transplant survival have shown conflicting results (31-35). These results could have been affected by using only pre-operative variables, thereby neglecting perioperative variables, such as surgical technique, ischemic time, graft conditions, and the need for immunoglobulin.
The Lung Allocation Score (LAS) has demonstrated great accuracy in predicting increased short-term mortality, decreased survival and increased complications in the first year after lung transplantation (34-36). Although this score was not developed for lung transplant patients, using some of its variables could influence the results in this patient group.
Therefore, a combination of variables from SAPS 3 (postoperative) and LAS (preoperative) and other perioperative factors might help to define a better score for use in this specific subgroup.
Finally, to the authors’ knowledge, this was the first study using SAPS 3 in transplant patients to demonstrate that this score did not predict hospital mortality in this population.
The present findings indicate that the accuracy of SAPS 3 is as poor as that of APACHE II in predicting in-hospital mortality in postoperative transplant patients who are admitted to the ICU; patient mortality was overestimated. The poor discrimination and calibration obtained with SAPS 3 and APACHE II do not warrant the use of these scoring systems in this patient group.
The lack of any similar studies precluded a comparison of our results with those of other researchers. Further studies will be necessary to validate our results.
No potential conflict of interest was reported.
Oliveira VM was responsible for the data acquisition, analyses and writing of the manuscript. Brauner JS was responsible for interpreting the data and writing the manuscript. Filho ER, Susin RG, Draghetti V, and Bolzan ST were responsible for data acquisition. Vieira SR was responsible for the data interpretation and writing the manuscript.