Advanced fibrosis is a crucial stage in the progression of autoimmune hepatitis (AIH), where fibrosis can either regress or advance. This study aims to leverage machine learning (ML) models for the assessment of advanced liver fibrosis in AIH patients using routine clinical features.
Patients and Methods
A total of 233 patients who were diagnosed with AIH and underwent liver biopsy were included in the discovery cohort. The dataset was randomly split into training and testing sets. Patients were categorized into groups with no/minimal/moderate fibrosis and advanced fibrosis. Six ML models were compared to identify the optimal model. Subsequently, the predictive capability of the best ML model was validated in an additional cohort (n = 33) and compared with conventional noninvasive fibrosis scores.
Results
Three key clinical features, namely prothrombin time (PT), albumin (ALB), and ultrasound spleen thickness (UTST), were identified by least absolute shrinkage and selection operator (LASSO) regression. In the training set, the random forest (RF) model showed the highest diagnostic performance in predicting advanced fibrosis stage (AUC = 0.951). In the testing cohort and validation cohort, the RF model maintained high accuracy (AUC = 0.863 and AUC = 0.843). Additionally, the RF model outperformed the conventional noninvasive fibrosis scores.
Conclusions
ML models, particularly the RF model, can help improve the discrimination of advanced liver fibrosis in patients with AIH.
Autoimmune hepatitis (AIH) is characterized by inflammation of the liver parenchyma resulting from autoimmune reactions. It manifests with elevated serum transaminases, positive autoantibodies, hyperglobulinemia, and interface hepatitis in liver histology, predominantly affecting females [1]. The prevalence of AIH has gradually increased in recent years [2]. The International Autoimmune Hepatitis Group (IAIHG) scoring system, widely used for AIH diagnosis, was simplified in 2008, and the simplified diagnostic criteria achieve high sensitivity (90 %) and specificity (95 %) [3]. Approximately one-third of AIH patients present with advanced fibrosis and cirrhosis at diagnosis, increasing the risk of hepatocellular carcinoma (HCC) [4]. Advanced fibrosis significantly influences AIH prognosis, as fibrosis at this stage may either regress or progress to cirrhosis. Early identification and intervention at this stage are crucial for improving outcomes and the quality of life for AIH patients.
The diagnosis of AIH and the staging of fibrosis present significant challenges in clinical practice. Liver biopsy, the gold standard for evaluating liver fibrosis, faces limitations such as invasiveness, cost, associated risks (e.g., bleeding, infection), potential sampling errors, and variability in interpretation (both inter-observer and intra-observer), making it unsuitable for monitoring and long-term treatment response assessment [5–7]. As is well known, transient elastography (TE) is a special ultrasound scan that assesses liver stiffness as a surrogate marker for liver fibrosis [8,9]. However, the significance of TE in AIH patients is controversial, as the elevation of alanine aminotransferase (ALT) level and hepatic inflammation may affect the accuracy of TE in detecting liver fibrosis [10]. The high cost of TE devices and operator requirements limit its clinical use in resource-limited settings. Blood-based tests like fibrosis index based on the four factors (FIB-4), aspartate aminotransferase to alanine aminotransferase ratio (AAR), and aspartate aminotransferase to platelet ratio index (APRI) have shown limited capabilities in measuring advanced fibrosis in AIH [2,11,12]. Therefore, developing a noninvasive model to discriminate advanced liver fibrosis in AIH is crucial.
The advancement of electronic medical records and hospital information platforms has facilitated the easier acquisition of clinical data. In this era, machine learning (ML) models have demonstrated effectiveness in appraising diagnoses, implementing early warning systems, and predicting drug responses within the medical field [13–15]. Notably, supervised learning in ML has proven superior in predicting clinical outcomes compared to traditional statistical analyses [16]. In a groundbreaking approach, we propose the development of a noninvasive model using ML methods to predict the advanced fibrosis stage. This innovative model utilizes routine data readily available in clinical settings, marking the first instance of such an approach. By harnessing the power of ML, we aim to enhance the accuracy and efficiency of predicting advanced liver fibrosis, thereby providing a valuable tool for clinicians in their decision-making processes.
2 Patients and methods
2.1 Patients' selection
This study included a total of 233 participants with AIH aged 18 years or older, encompassing both type 1 and type 2 AIH, who formed the discovery cohort. The participants were recruited from the First Affiliated Hospital of Army Medical University (Southwest Hospital) between January 2010 and April 2022. Additionally, nine AIH patients from the First Affiliated Hospital of Army Medical University (Southwest Hospital) between May 2022 and October 2023 and 24 AIH patients from the Second Affiliated Hospital of Army Medical University (Xinqiao Hospital) between January 2018 and October 2023 were recruited to form the validation cohort.
All participants underwent a comprehensive assessment, including demographic data, medical history, clinical manifestations, and abdominal ultrasonography. Prior to receiving systematic therapy (including albumin infusion, liver-protecting drug, prednisone and azathioprine), all patients had undergone liver biopsy within a week of obtaining blood for laboratory tests. The liver biopsy confirmed clear pathological features of AIH, and fibrosis stages were also evaluated. The diagnosis of AIH was established when a simplified International Autoimmune Hepatitis Group (IAIHG) score reached at least 7.
Exclusion criteria were applied as follows: (1) Patients with hepatocellular carcinoma (HCC) or other malignancies; (2) Patients with concurrent liver diseases such as viral hepatitis, nonalcoholic fatty liver disease (NAFLD), alcohol-related liver disease, primary biliary cholangitis (PBC), primary sclerosing cholangitis (PSC), drug-induced liver disease (DILI), and other inherited or metabolic liver diseases; (3) Patients suffering from other serious autoimmune diseases, heart, respiratory, or hematological diseases; (4) Patients lacking a substantial amount of data.
Twenty-eight common clinical features were gathered for our study, including demographic information: age, sex, height, weight, body mass index (BMI); routine blood tests: red blood cells (RBC), white blood cells (WBC), platelet count (PLT), hemoglobin (HGB), red blood cell distribution width (RDW), mean platelet volume (MPV), platelet distribution width (PDW), hematocrit value (HCT), monocyte, lymphocyte, neutrophil; coagulation function: prothrombin time (PT), activated partial thromboplastin time (APTT); abdominal ultrasonography: ultrasonic spleen thickness (UTST); blood biochemical: serum globulin (GLB), serum albumin (ALB), total bilirubin (TBIL), total bile acid (TBA), alanine aminotransferase (ALT), aspartate aminotransferase (AST), alkaline phosphatase (ALP), gamma-glutamyl transpeptidase (GGT).
FIB-4, AAR, and APRI were calculated as follows:
FIB-4 = (Age [years] × AST) / (PLT × √ALT)
AAR = AST / ALT
APRI = (AST / upper limit of normal AST) × 100 / PLT
Note: AST and ALT were measured in units per liter (U/L), and PLT was measured in 10^9 per liter.
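The three formulas above can be sketched in a few lines of Python. This is only an illustration: the study itself reports using SPSS and R, and the patient values and the AST upper limit of normal (40 U/L) below are hypothetical, chosen so the arithmetic is easy to follow.

```python
import math


def fib4(age_years, ast, alt, plt):
    """FIB-4 = (age x AST) / (PLT x sqrt(ALT)); AST and ALT in U/L, PLT in 10^9/L."""
    return (age_years * ast) / (plt * math.sqrt(alt))


def aar(ast, alt):
    """AAR = AST / ALT."""
    return ast / alt


def apri(ast, ast_uln, plt):
    """APRI = (AST / upper limit of normal AST) x 100 / PLT."""
    return (ast / ast_uln) * 100 / plt


# Hypothetical patient, not taken from the study data.
AGE, AST, ALT, PLT = 50, 80.0, 64.0, 100.0
AST_ULN = 40.0  # assumed laboratory upper limit of normal for AST

scores = {
    "FIB-4": fib4(AGE, AST, ALT, PLT),   # 50*80 / (100*8) = 5.0
    "AAR": aar(AST, ALT),                # 80/64 = 1.25
    "APRI": apri(AST, AST_ULN, PLT),     # (80/40)*100/100 = 2.0
}
print(scores)
```

Because the upper limit of normal for AST differs between laboratories, APRI values are only comparable when the same reference limit is used.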
Liver biopsy (LB) procedures were carried out using a 16-gauge disposable needle. All liver specimens obtained were independently scored by pathologists who were blinded to the patients' information. The Scheuer grading system was employed to classify liver fibrosis into the following stages [17]: S0, no fibrosis; S1, portal fibrosis without septa; S2, portal fibrosis with rare septa; S3, numerous septa without cirrhosis; and S4, cirrhosis. Based on these fibrosis stages, patients were categorized into two groups: S0-S2, defining no/minimal/moderate liver fibrosis, and S3-S4, defining advanced liver fibrosis.
The research adhered to the principles of the Declaration of Helsinki and received approval from the Ethics Committee Boards of Southwest Hospital and Xinqiao Hospital. Informed consent was waived for this study.
2.2 Statistical analyses
K-nearest neighbor classification was employed to impute missing values in the dataset. Continuous variables were presented as mean ± standard deviation (SD), while categorical variables were expressed as percentages. To compare normally distributed continuous variables, a two-sample independent t-test was utilized. For non-normally distributed continuous variables, the Mann-Whitney U test was applied. The comparison of categorical variables was conducted using the Chi-square test or Fisher test. A two-tailed p-value was used to indicate statistical significance. All statistical analyses were performed using IBM SPSS Statistics version 26.0.
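As a rough illustration of k-nearest neighbor imputation, the sketch below fills each missing entry with the mean of that feature over the k most similar patients. It is not the authors' implementation: the distance definition, the value of k, and the example rows are all assumptions made for the example.

```python
import math


def knn_impute(rows, k=3):
    """Return a copy of rows in which each None is replaced by the mean of that
    column over the k nearest rows that have the column observed.
    Distance is the root mean squared difference over columns observed in both rows."""
    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Hypothetical rows with columns [PLT, MPV, PDW]; the last patient lacks MPV.
rows = [
    [200.0, 10.0, 12.0],
    [210.0, 10.4, 12.5],
    [90.0, 12.0, 16.0],
    [205.0, None, 12.2],
]
filled = knn_impute(rows, k=2)
print(filled[3])  # MPV is filled with the mean of the two closest patients
```

The sketch uses raw values for brevity; in practice features are usually standardized first so that a variable with a large numeric range, such as PLT, does not dominate the distance.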
2.3 Machine learning model
The Least Absolute Shrinkage and Selection Operator (LASSO) technique was employed to select variables, facilitating the creation of a simplified model by constructing a penalty function and mitigating overfitting. Independent predictive indicators identified through LASSO were then utilized to establish supervised machine learning (ML) models. SHapley Additive exPlanations (SHAP) were employed to explain the importance and role of each clinical feature included in the ML models.
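For orientation, a commonly used form of the LASSO objective for a binary outcome is the L1-penalized logistic regression shown below. The paper does not state the exact formulation or how the penalty weight was chosen, so the notation is an assumption: y_i = 1 denotes advanced fibrosis (S3-S4), x_i is the vector of p candidate features for patient i, n is the number of patients, and λ ≥ 0 is the penalty weight.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
\left\{
  -\frac{1}{n}\sum_{i=1}^{n}
  \left[ y_i\left(\beta_0 + x_i^{\top}\beta\right)
         - \log\left(1 + e^{\beta_0 + x_i^{\top}\beta}\right) \right]
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\right\}
```

As λ increases, more coefficients β_j are shrunk exactly to zero, and the features whose coefficients remain nonzero are the ones retained.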
Six ML methods were established, including random forest (RF), extreme gradient boosting (XGBoost), logistic regression (LR), multilayer perceptron (MLP), decision tree (DT), and support vector machine (SVM), to determine the most fitting ML model. Ten-fold cross-validation was applied to create more reliable parameter combinations for the ML models. The predictive accuracies of the six ML models were assessed using the area under the curve (AUC). The models were subsequently validated in both the testing cohort and validation cohort.
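The AUC used to compare the models can be read as the probability that a randomly chosen patient with advanced fibrosis receives a higher predicted score than a randomly chosen patient without it. A minimal sketch of that definition follows; the labels and scores are made up for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    case has the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predictions: 1 = advanced fibrosis (S3-S4), 0 = S0-S2.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 11 of the 12 pairs are ordered correctly
```

This pairwise form is quadratic in the number of cases, which is acceptable for cohorts of a few hundred patients; rank-based formulas give the same value more efficiently.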
Finally, the performance of the best ML model was compared with conventional noninvasive models based on serum indicators, including FIB-4, AAR, and APRI. This comprehensive approach aimed to identify the most effective and accurate model for predicting advanced liver fibrosis in patients with AIH.
Calibration curve and decision curve analyses (DCA) were employed for the assessment of clinical application. The calibration curve assessed whether the prediction probabilities of the model were in conformity with clinical practice. On the other hand, DCA calculated the benefit under different threshold probabilities to infer the clinical applicability of the model. To achieve a comprehensive evaluation of the six ML models, sensitivity, specificity, accuracy, and recall were calculated. These metrics provide insights into the performance of the models in different aspects of prediction. Finally, the optimal ML model was transformed into an online calculator to facilitate its use in clinical practice. This step aims to enhance the practicality and accessibility of the model for healthcare professionals. The ML analyses were performed using R 4.1.2 software.
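The metrics named above can be written out explicitly. The sketch below computes sensitivity, specificity, and accuracy at one probability threshold, together with the net benefit that decision curve analysis evaluates at that threshold, using the usual definition TP/n - (FP/n) × pt/(1 - pt). Recall is the same quantity as sensitivity. The labels, probabilities, and threshold are hypothetical.

```python
def confusion_counts(labels, probs, threshold):
    """Count true/false positives and negatives when predicting positive at p >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summary_metrics(labels, probs, threshold=0.5):
    """Sensitivity (recall), specificity, accuracy, and net benefit at one threshold (0 < threshold < 1)."""
    tp, fp, tn, fn = confusion_counts(labels, probs, threshold)
    n = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),  # identical to recall
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / n,
        "net_benefit": tp / n - (fp / n) * threshold / (1 - threshold),
    }


# Hypothetical cases: 1 = advanced fibrosis, 0 = no/minimal/moderate fibrosis.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]
metrics = summary_metrics(labels, probs, threshold=0.5)
print(metrics)
```

A decision curve is obtained by repeating the net benefit calculation over a range of thresholds and comparing it with the "treat all" and "treat none" strategies.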
2.4 Machine learning model training, testing, and validation
The discovery dataset was randomly partitioned into training and testing cohorts. In the training cohort, ten-fold cross-validation was implemented, where the data was divided into ten groups. Nine groups were utilized for training the ML model, and one group was reserved for cross-validation to ensure robustness. Diagnostic indicators, including accuracy, AUC, sensitivity, specificity, and recall, were assessed for the ML model in both the training and testing sets. The optimal ML model was then validated in an external cohort to assess its performance on new and unseen data. Additionally, a comparison was conducted between the optimal ML model and noninvasive fibrosis models, which included FIB-4, AAR, and APRI. This comprehensive evaluation aimed to ascertain the superiority of the ML model in predicting advanced liver fibrosis compared to conventional diagnostic approaches.
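The partitioning scheme can be sketched as index bookkeeping. The Results report 163 training and 70 testing cases out of 233; the 0.7 training fraction, the random seed, and the unstratified shuffling below are assumptions for illustration, since the paper does not give those details.

```python
import random


def split_and_fold(n_cases, train_fraction=0.7, n_folds=10, seed=0):
    """Shuffle case indices, split them into training and testing sets,
    then deal the training indices into n_folds roughly equal folds."""
    rng = random.Random(seed)
    indices = list(range(n_cases))
    rng.shuffle(indices)
    n_train = int(n_cases * train_fraction)
    train, test = indices[:n_train], indices[n_train:]
    folds = [train[i::n_folds] for i in range(n_folds)]
    return train, test, folds


train, test, folds = split_and_fold(233)
print(len(train), len(test), [len(f) for f in folds])

for held_out in range(len(folds)):
    validation = folds[held_out]
    fitting = [i for k, fold in enumerate(folds) if k != held_out for i in fold]
    # A model would be fitted on `fitting` and scored on `validation` here.
```

Each training case is held out exactly once, so averaging the ten validation scores gives a performance estimate that does not touch the testing set.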
2.5 Ethics statement
The requirement for informed consent was waived for all subjects. In order to ensure confidentiality, the names of study participants were not included in the data. Information obtained from the data of the study participants is kept confidential. In addition, the Ethics Committee of the Southwest Hospital and Xinqiao Hospital Third Military Medical University (Army Medical University) approved the study.
2.6 Patient consent statement
Informed patient consent was waived with the approval of the ethics committee of the local hospital.
3 Results
3.1 Clinical characteristics of all AIH patients
A total of 266 patients diagnosed with AIH were included in the study, with 233 patients forming the discovery cohort and 33 comprising the validation cohort (Fig. 1). Among all the participants, 236 were female, accounting for 88.7 % of the total. The sex distribution, body mass index (BMI), most liver function indicators, and noninvasive fibrosis tests showed comparable distributions between the discovery and validation cohorts (Table 1).
Table 1. Comparative clinical features of AIH patients in the discovery versus validation cohort.
Abbreviations: AIH, autoimmune hepatitis; BMI, body mass index; WBC, white blood cells; RBC, red blood cells; HGB, hemoglobin; PLT, platelet count; RDW, red blood cell distribution width; HCT, hematocrit value; MPV, mean platelet volume; PDW, platelet distribution width; ALT, alanine aminotransferase; AST, aspartate aminotransferase; ALP, alkaline phosphatase; GGT, gamma-glutamyl transpeptidase; TBIL, total bilirubin; TBA, total bile acid; ALB, albumin; PT, prothrombin time; APTT, activated partial thromboplastin time; UTST, ultrasonic spleen thickness; FIB-4, fibrosis index based on the four factors; AAR, aspartate aminotransferase to alanine aminotransferase ratio; APRI, aspartate aminotransferase to platelet ratio index.
In the discovery cohort, the mean BMI was 21.89 kg/m², and the mean age was 49 years. The distribution of liver fibrosis stages was as follows: S0 in 54 patients (23.2 %), S1 in 76 patients (32.6 %), S2 in 44 patients (18.9 %), S3 in 31 patients (13.3 %), and S4 in 28 patients (12.0 %). Table 2 revealed statistically significant differences in most indicators and non-invasive fibrosis scores between the none-to-moderate fibrosis group (S0-S2) and the advanced fibrosis group (S3-S4).
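The stage counts above determine the size of the two outcome groups used for modeling. The short check below recomputes the reported percentages from the counts and derives the S0-S2 and S3-S4 group sizes; only the counts come from the text, and the variable names are illustrative.

```python
# Stage counts in the discovery cohort, as reported in the text.
stage_counts = {"S0": 54, "S1": 76, "S2": 44, "S3": 31, "S4": 28}

total = sum(stage_counts.values())
percentages = {stage: round(100 * count / total, 1)
               for stage, count in stage_counts.items()}

# Binary outcome used by the models: S3-S4 is advanced fibrosis.
advanced = stage_counts["S3"] + stage_counts["S4"]
non_advanced = total - advanced

print(total, percentages)
print(advanced, round(100 * advanced / total, 1))
print(non_advanced, round(100 * non_advanced / total, 1))
```

The counts sum to the 233 patients of the discovery cohort, and roughly one quarter of them fall in the advanced fibrosis group, so the two classes are moderately imbalanced.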
Table 2. Comparative clinical features of AIH patients in the discovery cohort.
Abbreviations: AIH, autoimmune hepatitis; BMI, body mass index; WBC, white blood cells; RBC, red blood cells; HGB, hemoglobin; PLT, platelet count; RDW, red blood cell distribution width; HCT, hematocrit value; MPV, mean platelet volume; PDW, platelet distribution width; ALT, alanine aminotransferase; AST, aspartate aminotransferase; ALP, alkaline phosphatase; GGT, gamma-glutamyl transpeptidase; TBIL, total bilirubin; TBA, total bile acid; ALB, albumin; PT, prothrombin time; APTT, activated partial thromboplastin time; UTST, ultrasonic spleen thickness; FIB-4, fibrosis index based on the four factors; AAR, aspartate aminotransferase to alanine aminotransferase ratio; APRI, aspartate aminotransferase to platelet ratio index.
A total of 28 candidate features were considered for analysis. Among these, two features, MPV and PDW, had missing values. The k-nearest neighbor algorithm was employed to fill in the missing values for these features. The process of feature selection was carried out using LASSO regression, as depicted in Fig. 2. In accordance with clinical observations, it was found that decreased albumin (ALB), prolongation of prothrombin time (PT), and an increase in ultrasonic spleen thickness (UTST) were significantly associated with advanced fibrosis. These identified features are consistent with clinical expectations and align with established indicators of liver fibrosis in autoimmune hepatitis.
Fig. 2. Least absolute shrinkage and selection operator (LASSO) regression for candidate biomarker selection.
The discovery cohort, consisting of 233 cases, was randomly divided into a training set (163 cases) and a testing set (70 cases). The training set was utilized for model construction, while the testing set served to validate the model. To ensure the robustness of the classifiers across the training and testing data, a ten-fold cross-validation method was adopted to calculate the diagnostic value for advanced fibrosis (≥ S3).
The cross-validation results revealed that the RF model exhibited superior performance with an AUC of 0.893, outperforming other models such as XGB with an AUC of 0.886, MLP with an AUC of 0.883, LR and SVM both with an AUC of 0.867, and DT with an AUC of 0.779. The cross-validation receiver operating characteristic (ROC) curves of the six machine learning models were illustrated in Figure S1.
3.4 Machine learning model testing and validation
In the training set, models for RF, LR, MLP, SVM, DT, and XGB were created. The ML model demonstrating the highest diagnostic value was then compared with classical noninvasive scores. Table S1 summarizes the diagnostic indicators of the six ML models, including accuracy, AUC, sensitivity, specificity, and recall. Among the ML models, the RF model exhibited superior performance in both the training (Figure S2a) and testing sets (Fig. 3a), with AUCs of 0.951 and 0.869, respectively. Calibration curve and decision curve analyses further supported the clinical utility of the RF model (Figure S3). As expected, the RF model also demonstrated better efficiency than traditional noninvasive predictive models. In the testing set, the AUCs of FIB-4, APRI, and AAR were 0.775, 0.669, and 0.765, respectively, although the differences did not reach statistical significance (Fig. 3b-d). The accuracy of the RF model was further validated in an external validation cohort, where the AUC of the model was 0.843 (Fig. 4), indicating its considerable and stable diagnostic accuracy across different datasets.
Fig. 3. Areas under the receiver operating characteristic curve (AUCs) of predictive models for advanced fibrosis in the testing cohort.
The SHapley Additive exPlanations (SHAP) analysis illustrated the performance of each indicator that constituted the RF model in predicting advanced fibrosis. UTST and PT were positively associated with advanced fibrosis, whereas ALB was negatively associated with advanced fibrosis (Figure S4). To assess the practical application of the RF model, two randomly selected patients were evaluated to determine if the model could accurately distinguish between those with or without advanced fibrosis (Figure S5a-b). One patient, with ALB of 30.39 g/L, PT of 10.6 s, and UTST of 60 mm, was diagnosed as positive by the RF model. Another patient, with ALB of 43.22 g/L, PT of 12.5 s, and UTST of 43 mm, was diagnosed as negative. In both cases, the RF model correctly diagnosed the patients. To enhance the accessibility of the RF model in clinical practice, it was transformed into a web calculator, allowing doctors to easily obtain the probability of advanced fibrosis by inputting numerical values for ALB, PT, and UTST. The web calculator can be accessed at https://yingpeng.shinyapps.io/shiny/.
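To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a three-feature model by averaging each feature's marginal contribution over all feature orderings. Everything except the first patient's ALB, PT, and UTST values is hypothetical: `toy_score` is an invented linear score, not the fitted RF model, the baseline values are assumed, and replacing absent features with a fixed baseline is a simplification of what SHAP software does.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal change in the prediction when each
    feature is switched from its baseline value to the instance value, taken
    over all feature orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_score(x):
    """Hypothetical risk score with invented coefficients (not the study's model)."""
    return 0.02 * (x["UTST"] - 40) + 0.1 * (x["PT"] - 12) - 0.03 * (x["ALB"] - 40)


patient = {"ALB": 30.39, "PT": 10.6, "UTST": 60}   # first example patient in the text
baseline = {"ALB": 40.0, "PT": 12.0, "UTST": 40.0}  # assumed reference values

values = shapley_values(toy_score, patient, baseline)
print(values)
```

The contributions always sum to the difference between the prediction for the patient and the prediction at the baseline, which is the property that lets SHAP plots decompose an individual prediction feature by feature.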
4 Discussion
This study represents a groundbreaking endeavor as the first to employ ML methods in evaluating advanced fibrosis in patients with AIH. By exploring various ML methods for predicting advanced fibrosis in AIH, the study has showcased promising performance compared to established noninvasive predictors such as FIB-4, AAR, and APRI. Notably, the RF model emerged as the most effective in predicting advanced fibrosis. The diagnostic robustness of the RF model was validated in an additional cohort, reinforcing its reliability. To enhance practical applicability for clinicians, the RF model was transformed into a user-friendly web calculator, offering convenient access for healthcare professionals. This innovative approach not only broadens our understanding of ML applications in AIH but also provides a potentially valuable tool for more accurate and efficient prediction of advanced liver fibrosis in AIH patients.
The RF model, incorporating UTST, ALB, and PT, exhibited the best performance in the prediction of advanced fibrosis. Originally designed to predict esophageal varices in cirrhosis, UTST has recently been recognized for its significant role in predicting liver fibrosis. Previous studies have supported the predictive value of spleen thickness for significant fibrosis, even in patients with persistently normal or slightly elevated levels of ALT [18]. Sheptulina et al. found UTST to be a significant prognosticator of advanced fibrosis in AIH, with a sensitivity of 72.7 % and a specificity of 80 % [19]. Advanced fibrosis and liver cirrhosis can result in metabolic and synthetic dysfunctions, potentially elevating bilirubin levels and decreasing coagulation factors and thrombopoietin. PT prolongation, dependent on coagulation factors synthesized by the liver, is positively related to liver function deterioration. Monitoring PT is often necessary in patients with liver dysfunction, providing a cost-effective and easily available test that generally reflects bleeding risk [20]. Boursier et al. established a model comprising age, gender, GGT, AST, PLT, and PT based on accessible clinical indicators, demonstrating preferable diagnostic ability for advanced fibrosis in chronic liver diseases [21]. In a recent study, ALB was identified as the only independent risk factor in predicting the severity of NAFLD [22]. Olteanu et al. also observed a correlation between decreased ALB levels and advanced fibrosis [23]. This underscores the significance of ALB as a potential biomarker for assessing liver disease severity and fibrosis progression in various liver conditions, including NAFLD and chronic liver diseases.
Indeed, the rapid advancement of ML has significantly broadened its applications in liver diseases, offering valuable insights and tools across various aspects of liver health. This includes predicting the severity of fibrosis in hepatitis B and C virus infections, as well as assessing complications and outcomes post-liver transplantation [24–26]. However, risk stratification of liver fibrosis in non-infectious liver diseases, particularly in cases of AIH, presents a formidable challenge [27]. The diagnostic process for AIH is intricate due to the absence of reliable biomarkers, necessitating alternative approaches for accurate assessment [28]. In this context, ML methods have emerged as valuable tools, although their application depends on factors like sample size and algorithm selection. The accuracy of ML methods is closely linked to the size of the dataset. Given this limitation, identifying the optimal ML model for a small sample size becomes particularly crucial. Recent studies have explored different ML methods to enhance liver disease diagnosis and create predictive models for liver fibrosis stratification [29,30]. Prior research has highlighted the efficacy of two ML methods, RF and SVM, across various parameter categories. Consequently, we employed six models, including RF and SVM, to estimate the severity of liver fibrosis in AIH. Our study demonstrated that RF emerged as the optimal algorithm for small sample data. RF's strength lies in combining predictions from multiple weak classifiers, yielding a more accurate and stable prediction. Furthermore, its utilization of random samples and features ensures resilience to data noise, even with minimal tuning parameters and a small sample size [31,32]. This emphasizes the critical importance of selecting an algorithm tailored to the dataset's characteristics, particularly when dealing with limited samples in complex conditions like AIH.
In prior studies, RDW-CV and RDW-SD have been utilized to characterize the distribution of erythrocyte width, rather than RDW alone. AIH patients, especially those with advanced fibrosis, exhibit an increased susceptibility to hemolytic anemia [33]. The elevation of RDW-CV and RDW-SD in AIH patients may be attributed to the inhibition of erythrocyte maturation by pro-inflammatory cytokines. Furthermore, liver dysfunction and secondary malnutrition in AIH patients can result in a deficiency of hematopoietic substances such as iron, allowing many immature red blood cells to enter the peripheral circulation. Additionally, a prior study observed a close relationship between RDW and the levels of inflammatory cytokines, which were higher in AIH patients [34]. These findings suggest a complex interplay between liver function, erythrocyte indices, and inflammatory processes in AIH. In line with numerous prior studies, the current study also reports a significant association between a low platelet count (PLT) and advanced liver fibrosis in AIH patients [35,36].
Inevitably, there were limitations in this study. Firstly, it was a retrospective study, introducing potential selection bias. In addition, there was a high dropout rate because only a limited number of patients had received liver biopsy. However, we performed cross-validation to strengthen the analysis of model robustness. Future studies incorporating more patients may validate the reliability of our model. Secondly, the study did not differentiate between different subtypes of AIH, potentially introducing bias into the results. Thirdly, our RF model included PT instead of INR, which may limit its wide use across different centers. However, PT may directly reflect the coagulation dysfunction in patients with liver disease. Lastly, while TE is considered a main noninvasive imaging method for staging fibrosis, it was not included in our study. The study population and environment in our research may not have had widespread access to TE, which is a significant reason for its exclusion from our analysis. Future iterations of our model may incorporate TE measurements to enhance its predictive performance, particularly in cases where there is discordance between clinical parameters and the suspected severity of fibrosis. Despite these limitations, our study constitutes a practical application of ML to address longstanding questions in the field, offering novel perspectives on the challenging issue of liver fibrosis stratification in AIH.
5 Conclusions
This study illustrates that ML methods, especially the RF model, have the potential to enhance the prediction of advanced fibrosis in AIH. The utilization of this model allows for individualized predictions of fibrosis in AIH patients while minimizing evaluation costs. It is advisable for risk stratification and the implementation of suitable preventive measures in high-risk AIH patients. Clinicians are encouraged to advocate for the adoption of this model, which holds promise for optimizing the noninvasive assessment of liver fibrosis on a larger scale.
Author contributions
QW: designed the study, analyzed and interpreted the data and drafted the manuscript; QX, WL and HW: collected the data, searched and selected the literature; YP and SH: analyzed the data; YP and XZ: revised the manuscript and supervised the study. All authors approved the submission.