This study aims to create an artificial intelligence (AI)-based machine learning (ML) model capable of predicting a spirometric obstructive pattern using the variables with the highest predictive power derived from an active case-finding program for chronic obstructive pulmonary disease (COPD) in primary care.
Material and methods
A total of 1190 smokers aged 30–80 years with no prior history of respiratory disease underwent spirometry with a bronchodilator test. The sample was analyzed using AI tools. After an exploratory data analysis (EDA), independent variables selected by mutual information analysis were used to train a gradient boosting algorithm (GBT), which was validated through cross-validation.
Results
With an area under the curve (AUC) close to 1, the model predicted a spirometric obstructive pattern using the variables with the highest predictive power, in particular FEV1_theoretical_pre values. Sensitivity: 93%. Positive predictive value: 94%. Specificity: 97%. Negative predictive value: 96%. Accuracy: 95%. Precision: 94%.
Conclusion
An ML model can predict the presence of an obstructive pattern in spirometry in a primary care smoking population with no prior diagnosis of respiratory disease using the FEV1_theoretical_pre values with an accuracy and precision exceeding 90%. Further studies including clinical data and strategies for integrating AI into clinical workflow are needed.
The world's population is aging, leading to an increased prevalence of chronic disorders such as chronic obstructive pulmonary disease (COPD).1 Early diagnosis and the reduction of underdiagnosis (as well as classification and treatment) remain pending tasks.2 Promoting primary prevention and/or providing basic spirometry equipment in primary healthcare centers does not appear to have been a turning point in improving this situation. The fact that the disease can manifest at an early age3 and that interpreting spirometry results is not always straightforward4 could be relevant factors when planning diagnostic strategies among general practitioners (GPs).
In recent years, the application of artificial intelligence (AI) in medicine has grown exponentially, drawing on various types of data and improving the accuracy of functional diagnosis in diseases such as COPD.5 Could clinical decision-making and the automation of healthcare processes find anchor points in machine learning (ML) and deep learning (DL)? The answer is yes, but the data used in the models must comply with data protection laws,6 always be accessible, meet a clinical need with relevant outcomes, and undergo exploratory data analysis (EDA). Finally, the validated algorithm must be integrated into the clinical workflow and management of healthcare centers.7
Numerous examples exist of ML integration in the diagnosis of various pulmonary diseases.8–11
Our goal was to create an ML-based AI model capable of predicting the presence or absence of an obstructive pattern using the variables with the highest predictive power derived from an active case-finding program for COPD in primary care.
Material and methods
Patients
Between May 2015 and May 2017, patients aged 30–80 years with a pack-year index of 10 or more, with or without symptoms, referred from six primary care centers in the Valencian Community, Spain, were included. Exclusion criteria were a prior diagnosis of respiratory disease, absence of signed informed consent, and/or active systemic treatment.
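The inclusion and exclusion criteria above can be expressed as a small screening function. This is a minimal sketch: it assumes the usual definition of the pack-year index (cigarettes per day divided by 20 cigarettes per pack, multiplied by years of smoking), which the study does not spell out, and the names `Candidate`, `pack_years`, and `is_eligible` are hypothetical.

```python
from dataclasses import dataclass

CIGARETTES_PER_PACK = 20  # assumption: standard pack size used in the pack-year definition


@dataclass
class Candidate:
    """Hypothetical record of a patient proposed by a GP."""
    age: int
    cigarettes_per_day: float
    years_smoking: float
    prior_respiratory_diagnosis: bool
    consent_signed: bool
    active_systemic_treatment: bool


def pack_years(cigarettes_per_day: float, years_smoking: float) -> float:
    """Pack-years = (cigarettes per day / 20) x years of smoking."""
    return cigarettes_per_day / CIGARETTES_PER_PACK * years_smoking


def is_eligible(c: Candidate) -> bool:
    """Inclusion: age 30-80 and >= 10 pack-years; exclusion: prior diagnosis, no consent, systemic treatment."""
    included = 30 <= c.age <= 80 and pack_years(c.cigarettes_per_day, c.years_smoking) >= 10
    excluded = c.prior_respiratory_diagnosis or not c.consent_signed or c.active_systemic_treatment
    return included and not excluded


# Hypothetical candidate: 55 years old, 20 cigarettes/day for 10 years (10 pack-years).
print(is_eligible(Candidate(55, 20, 10, False, True, False)))
```

Plugging the mean values from Table 1 (20.81 cigarettes per day, 31.27 years) into this definition gives about 32.5 pack-years, although the mean of individual pack-years need not equal this product.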
Forty-four GPs participated in patient inclusion. When a potential candidate was identified in the primary care consultation and had signed the informed consent form, the patient was referred to the spirometry consultation at the corresponding health center in the study area.
Spirometry assessment
All patients underwent forced spirometry and a bronchodilator test (BDT) in accordance with ATS/ERS guidelines,12 using the same USB Care Fusion® equipment and trained personnel. Only spirometric tests meeting quality grades A and B were analyzed.
Variables
The following variables were analyzed: age, gender, number of cigarettes smoked daily, number of years smoking, forced vital capacity (FVC) and forced expiratory volume in 1 second (FEV1) in absolute and theoretical (% predicted) values, and the FEV1/FVC ratio, all before and after the BDT, as well as the lower limit of normal (LLN) of the ratio after the BDT. The obstructive pattern was defined according to GOLD 2023 consensus criteria.13
Statistical analysis
Data were stored and analyzed using the Statistical Package for the Social Sciences (SPSS) version 21.0® (SPSS Inc., Chicago, IL, USA; IBM Analytics, Armonk, NY, USA).
The statistical analysis initially involved a general descriptive study of the results obtained in all included variables. Results were expressed as mean±standard deviation for continuous variables (or median and range if the distribution was not normal), and as absolute values and percentages for categorical variables.
EDA in Python version 3.8.5
The data consisted of 1232 rows and 16 columns, including 15 numeric variables and two nominal categorical variables. The target variable was defined as the presence or absence of an obstructive pattern, that is, an FEV1/FVC ratio less than 70%. Duplicated, missing, extreme, or atypical values were removed from the dataset.
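The cleaning steps can be sketched with pandas. The study does not state how extreme or atypical values were identified, so the Tukey IQR fences used here (k = 1.5) are an assumption, as are the column names; such fences should not be applied blindly to the FEV1/FVC ratio itself, because genuinely severe obstruction would then be discarded as an outlier. The target follows the fixed-ratio rule described above, and the sketch assumes the post-bronchodilator ratio, as in the GOLD criteria.

```python
import pandas as pd


def clean_dataset(df: pd.DataFrame, outlier_cols, k: float = 1.5) -> pd.DataFrame:
    """Remove duplicate rows, rows with missing values, and rows outside Tukey fences."""
    out = df.drop_duplicates().dropna()
    keep = pd.Series(True, index=out.index)
    for col in outlier_cols:
        # Fences are computed on the deduplicated, complete data for every column.
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        keep = keep & out[col].between(q1 - k * iqr, q3 + k * iqr)
    return out[keep].reset_index(drop=True)


def add_target(df: pd.DataFrame, ratio_col: str = "post_quotient") -> pd.DataFrame:
    """Binary target: 1 (obstructive pattern) if FEV1/FVC (%) < 70, else 0."""
    out = df.copy()
    out["obstructive"] = (out[ratio_col] < 70).astype(int)
    return out


# Hypothetical toy data: one duplicate row, one missing age, one implausible age.
raw = pd.DataFrame({
    "age": [50, 50, 60, None, 70, 55, 1000],
    "post_quotient": [65, 65, 75, 72, 80, 68, 71],
})
clean = add_target(clean_dataset(raw, outlier_cols=["age"]))
print(clean)
```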
Computerized algorithm and validation
The development of a computer algorithm for interpreting spirometry results using ML was based on Python version 3.8.5 and the use of mutual information statistics.14–17
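Mutual information measures how much knowing one variable reduces uncertainty about another and, unlike a correlation coefficient, it also captures non-linear dependence. The study does not report which estimator was used; the sketch below is a simple histogram (plug-in) estimate in nats, with an arbitrary choice of 10 bins, followed by a ranking of candidate predictors against the binary target. Function names and the synthetic data are hypothetical.

```python
import numpy as np


def mutual_information(x, y, bins: int = 10) -> float:
    """Plug-in estimate of I(X;Y), in nats, from a 2-D histogram of x and y."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # joint probabilities per cell
    px = pxy.sum(axis=1, keepdims=True)       # marginal distribution of x
    py = pxy.sum(axis=0, keepdims=True)       # marginal distribution of y
    nz = pxy > 0                              # empty cells contribute 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def rank_features(features: dict, target, bins: int = 10):
    """Return (name, MI with target) pairs sorted from most to least informative."""
    scores = {name: mutual_information(v, target, bins) for name, v in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical synthetic data: the target is defined from the ratio; noise is unrelated.
rng = np.random.default_rng(0)
ratio = rng.normal(73, 9.5, 5000)
noise = rng.normal(0, 1, 5000)
target = (ratio < 70).astype(int)
ranked = rank_features({"ratio": ratio, "noise": noise}, target)
for name, mi in ranked:
    print(f"{name}: {mi:.3f} nats")
```

Plug-in estimates are biased upward for small samples and depend on the binning, so they are best used to rank variables rather than as absolute values.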
The importance of variables was estimated using the gradient tree boosting (GTB) implementation of LightGBM,18,19 and a new decision tree based on spirometry data combined with age, gender, and smoking habits was developed. The area under the curve (AUC) was used to evaluate the models. To better assess the model's predictions, 5-fold cross-validation20 was performed: the data were divided into five equal parts, and five iterations were run, each using one fold as the validation set (20%) and the remaining four as the training set (80%). This technique gives a more reliable estimate of performance on unseen data and helps detect overfitting, a particular risk with small datasets.
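The 5-fold scheme and the AUC can both be written without any ML library, which makes the 80/20 arithmetic explicit. This is an illustrative sketch, not the authors' code: the shuffling seed and the function names `kfold_indices` and `auc` are hypothetical, and the AUC is computed through its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case).

```python
import random


def kfold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """Yield (train, validation) index lists; every case is validated exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k parts of (near-)equal size
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def auc(y_true, scores) -> float:
    """ROC AUC: share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


for fold, (train, val) in enumerate(kfold_indices(1190, k=5), start=1):
    print(f"fold {fold}: train={len(train)} ({len(train) / 1190:.0%}), validation={len(val)}")

print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 3 of 4 pairs ordered correctly -> 0.75
```

With the study's 1190 cases, each validation fold holds 238 cases and each training set 952. The pairwise AUC loop costs O(positives × negatives), which is acceptable at this sample size.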
Out-of-fold predictions (off-preds)20 were used to measure the predictive capability of the model on data that had not been used to train it.
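Out-of-fold predictions can be obtained with scikit-learn's `cross_val_predict`, which returns, for every case, the probability predicted by the model trained on the other folds. In this sketch scikit-learn's `GradientBoostingClassifier` stands in for LightGBM, and the two-feature dataset is synthetic and hypothetical; pooling the out-of-fold probabilities into a single AUC is one common way to summarize them.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold, cross_val_predict


def out_of_fold_probabilities(X, y, n_splits: int = 5, seed: int = 0):
    """Probability of the positive class for each case, from a model that never saw it."""
    model = GradientBoostingClassifier(random_state=seed)  # stand-in for LightGBM
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]


# Hypothetical synthetic data: the target depends on a noisy FEV1/FVC-like ratio.
rng = np.random.default_rng(0)
n = 600
ratio = rng.normal(73, 9.5, n)
fev1_pct = rng.normal(82, 19, n)
X = np.column_stack([ratio, fev1_pct])
y = (ratio + rng.normal(0, 2, n) < 70).astype(int)

oof = out_of_fold_probabilities(X, y)
print(f"pooled out-of-fold AUC: {roc_auc_score(y, oof):.3f}")
```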
The ranking of variables based on their predictive power after the training and validation process was also confirmed using the Explain Like I’m 5 (ELI5) library.21
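The ELI5 configuration used in the study is not reported. A comparable ranking can be sketched with scikit-learn's `permutation_importance`, which measures how much a validation metric (here AUC) drops when one feature's values are shuffled. This is a stand-in for ELI5, not its API, and the data are synthetic: the target depends only on the hypothetical `pre_quotient` column.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical synthetic cohort: age is generated independently of the target.
rng = np.random.default_rng(1)
n = 600
ratio = rng.normal(73, 9.5, n)
age = rng.normal(56, 11, n)
X = np.column_stack([ratio, age])
y = (ratio < 70).astype(int)
names = ["pre_quotient", "age"]

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# Shuffle each column of the validation set 10 times and record the mean AUC drop.
result = permutation_importance(model, X_val, y_val, scoring="roc_auc", n_repeats=10, random_state=0)
ranking = sorted(zip(names, result.importances_mean), key=lambda t: t[1], reverse=True)
for name, drop in ranking:
    print(f"{name}: mean AUC drop {drop:.3f}")
```

Computing the importances on held-out data, as here, avoids rewarding features the model merely memorized during training.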
The performance of our classification model was evaluated using a 2×2 confusion matrix.
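The metrics reported in the results follow directly from the four cells of the 2×2 confusion matrix. The sketch below uses hypothetical counts, not those of Fig. 4, and hypothetical function names; note that precision and positive predictive value are the same quantity, TP/(TP + FP).

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN with the obstructive pattern (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard metrics derived from the 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),     # recall, true positive rate
        "specificity": tn / (tn + fp),     # true negative rate
        "ppv": tp / (tp + fp),             # positive predictive value = precision
        "npv": tn / (tn + fn),             # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=93, fp=3, tn=97, fn=7)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```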
During the preparation of this work, the authors used exploratory data analysis (EDA) in Python 3.8.5, the gradient boosting algorithm (GBT) of LightGBM, 5-fold cross-validation, out-of-fold predictions (off-preds), and Explain Like I’m 5 (ELI5) to design and validate the machine learning (ML) model. After using these tools, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.
The study protocol was approved by the ethics committee of the Arnau de Vilanova-Lliria Hospital located in Valencia, Spain.
Results
The training dataset included 1190 cases after completion of the EDA. Table 1 provides a descriptive summary of the sample.
Table 1. Descriptive analysis.

| Patient features | Value |
|---|---|
| Number of patients | 1190 |
| Age (years) | 55.86 ± 10.72 |
| Women, n (%) | 522 (43.86%) |
| Men, n (%) | 668 (56.14%) |
| Cigarettes per day | 20.81 ± 11.71 |
| Years of smoking | 31.27 ± 12.11 |
| Pre-quotient FEV1/FVC (%) | 72.89 ± 9.57 |
| Spirometry post-quotient (%) | 74 |
| LLN (%) | 66.02 ± 3.31 |
| FVC_absolute_pre (mL) | 3431.86 ± 911.5 |
| FVC_theoretical_pre (% predicted) | 83.16 ± 15.37 |
| FVC_absolute_post (mL) | 3444.98 ± 903.04 |
| FVC_theoretical_post (% predicted) | 83.5 ± 15.24 |
| FEV1_absolute_pre (mL) | 2511.03 ± 785.81 |
| FEV1_theoretical_pre (% predicted) | 81.62 ± 19.22 |
| FEV1_absolute_post (mL) | 2521.12 ± 770.34 |
| FEV1_theoretical_post (% predicted) | 81.89 ± 18.97 |
Pre-quotient: pre-bronchodilator ratio of FEV1 and FVC; Spirometry post-quotient: ratio of FEV1 and FVC after bronchial dilation test; LLN: lower limit of normal; FVC_absolute_pre: pre-bronchodilator forced vital capacity absolute; FVC_theoretical_pre: pre-bronchodilator forced vital capacity theoretical; FVC_absolute_post: post-bronchodilator forced vital capacity absolute; FVC_theoretical_post: post-bronchodilator forced vital capacity theoretical; FEV1_absolute_pre: pre-bronchodilator forced expiratory volume in 1 second absolute; FEV1_theoretical_pre: pre-bronchodilator forced expiratory volume in 1 second theoretical; FEV1_absolute_post: post-bronchodilator forced expiratory volume in 1 second absolute; FEV1_theoretical_post: post-bronchodilator forced expiratory volume in 1 second theoretical.
Through mutual information,14–17 patterns of correlation (dependence) between variables were identified (see Fig. 1).
- Weak correlations: among spirometry results, tobacco, gender, and age.
- Intermediate correlations: among spirometry-derived results.
- Strong correlations: between age and LLN (0.638), and between the pre-BDT and post-BDT ratios and the target variable (0.75 and 0.98, respectively).
Fig. 1. Mutual information of the recorded variables, showing the degree of dependence between variables. Pre-quotient: pre-bronchodilator FEV1/FVC ratio; Spirometry post-quotient: FEV1/FVC ratio after the bronchodilator test; LLN: lower limit of normal; FVC_absolute_pre: pre-bronchodilator absolute forced vital capacity; FVC_theoretical_pre: pre-bronchodilator theoretical forced vital capacity; FVC_absolute_post: post-bronchodilator absolute forced vital capacity; FVC_theoretical_post: post-bronchodilator theoretical forced vital capacity; FEV1_absolute_pre: pre-bronchodilator absolute forced expiratory volume in 1 second; FEV1_theoretical_pre: pre-bronchodilator theoretical forced expiratory volume in 1 second; FEV1_absolute_post: post-bronchodilator absolute forced expiratory volume in 1 second; FEV1_theoretical_post: post-bronchodilator theoretical forced expiratory volume in 1 second; Obstructive pattern: target variable.
Because pre- and post-BDT spirometry results were strongly positively correlated, and because forced spirometry without a bronchodilator test is more widely used in primary care centers, pre-BDT results were used instead of post-BDT results. Judging by the AUC of the different classifiers analyzed, this did not significantly affect the predictive power of the model. The remaining variables with weak and intermediate dependence were also used as input data for the chosen algorithm.
ML algorithm with multiple variable combinations
Using the GTB of LightGBM,18,19 different combinations of the variables with the highest predictive power were tested, yielding 11 GTB models whose AUC values gave an overview of their discriminative capability. The standout models were model 2 (Mod2), model 3 (Mod3), and model 4 (Mod4).
- Mod2: age, gender, pre-quotient (pre-bronchodilator FEV1/FVC ratio), FVC_theoretical_pre (pre-bronchodilator theoretical forced vital capacity), FEV1_theoretical_pre (pre-bronchodilator theoretical forced expiratory volume in 1 second).
- Mod3: gender, pre-quotient, FVC_theoretical_pre, FEV1_theoretical_pre.
- Mod4: pre-quotient, FVC_theoretical_pre, FEV1_theoretical_pre.
The AUC exceeded 0.97 for Mod2, Mod3, and Mod4, indicating that spirometry data alone can predict the presence or absence of an obstructive pattern.
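Comparing feature subsets such as Mod2, Mod3, and Mod4 amounts to running the same cross-validated AUC for each list of columns. The sketch below uses scikit-learn's `cross_val_score` with `GradientBoostingClassifier` as a stand-in for LightGBM, on a synthetic cohort in which only the hypothetical `pre_quotient` column drives the target; the numbers it prints say nothing about the study data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical synthetic cohort; column names mirror the variables in the text.
rng = np.random.default_rng(2)
n = 600
data = {
    "age": rng.normal(56, 11, n),
    "gender": rng.integers(0, 2, n),
    "pre_quotient": rng.normal(73, 9.5, n),
    "FVC_theoretical_pre": rng.normal(83, 15, n),
    "FEV1_theoretical_pre": rng.normal(82, 19, n),
}
y = (data["pre_quotient"] + rng.normal(0, 2, n) < 70).astype(int)

SUBSETS = {
    "Mod2": ["age", "gender", "pre_quotient", "FVC_theoretical_pre", "FEV1_theoretical_pre"],
    "Mod3": ["gender", "pre_quotient", "FVC_theoretical_pre", "FEV1_theoretical_pre"],
    "Mod4": ["pre_quotient", "FVC_theoretical_pre", "FEV1_theoretical_pre"],
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)  # same folds for every subset
results = {}
for name, cols in SUBSETS.items():
    X = np.column_stack([data[c] for c in cols])
    scores = cross_val_score(GradientBoostingClassifier(random_state=0), X, y, cv=cv, scoring="roc_auc")
    results[name] = scores.mean()
    print(f"{name}: mean AUC {results[name]:.3f} over {len(scores)} folds")
```

Reusing one `KFold` object with a fixed seed keeps the folds identical across subsets, so differences in AUC reflect the features rather than the split.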
Classifier model validation
After 5-fold cross-validation and out-of-fold predictions (off-preds),20 the most relevant off-preds were plotted (Figs. 2 and 3).
Fig. 2. Out-of-fold prediction of the pre-quotient. Left: with five variables (age, gender, pre-quotient, FEV1_theoretical_pre and FVC_theoretical_pre). Right: with three variables (pre-quotient, FEV1_theoretical_pre and FVC_theoretical_pre). Pre-quotient: FEV1/FVC ratio measured before the bronchodilator test; FEV1: forced expiratory volume in 1 second; FVC: forced vital capacity; FEV1_theoretical_pre: pre-bronchodilator theoretical forced expiratory volume in 1 second; FVC_theoretical_pre: pre-bronchodilator theoretical forced vital capacity.
Fig. 3. Out-of-fold prediction of FEV1_theoretical_pre. Left: with five variables (age, gender, pre-quotient, FEV1_theoretical_pre and FVC_theoretical_pre). Right: with three variables (pre-quotient, FEV1_theoretical_pre and FVC_theoretical_pre). Pre-quotient: FEV1/FVC ratio measured before the bronchodilator test; FEV1: forced expiratory volume in 1 second; FVC: forced vital capacity; FEV1_theoretical_pre: pre-bronchodilator theoretical forced expiratory volume in 1 second; FVC_theoretical_pre: pre-bronchodilator theoretical forced vital capacity.
For the pre-quotient, Fig. 2 confirms that the model is accurate with both five and three variables, with lower data dispersion in the three-feature model. The predicted probability of an obstructive pattern exceeds 0.8 for pre-quotient values below 66%.
The curves in Fig. 3 follow a similar pattern to those in Fig. 2, with less data dispersion in the three-variable model. However, for FEV1_theoretical_pre values between 58% and 82% of the predicted value, the model lost probabilistic power.
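Statements such as "the predicted probability exceeds 0.8 below 66%" can be checked numerically by grouping out-of-fold probabilities by ranges of the feature. This is a sketch with hypothetical bin edges, toy values, and a hypothetical function name (`probability_by_bin`); it summarizes such curves as a table rather than reproducing Figs. 2 and 3.

```python
import numpy as np


def probability_by_bin(feature, proba, edges):
    """(low, high, mean probability, n) for cases with low <= feature < high."""
    feature = np.asarray(feature, dtype=float)
    proba = np.asarray(proba, dtype=float)
    idx = np.digitize(feature, edges)  # idx == b means edges[b-1] <= value < edges[b]
    rows = []
    for b in range(1, len(edges)):
        mask = idx == b
        if mask.any():  # skip empty bins
            rows.append((edges[b - 1], edges[b], float(proba[mask].mean()), int(mask.sum())))
    return rows


# Hypothetical pre-quotient values (%) and out-of-fold probabilities.
pre_quotient = [60, 62, 68, 75, 80]
oof_proba = [0.90, 0.95, 0.60, 0.10, 0.05]
rows = probability_by_bin(pre_quotient, oof_proba, edges=[55, 66, 72, 85])
for low, high, mean_p, n in rows:
    print(f"[{low}, {high}): mean probability {mean_p:.2f} (n={n})")
```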
Finally, the ELI5 library21 ranked the pre-quotient and FEV1_theoretical_pre as the variables with the highest predictive power.
Confusion matrix
The results are shown in Fig. 4.
Fig. 4. Confusion matrices (Python 3.8.5). A (top): model with three features (pre-quotient, FEV1_theoretical_pre and FVC_theoretical_pre). B (bottom): model with five features (age, gender, pre-quotient, FEV1_theoretical_pre and FVC_theoretical_pre). Pre-quotient: FEV1/FVC ratio measured before the bronchodilator test; FEV1: forced expiratory volume in 1 second; FVC: forced vital capacity; FEV1_theoretical_pre: pre-bronchodilator theoretical forced expiratory volume in 1 second; FVC_theoretical_pre: pre-bronchodilator theoretical forced vital capacity.
Model A: Sensitivity (S): 93%. Positive predictive value (PPV): 94%. Specificity (Sp): 97%. Negative predictive value (NPV): 96%. Precision (P): 94%. Accuracy (Ac): 95%.
Model B: S: 73%. PPV: 94%. Sp: 97%. NPV: 88%. P: 94%. Ac: 90%. The false-negative rate increased with the five-variable model.
Discussion
This discussion focuses on the practical contribution of the study to conventional clinical practice.
With the aim of contributing to the diagnosis of COPD, this is, to our knowledge, the first study to combine AI with spirometry data derived from a case-finding study in primary care. Our study shows that an ML model can predict the presence of an obstructive pattern in spirometry in a primary care population with no prior diagnosis of respiratory disease using the FEV1_theoretical_pre values with an accuracy and precision exceeding 90%.
In the conventional practice of health centers in the Valencian Community, there is no access to spirometry consultations. As a result, in most cases only patients with a high symptom burden and frequent visits to their GP are referred to tertiary care centers for a complete respiratory function study and specialized pulmonology consultations. The shortage of human and material resources in health centers hinders the diagnosis of mild cases, patients with few symptoms, young patients, and women. All these factors are considered determinants of COPD underdiagnosis.22,23
Devices simpler than a spirometer, such as the COPD-6 among others, allow FEV1 values to be obtained quickly during the consultation.24 The emphasis our study places on FEV1_theoretical_pre values could support larger-scale studies across health centers aimed at refining FEV1_theoretical_pre cutoff points with an ML model capable of reducing overfitting bias. Likewise, it could contribute to future validation of these microspirometers in primary care consultations.
EDA
EDA is the first step in the data cleaning process.25 It ensures that the ML model uses homogeneous data with the same units and no outliers.26 In this sense, the data for each patient in our population were accessible and objective, and underwent EDA to ensure the quality of our algorithm.
Classification algorithms: decision trees
Given our classification problem, the sample size, and the need to handle dimensionality and interrelationships between variables, and based on previous literature,27 our model used decision trees, despite the loss of accuracy of this type of algorithm when disease prevalence is not high.28 Other research groups focused on respiratory diseases have made the same choice; however, the heterogeneity of the samples in terms of objectives, number of variables, and analyses used made a direct comparison between the results of our algorithm and those of previous groups difficult.29
Additionally, GTB,18,19 a modification of decision trees, trains each tree individually to correct the errors made by the previous trees: the trees are built sequentially, each one fitted to the residual prediction errors of the trees before it, so that the overall error is gradually reduced. This allowed us to adapt the model to our positive rate.
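The sequential residual-fitting idea can be illustrated with a deliberately simplified, from-scratch version of gradient boosting for a binary outcome. This is not LightGBM: it uses one-split trees (stumps), 19 quantile candidate thresholds per feature, and leaf values equal to the mean residual, whereas LightGBM grows deeper, leaf-wise trees on feature histograms and uses second-order leaf values. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_stump(X, r, n_thresholds: int = 19):
    """Least-squares stump: the single split on one feature that best fits residuals r."""
    best = None
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], np.linspace(0.05, 0.95, n_thresholds)):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue  # the split must leave cases on both sides
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best[1:]  # (feature, threshold, left value, right value)


def stump_predict(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)


def boost(X, y, n_rounds: int = 50, learning_rate: float = 0.3):
    """Each new stump is fitted to the residuals y - p left by all previous stumps."""
    p0 = y.mean()
    f0 = np.log(p0 / (1 - p0))              # start from the overall log-odds
    f = np.full(len(y), f0)
    stumps = []
    for _ in range(n_rounds):
        p = 1 / (1 + np.exp(-f))
        residuals = y - p                   # negative gradient of the log-loss
        stump = fit_stump(X, residuals)
        f = f + learning_rate * stump_predict(stump, X)
        stumps.append(stump)
    return {"f0": f0, "stumps": stumps, "learning_rate": learning_rate}


def predict_proba(model, X):
    f = np.full(len(X), model["f0"])
    for stump in model["stumps"]:
        f = f + model["learning_rate"] * stump_predict(stump, X)
    return 1 / (1 + np.exp(-f))


# Hypothetical data: one FEV1/FVC-like feature; the target is ratio < 70.
rng = np.random.default_rng(3)
X = rng.normal(73, 9.5, (300, 1))
y = (X[:, 0] < 70).astype(float)
model = boost(X, y)
p = predict_proba(model, X)
print(f"training accuracy: {((p > 0.5) == (y == 1)).mean():.3f}")
```

The residual y − p is the negative gradient of the log-loss with respect to the model's log-odds, which is why fitting each new tree to it reduces the overall error step by step; the learning rate shrinks every correction.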
Overfitting of models. Loss of precision
Random forests have an advantage over GTB18,19 in that they are less prone to overfitting, that is, to losing accuracy when faced with new data. Therefore, in addition to using GTB, our team used previously described cross-validation tools.30,31 Additionally, the use of off-preds20 provided a more realistic measure of the model's performance on previously unseen data.
After this validation process, our model did not lose statistical or predictive power when using only functional variables (no change in AUC without information on gender, age, and LLN), or even when using only pre-bronchodilator data. This could be of interest because primary care physicians could use lung function values from simple devices in their offices to identify patients suspected of having an obstructive pattern, with the BDT performed later and with variable delay. Unfortunately, the literature combining COPD case-finding and AI is scarce. However, we agree with other groups on the relevance of FEV1_theoretical_pre values as a predictor of the presence or absence of an obstructive pattern.28
The reality
To understand and use ML models in COPD (a disease whose underlying biological mechanism is still unknown), more data (functional, genomic, and clinical) from prospective multicenter studies with continuous monitoring are vital.32
On the other hand, obtaining optimal metrics does not automatically guarantee a positive impact, so paired studies comparing AI with conventional practice and the integration of predictions into healthcare workflows are required. Ethical questions about the use of AI, such as assigning responsibility in the case of an incorrect diagnosis or misuse of the model, also need to be addressed. Multidisciplinary committees are necessary to ensure effective and safe implementation.6,22,33,34
Limitations
Our study has several limitations, stemming both from our methodology and from the evaluation of the utility of AI in conventional clinical care.
In the first case, our study used a limited number of patients; however, the validation tools we used (cross-validation and out-of-fold predictions) helped limit model overfitting under these circumstances. Another limitation was the lack of social and clinical data; however, the initial goal of our study was to rely on data rapidly accessible in primary care consultations to ensure that the model could predict the presence or absence of an obstructive pattern.
In the second case, since integrating AI into healthcare workflows is an inevitable challenge, we must find integration pathways through tools already in use, such as apps used by students and doctors. This will allow us to obtain more multicenter functional, genomic, and social data. With these data, trials, paired studies, and real-time studies can be conducted to increase the reliability of AI as a support for our work rather than as an adversary.
Conclusion
Based on our results, an ML model with GBT is capable of predicting the presence of an obstructive pattern in spirometry, using the pre-bronchodilator FEV1 value as a predictor variable, in a population of primary care smokers without a prior diagnosis of respiratory disease. Further studies that include clinical and longitudinal data are needed, as well as strategies for integrating AI into healthcare workflows. It is our duty to harness the incredible resources of AI to benefit the millions of people who suffer, or will suffer, from COPD.35 We must continue to do so despite the limited training in AI tools in medical schools. The time has come for multidisciplinary teams of data experts and healthcare professionals.
Funding
This study was carried out through a grant offered by Boehringer Ingelheim. The funding for the grant was managed by The Foundation for the Advancement of Health and Biomedical Research in the Valencian Community (FISABIO).
Authors’ contributions
Moreno MD is the main researcher and writer of the manuscript.
Ferrando, Rissi, Cepeda, Agostini MD and Catala MD PhD have reviewed the manuscript.
Marin has conducted the statistical analysis and code writing.
Data acquisition and writing the manuscript: RM, AM, JF, GR, SC, GA, PC.
Conflicts of interest
The authors declare no conflict of interest.