The 4AT scale is a sensitive tool for delirium screening that can be applied rapidly in clinical settings without any specific training. It has not been translated, adapted, and validated for the assessment of Spanish older adults. The aims of the study are: to translate the 4AT scale and adapt it to Spanish culture, to present evidence of the diagnostic accuracy of this version (4AT-ES) when applied in non-specialized hospital wards, and to assess the loss of diagnostic accuracy in the presence of risk factors.
Methods
A prospective sample was independently assessed with the 4AT-ES and the reference standard. One hundred and twenty-one inpatients (70+ years) for whom a psychiatric assessment was requested were included. Of these, 50 were diagnosed with delirium. Nurses without specific training applied the 4AT-ES, and experienced psychiatrists made the reference standard diagnosis (DSM-V criteria).
Results
Patients with delirium were older and had more risk factors (more previous delirium episodes, a higher likelihood of prior dementia/cognitive impairment) than controls. The 4AT-ES had excellent validity, sensitivity (96%), and specificity (83.1%). The area under the curve was 0.918; in the subsample with any of those risk factors, its value did not decrease.
Conclusion
The 4AT-ES version of the 4AT scale was developed. When applied by nursing staff without specific training, it showed excellent validity, sensitivity, and specificity, even in a subsample with previous risk factors. All indices were comparable to the original version. We recommend its use for efficient delirium screening in hospitalized older patients with suspected delirium.
Delirium is a neuropsychiatric disorder that presents with an acute onset and is fluctuating in nature. It is characterized by an overall cognitive impairment, with attention and awareness being the most affected domains.1,2 Delirium is highly prevalent amongst older inpatients, especially within healthcare units treating severely ill patients (e.g., intensive care units, major surgery) or those treating frail individuals (e.g., palliative care units).3 Up to 50% of patients admitted within this age group suffer from delirium with different degrees of severity.4 The diagnosis is a clinical one, based on the information gathered from the clinical assessment and on the collateral information obtained from relatives and from health and care workers.2,5
Delirium is linked with poor clinical outcomes. During admission, the risk of developing complications such as infections, falls, incontinence, or pressure ulcers is higher, which has a significant impact on length of stay and on its associated costs.4,5 Upon discharge, it increases the risk of functional and cognitive impairment, readmission and institutionalization, and ultimately, death.6 Under-diagnosis or late diagnosis further worsens the prognosis.1,5 Therefore, delirium screening is essential to enable earlier intervention. Having adequate screening tools, validated for use by non-specialized health professionals on the wards, is thus necessary for effective detection.
The 4AT scale is a rapid and sensitive tool that requires no specific training7 and is freely available for download and use (www.the4at.com). The original English version has been validated since 2014 for the screening of delirium in older inpatients at different levels of the healthcare system.8 This scale is the screening instrument recommended in the latest clinical practice guidelines about delirium.9 It is recommended even in the presence of cognitive impairment, when the differential diagnosis is rather difficult.10 Since its publication, the performance of the 4AT scale has been evaluated in several languages, showing that its use in routine clinical practice is well supported by excellent diagnostic accuracy.11 Although there is a Spanish version,12 the translation process carried out by the authors is unknown, and there is no validation study that we are aware of.
The aim of this paper is threefold: Firstly, to introduce a Spanish version of the 4AT scale, translated from the original version and adapted to the Spanish culture. Secondly, to show evidence of its validity and diagnostic accuracy for screening delirium in the older hospitalized population when applied by nursing staff. And lastly, to assess whether its diagnostic accuracy differs within the subpopulation of patients with cognitive impairment.
Material and methods
Translation and cultural adaptation to Spanish
A translation and cultural adaptation of the original 4AT instrument in English was carried out, following the guidelines for cross-cultural adaptation of self-report measurement instruments.13 In the first two phases, initial translation and synthesis, two bilingual (English/Spanish) translators whose native language was Spanish participated independently. One of them had previous medical training, while the other did not. The third stage, back-translation, was carried out by two bilingual translators without formal medical training, blind to the original English version. The fourth stage was carried out by an expert committee consisting of two experts in research methodology, one healthcare professional, one linguist, and the four translators who had participated in the previous stages. Finally, registered nurses training to obtain a nursing specialty degree in geriatric or psychiatric nursing pilot-tested its application with 10 older inpatients. None of them reported any difficulties with the application.
Sample selection and data collection
A prospective convenience sample of inpatients aged 70 years or older admitted to a university hospital, for whom psychiatric assessment was requested, was considered for inclusion in the study. Patients who lacked verbal communication, had a severe hearing impairment, were non-Spanish speakers, or refused to participate were excluded. The delirium diagnosis was based on the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-V) criteria and determined by an experienced psychiatrist who was a member of the Psychiatric Liaison team. Recruited patients who did not meet the criteria for delirium according to DSM-V were considered controls. Starting in July 2017, each patient who met the criteria (or a legal guardian) was asked to participate and sign the informed consent. A sample size of at least fifty participants in each relevant subgroup is recommended for health-related questionnaires.14 Thus, participants were recruited until a sample size of fifty participants was reached in both groups.
Measures
The 4AT scale consists of four items, addressing four specific domains suggestive of delirium.7 Item AT1 (level of alertness) scores alertness through direct observation, for a score of 0 or 4. The next two items, AT2 (AMT4 test) and AT3 (months backward), allow the screening of cognition and attention, respectively, through simple standardized questions; each of them is scored up to 2. Item AT4 (change or fluctuating course) requires collateral information to assess the presence of fluctuations or a sudden change in mental state and is scored dichotomously as 0 or 4.
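The scoring rule described above can be sketched in a few lines of Python. This is only an illustration, not part of the study materials: the names `ALLOWED_SCORES`, `total_score`, and `screens_positive` are hypothetical, and the sketch assumes that a total score equal to or above the pre-specified cut-off of 4 counts as a positive screen.

```python
# Allowed scores per item, as described in the text:
# AT1 and AT4 are scored 0 or 4; AT2 and AT3 are scored from 0 to 2.
ALLOWED_SCORES = {
    "AT1": {0, 4},     # level of alertness
    "AT2": {0, 1, 2},  # AMT4 test
    "AT3": {0, 1, 2},  # months backward
    "AT4": {0, 4},     # change or fluctuating course
}

CUTOFF = 4  # pre-specified cut-off on the total score (assumed inclusive)


def total_score(items):
    """Sum the four item scores after checking that each one is allowed."""
    for name, allowed in ALLOWED_SCORES.items():
        if items[name] not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    return sum(items[name] for name in ALLOWED_SCORES)


def screens_positive(items):
    """Return True when the total score reaches the cut-off."""
    return total_score(items) >= CUTOFF


# Hypothetical patient: fully alert, some errors on AT2 and AT3, no acute change.
patient = {"AT1": 0, "AT2": 1, "AT3": 2, "AT4": 0}
print(total_score(patient), screens_positive(patient))
```

Because AT1 and AT4 are scored as 0 or 4, a positive response on either of them is enough on its own to reach the cut-off, whereas AT2 and AT3 together can contribute at most 4 points.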
Each patient was assessed using the translated and adapted version by a registered nurse without specific training or experience in psychiatry/psychogeriatrics, and no prior training for the 4AT scale. In order to gather convergent evidence of the validity of the measures, patients were also assessed on the scale by the same psychiatrist who cast the DSM-V-based diagnosis, and in the same session. The time interval between this session and the assessment by the nurse was 24 h at most (in either order, at the convenience of the evaluators). The nurses were blind to the DSM-V-based diagnosis, and both evaluators were reciprocally blind to the 4AT assessment of one another. A total of five nurses and eleven psychiatrists participated in the assessments.
Other variables taken into account, drawn from the clinical records of each patient, were: gender, age (in years), time since admission (in days), number of previous delirium episodes, whether there was a previous diagnosis of dementia/cognitive impairment, and the medical ward where the patient was admitted. When necessary, the patient, legal guardian, or healthcare team were interviewed for the missing information. The following derived binary variables were also computed: whether they had had any previous delirium episode at all (delirium antecedents), and a variable of cognitive vulnerability, collapsing those cases that had a previous diagnosis of dementia or cognitive impairment with the ones that had at least one previous delirium episode. This latter variable separated the participants that were likely to suffer from some degree of cognitive impairment from those that were not suspected at all of having any prior cognitive impairment.
Statistical analyses
The following analyses were all performed using R version 4.0.4.15 Certain packages, indicated where appropriate, were used for some specific computations. Descriptive statistics were obtained for all the variables. Differences between the Delirium and the Control group were assessed using the t-test or the χ²-test (applying the correction for continuity) for the continuous and the categorical variables, respectively.
The 4AT score was computed from the assessment given by the nursing personnel (NUR score). The four 4AT items were assumed to be effect indicators of a single unidimensional latent variable standing for the Delirium construct. The internal evidence of validity was assessed by testing this hypothesis of unidimensionality with parallel analysis.16 Minimum rank parallel analysis with polychoric correlations failed to converge, so principal axes parallel analysis with Pearson correlations was used instead.17 Reliability was assessed by computing the Cronbach's18 α and McDonald's19 Ω internal consistency indices. Parallel analysis and Cronbach's α were computed with package psych v. 2.0.12,20 and McDonald's Ω with packages lavaan version 0.6-721 and semTools version 0.5-4.22
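For readers less familiar with Cronbach's α, the following minimal sketch shows the textbook formula in plain Python. It is not the psych routine used in the study; the function name `cronbach_alpha` and the example responses are hypothetical. McDonald's Ω is not covered, because it requires fitting a factor model first.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical 4AT responses (AT1, AT2, AT3, AT4) for six patients.
responses = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [4, 2, 2, 4],
    [0, 0, 1, 0],
    [4, 1, 2, 4],
    [0, 2, 2, 4],
]
print(round(cronbach_alpha(responses), 3))
```

The sketch uses sample variances (denominator n - 1) throughout; because the same denominator is applied to the items and to the total score, the choice does not change the value of α.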
The score given by the psychiatrist (PSY score) was used as evidence of convergent validity. Their internal validity and reliability were also assessed independently of the NUR measures, following the same procedure. Then, the NUR and PSY composite scores were correlated. This correlation was corrected for attenuation using the Ω coefficients of both measures.
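The correction for attenuation divides the observed correlation by the square root of the product of the two reliabilities. A minimal sketch follows, with a hypothetical function name; the inputs are the observed correlation (0.779) and the two Ω coefficients (0.809 and 0.831) reported in the Results.

```python
from math import sqrt


def disattenuate(r_observed, reliability_x, reliability_y):
    """Correct an observed correlation for unreliability in both measures."""
    return r_observed / sqrt(reliability_x * reliability_y)


# Values taken from the Results section of this paper.
corrected = disattenuate(0.779, 0.809, 0.831)
print(round(corrected, 3))  # about 0.950, in line with the reported value
```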
The NUR diagnostic accuracy was assessed by computing and plotting the ROC curve. The area under the curve (AUC) statistic was computed using R package pROC version 1.17.0.1.23 Its 95%-coverage confidence interval was computed using DeLong's method with 2000 stratified replicates.24 The Youden index25 was also obtained for the total score, and the pre-specified cut-off point was compared to the optimal one (according to the Youden index), in order to assess its discriminant power. A predicted diagnosis was given based on the pre-specified cut-off point of 4 in the total score.8 Each individual item was also assessed by computing its AUC and the diagnostic accuracy statistics, using the non-null response categories as cut-off points. Sensitivity and specificity indices were computed for each of the cut-off points. Their 95%-coverage confidence intervals were computed using Wilson's score method,26 which gives the closest estimation to nominal levels independently of sample size.27 Positive and negative diagnostic likelihood ratios were also computed. The diagnostic prediction was computed as well for the PSY score, and the confusion matrix between both predictions was computed in order to assess their convergence. These analyses were also performed separately on the groups with and without suspected cognitive impairment (i.e., segmenting by cognitive vulnerability).
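The accuracy indices named above follow directly from a 2×2 confusion matrix. The sketch below is a plain-Python illustration rather than the R code used in the study, and its function names are hypothetical. The example counts (48 true positives, 2 false negatives, 59 true negatives, 12 false positives) are not taken from the tables: they are back-calculated here from the reported sensitivity (96% of 50 patients) and specificity (83.1% of 71 controls), and should be read as an assumption.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a proportion (z = 1.96 gives 95% coverage)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def accuracy_indices(tp, fn, tn, fp):
    """Sensitivity, specificity, Youden index, and likelihood ratios.

    Assumes specificity is strictly between 0 and 1, so that both
    likelihood ratios are defined.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1,
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity_ci": wilson_interval(tn, tn + fp),
    }


# Counts inferred from the reported rates and group sizes (an assumption).
indices = accuracy_indices(tp=48, fn=2, tn=59, fp=12)
for name, value in indices.items():
    print(name, value)
```

With these counts the Youden index evaluates to approximately 0.791 and the positive likelihood ratio exceeds 5, which agrees with the figures reported in the Results and Discussion.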
Results
The translation and adaptation process resulted in the Spanish version of the 4AT scale (4AT-ES). This version is translated into Spanish and adapted to the cultural context of Spanish residents. The instrument is shown in Table 1.
Twelve patients who met at least one exclusion criterion were excluded from the study. The final sample was made up of a total of 121 participants aged 70 or older. Of these, 50 were in the delirium group and 71 in the control group. There were no missing data in any of the variables of this sample. Descriptive statistics are shown in Table 2.
Table 2. Descriptive statistics of the sample.
The groups differed significantly in all the quantitative variables (α = 0.05), except for the time since admission (t(100.72) = -1.37, p-value = 0.175). The mean was higher in the delirium group for every variable that differed significantly. The groups did not differ in the proportion of males and females, or in the ward they were recruited from. In the remaining categorical variables, the proportion of patients with previous cognitive impairment or dementia (χ²(2) = 18.57, p-value < 0.001), with any previous episodes of delirium (χ²(2) = 7.78, p-value = 0.020), or with cognitive vulnerability (χ²(2) = 9.77, p-value = 0.008) was higher in the delirium group.
Fig. 1 shows the results of the parallel analyses with the responses given by the NUR and the PSY scorers. As the figures show, the 4AT scale was found to be unidimensional with both scores, and thus the measure was congeneric in both cases. Both measures were found to have excellent internal consistency: the NUR score had an α of 0.731 and an Ω of 0.809, while the values for the PSY score were 0.831 and 0.831, respectively. Regarding convergent validity, the composite scores were found to be highly correlated; the Pearson correlation between them was 0.779 (0.697–0.841). When corrected for attenuation using the Ω coefficients, its value rose to 0.950. The confusion matrix given in Table 3 (columns under the header Diagnosis (PSY score)) further shows that the scores given by the two groups of evaluators are highly coincident; 88.43% of the cases were concordant, although the NUR scores tended to be more sensitive and less specific.
The ROC curve of the NUR score is represented in Fig. 2. As the curve shows, the scale had a very high diagnostic accuracy, its estimated AUC statistic being 0.918. The optimal Youden index was 0.791, corresponding to a score of 4, which coincides with the cut-off point set by the original scale design.
Table 3 gives the confusion matrix of the NUR score-based diagnosis, under the header Diagnosis (Reference standard). The indices related to the scale diagnostic accuracy, according to the cut-off point of 4, are given in Table 4, along with the diagnostic accuracy indices of the item response categories for the NUR scores. Both the total score and item indices are given for the complete sample, as well as for the groups with and without cognitive vulnerability.
Table 4. Diagnostic accuracy of the 4AT-ES scale scored by the NUR personnel and its items, for the whole sample and for the subgroups with/without cognitive vulnerability.
Abbreviations: LR +, positive likelihood ratio; LR -, negative likelihood ratio; AUC, area under the curve.
Our Spanish version of the 4AT scale (4AT-ES) showed evidence of high validity for screening delirium in hospitalized older people. Arguably, strictly following the guidelines for translation and cross-cultural adaptation13 led to a rigorous version of the instrument in Spanish, suitable for validation in clinical settings in Spain. It should be noted though, when interpreting the results, that the prevalence of delirium in our sample was higher than in the validation of the original version.8 The sample consisted of patients who were referred to the Psychiatry Liaison team, and delirium is one of the most frequent conditions in the clinical practice of such teams,28 hence the higher prevalence. Nevertheless, the sample was comparable to those used in the validation of the original version8 and other validation procedures,29,30 the group with delirium having significantly more recognized predisposing factors: older age, previous cognitive impairment, and/or episodes of delirium.2
The scores obtained by the nursing staff and the psychiatrists were both unidimensional, as we would expect given the definition of the delirium construct. The reliability of both scores was excellent, as evidenced by the estimated internal consistency indices. Furthermore, the high correlation between both measures yielded very solid evidence of the convergent validity between the two rater groups. Finally, the diagnostic accuracy of the nursing staff scores was also excellent: its estimated AUC was 0.918, very similar to the original version (0.927),8 and the cut-off point given by the Youden Index (0.791) confirmed the value expected from the design of the original version.
The sensitivity of the Spanish version was slightly higher than that of the original scale (96% vs. 89.7%),8 while the specificity was slightly lower (83.1% vs. 84.1%). We find the same pattern when comparing with the sensitivity (88%) and specificity (88%) pooled across studies with different adaptations of the scale in various settings.11 The corresponding false-positive rate (16.9%) was higher than the omission rate (4%) but acceptable: given the higher cost of omitting a diagnosis of delirium, this imbalance is desirable in a screening tool.31 Nevertheless, as given by the positive likelihood ratio, a positive result was more than five times as likely in a patient with delirium as in a patient without it. On the other hand, the negative likelihood ratio, close to 0, indicated that it was highly unlikely that a patient with delirium would go undiagnosed. Contrary to previous results,32 we did not find worse diagnostic accuracy when the scale was applied by nurses without specific training.
Items AT2 and AT3, which measure cognition and attention respectively, contributed the most to sensitivity. On the other hand, items AT1 (level of alertness) and AT4 (change or fluctuating course) were the most specific. Consequently, the cognitive aspect does not contribute as much to the specific screening of delirium. This is consistent with the literature about screening delirium in a hospital setting when the level of consciousness is altered and/or information about a significant change or a fluctuating clinical course is available.33
In the cognitive vulnerability subgroup, the AUC (0.872) was similar to that of the cognitive impairment subgroup in the original version (AUC = 0.891),8 and diagnostic accuracy remained good. As in the original scale, sensitivity was higher in this subgroup, while specificity was higher in the subgroup without cognitive vulnerability. It should be taken into account though that the cognitive vulnerability definition included all cases with a history of cognitive impairment or dementia, as well as those who had previously presented an episode of delirium without a recorded diagnosis of dementia. This differs considerably from Bellelli et al.,8 who only considered participants with dementia in their cognitive impairment subgroup. Regarding the item properties, the most specific items in the whole sample (AT1 and AT4) had a lower specificity in the cognitive vulnerability group. Nevertheless, validity and diagnostic accuracy were similar to the original version; these conclusions are valid even in people with cognitive vulnerability and when the diagnostic properties of each item are considered.
In summary, the 4AT-ES has excellent diagnostic accuracy for screening delirium in hospitalized older patients. These properties do not depend on whether the evaluator is a psychiatrist or a nurse, both without specific training for its application. This allows its application in a non-specialized unit to assist in delirium screening. Moreover, the procedure is simple, giving an accurate screening efficiently, and non-invasive, with minimal disturbance for the patient and no adverse effects. We hope that the 4AT-ES scale will facilitate the implementation of recommendations for the diagnosis and management of delirium in the clinical practice guidelines in Spain.
Limitations
One of the main weaknesses of this study is the absence of an instrument to use as an external criterion to test convergent validity. As a result, we used the scores given by the psychiatrist as a criterion. However, these measures were given by the same specialist who made the diagnosis, so they were not independent of the reference standard. This may make the convergent evidence of validity redundant with the diagnostic validation to some extent. However, since the original version of the scale was applied by expert physicians,8 using an equivalent criterion in the Spanish version allowed us to ensure validity under similar conditions.
There are other limitations that are worth mentioning. The characteristics of the patients excluded from the sample could not be analyzed, as data were not available for them. Also, as each participant was assessed only once by a single member of the nursing team, we could neither obtain an index of interrater reliability for this measure nor assess its test-retest reliability. Nevertheless, it is worth noting that other authors8,29 have not assessed test-retest reliability either; when a construct is characterized precisely by its fluctuating course, as happens with delirium, test-retest reliability may be problematic to assess.34 The sample size was smaller than in the original study, since the estimation was carried out according to the recommendations for health-related questionnaires proposed by Terwee et al.14 In any case, it allowed diagnostic accuracy to be estimated, even in the smaller subgroups with and without cognitive vulnerability. Finally, the number and timing of administrations by each nursing professional who applied the scale were not collected in detail, nor was the potential impact of a learning curve considered. This information was not required for any of the objectives of the study but could be relevant for analyzing the error variance components in the future by applying Generalizability Theory.