The ThyPRO questionnaire is the most widely used tool for measuring quality of life in patients with benign thyroid diseases. The purpose of this study was to adapt and validate a Spanish translation of the ThyPRO and its abbreviated version (ThyPRO-39).
Material and methods
Adaptation to Spanish was performed using the forward–backward translation method, followed by a pretesting study on five representative patients. The final questionnaire (ThyPROes) was administered to 155 patients with thyroid disorders recruited in a tertiary Spanish hospital. Psychometric properties were evaluated by multitrait scaling and estimation of internal consistency reliability (Cronbach's alpha coefficient). Data from a previous sample of 902 Danish patients were used to analyze differential item functioning (DIF) between the Spanish and the original Danish versions of the questionnaire using ordinal logistic regression.
Results
Three of the 85 items in the ThyPROes and four of the 39 items in the ThyPRO-39es lacked convergent validity, while lack of discriminant validity was found for nine and 14 items, respectively. Cronbach's alpha was >0.7 for 12 of 13 scales in the ThyPROes and 10 of 12 scales in the ThyPRO-39es. Eight items in the ThyPROes were flagged with DIF (one with non-uniform DIF), as were two items in the ThyPRO-39es. DIF magnitude was small (explained variance in the item score <3%) in most cases, with a minor impact on scale scores.
Conclusions
The Spanish versions of the ThyPRO and ThyPRO-39 show acceptable psychometric properties and good cross-lingual validity, and are suitable for use in clinical studies.
Health-related quality of life (HRQoL) can be defined as the global impact that diseases and their treatment exert on all relevant dimensions of the patient's life.1 The measurement of HRQoL is increasingly used to determine the outcome of health interventions. Since it is a subjective concept that includes physical, mental and social aspects of the individual's well-being, it can only be assessed by the patients themselves, and its measurement requires standardized questionnaires. In general, there are two types of questionnaires: generic and disease-specific. Generic questionnaires are applicable to all types of patients and populations, including the general population, while specific ones focus on certain aspects of a particular disease and are more appropriate for detecting differences between alternative treatments.
As highlighted in recent years, several reasons justify the measurement of HRQoL in patients with thyroid diseases.2 First, they are common and chronic disorders that rarely threaten the patient's life, so treatments are mainly aimed at optimizing quality of life. In addition, some of them can be treated by different procedures (e.g. drugs, radioiodine, surgery), often with no option having demonstrated clear superiority over the others. Finally, health professionals frequently face patients who report a deterioration of their HRQoL, even after receiving apparently effective treatment.
In the last decade, several specific questionnaires have been designed to evaluate HRQoL in patients with thyroid diseases, including questionnaires directed at patients with hyperthyroidism,3 thyroid orbitopathy4 or primary hypothyroidism.5–7 However, most of them have not been subjected to strict validation procedures and, in some cases, they were not used again after the studies carried out for their development.
More recently, Watt et al.8–11 developed the questionnaire “Thyroid-specific quality of life patient-reported outcome” (ThyPRO), designed to measure the quality of life related to all the main benign thyroid diseases. Unlike the developers of other instruments, the authors of ThyPRO have published a rigorous process of development and validation of their questionnaire, including the content generation phase,8 the preliminary investigation of its initial version by means of cognitive interviewing techniques,9 the analysis of its validity, reliability and internal consistency10 and its clinical validity and reproducibility.11 Additionally, they performed a confirmatory factor analysis to demonstrate that each of the questionnaire scales was attributable to a single underlying explanatory factor.12 They also demonstrated its sensitivity to the effects of treatment in the different thyroid diseases evaluated by the questionnaire.13 Finally, as the ThyPRO is a rather long questionnaire, the authors have developed an alternative short version (ThyPRO-39), which has been shown to preserve its measurement properties.14
In recent years, the ThyPRO has been applied to the clinical investigation of different issues related to thyroid disorders, such as the role of autoimmunity markers in the HRQoL of patients with primary hypothyroidism15 or the impact of different treatments on HRQoL among patients with multinodular goiter, Graves’ disease and subclinical hypothyroidism.16–24 Furthermore, it is being used as the primary outcome variable in ongoing large-scale clinical trials evaluating emerging therapies for autoimmune thyroid disorders.25 Thus, it can be said that the ThyPRO has become the reference tool for measuring HRQoL in patients with benign thyroid diseases.
The ThyPRO has been translated into numerous languages, and is currently available in English, Danish, German, Dutch, Italian, Portuguese, French, Swedish, Serbian, Polish, Romanian, Bulgarian, Greek, Arabic, Simplified Chinese, Traditional Chinese, Hebrew, Hindi and Tamil, and most of the translated versions have been subjected to transcultural validation.26 However, a Spanish translation is not yet available. The present study aimed to adapt and validate a Spanish version of the ThyPRO.
Material and methods
The questionnaire
The ThyPRO consists of 85 items, summarized in 13 scales plus an individual item concerning overall impact of thyroid disease on HRQoL. Four of the 13 scales cover physical symptoms (goiter-, hyperthyroidism-, hypothyroidism- and eye-symptoms), two scales focus on mental symptoms, three on function and well-being and four on participation and social function. The items are scored from 0 to 4, following a Likert scale (where “0” equals “not at all” and “4” equals “very much”), always considering the patient's perception during the last four weeks.
The short version of the ThyPRO (ThyPRO-39) retains 39 of the 85 items composing the original questionnaire, organized into four scales about symptoms, seven scales about physical, psychological and social well-being, a scale about physical appearance concerns, as well as an individual item on the overall impact on quality of life. In addition, it includes a composite score summarizing the results from the scales on non-thyroid symptoms, physical, psychological and social well-being.
Translation of the questionnaire into Spanish
The adaptation of the ThyPRO questionnaire to Spanish was performed according to the ISPOR Task Force recommendations.27
The English version of the original questionnaire was translated into Spanish by two independent translators proficient in English, one with and one without a medical background. Both versions were discussed to agree on a reconciled preliminary Spanish questionnaire. This draft was sent for back-translation to a native English-speaking translator specialized in medical and scientific texts who was not familiar with the original version of the questionnaire. The main researcher of this project and the developer of ThyPRO evaluated and compared the wording, grammatical structure and meaning of each item of this back-translation with the source English version. In case of discrepancy, modifications were made and a second back-translation of the new version was performed by a second independent native English-speaking translator. This procedure was repeated until a definitive Spanish version was reached.
The translated questionnaire was pretested on five representative individuals with different thyroid diseases, using cognitive interviewing: a 49-year-old man with Graves’ disease, active hyperthyroidism and thyroid orbitopathy, a 55-year-old woman with hypothyroidism, a 62-year-old woman with non-toxic nodular goiter, a 22-year-old woman with euthyroid Graves’ disease and goiter, and a 61-year-old woman with “Hashitoxicosis” (autoimmune thyroid disease with spontaneous phases of hypo-, hyper- and euthyroidism). Interviews were conducted by a clinical psychologist with wide experience in health-related questionnaires.
Subjects
The definitive version of the Spanish questionnaire (ThyPROes) was self-administered to a convenience sample of 155 adult Spanish subjects with different benign thyroid diseases. Participants were recruited consecutively in the Department of Endocrinology of a single tertiary hospital, when they attended a scheduled appointment for a medical visit or thyroid ultrasonography. For the present investigation they were grouped into six major mutually exclusive categories: non-toxic goiter, toxic nodular goiter, Graves’ disease, thyroid orbitopathy, autoimmune hypothyroidism and other benign disorders. The main clinical and sociodemographic characteristics of the participants are shown in Table 1. All patients signed informed consent before recruitment. The study was approved by the local Ethics Committee.
Table 1. Main clinical and sociodemographic characteristics of the participants in the present study and in the Danish reference population used for transcultural validation.
Characteristic | Spanish subjects (N=155) | Danish subjects (N=902) |
---|---|---|
Age (years) | 52 (44–61) | 51 (40–61) |
Women/men (%) | 84.5/15.5 | 87/13 |
Education level (%) | ||
Primary | 41.3 | 29 |
Secondary | 43.2 | 29 |
Tertiary | 15.5 | 38 |
Not available | – | 4 |
Clinical diagnosis (%) | ||
Non-toxic goiter | 38.7 | 29 |
Toxic nodular goiter | 7.7 | 16 |
Graves’ hyperthyroidism | 14.8 | 18 |
Graves’ orbitopathy | 7.1 | 10 |
Autoimmune hypothyroidism | 31 | 22 |
Other | 0.6 | 4 |
Comorbidity (%) | 52.3 | 56 |
Current treatment (%) | ||
Levothyroxine | 48.4 | 32 |
Antithyroid drugs | 9 | 18 |
Previous treatment (%) | ||
Radioiodine | 11 | 13 |
Thyroid surgery | 12.9 | 14 |
TSH (mU/l) | 1.99 (0.97–3.74) | 1.12 (0.36–2.7) |
Data are percentages and medians (interquartile range).
The internal consistency of both the ThyPROes and its short version (ThyPRO-39es) was assessed using multitrait scaling analysis. To evaluate convergent validity, correlation coefficients were calculated between each item and the total score of the remaining items of its own scale; to evaluate discriminant validity, correlations were calculated between each item and the total score of each of the other scales. Correlations greater than 0.4 between the individual items and their own scales were considered indicative of adequate convergent validity (all items in a scale appear to measure the same construct). If the correlations between an item and the other scales were lower than the correlation observed with its own scale, discriminant validity was considered adequate (items in different scales measure different constructs). Validity was additionally assessed by calculation of inter-scale correlations. Internal consistency reliability was evaluated by Cronbach's alpha coefficient, based on the average of the correlations between the items of each scale. Cronbach's alpha ranges from 0 to 1, with values >0.7 usually considered indicative of satisfactory internal consistency.28 Pearson's correlations were used in all cases. All analyses were performed for both the ThyPROes and the ThyPRO-39es.
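The item-level statistics described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names and the toy responses (six respondents, a three-item scale and a two-item scale) are hypothetical, and Cronbach's alpha is computed with the common variance-based formula, which is an assumption about the exact estimator.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def sample_variance(x):
    """Unbiased sample variance."""
    m = sum(x) / len(x)
    return sum((a - m) ** 2 for a in x) / (len(x) - 1)

def cronbach_alpha(items):
    """items: one list of respondent scores per item of a single scale."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    item_var = sum(sample_variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / sample_variance(totals))

def item_own_scale(items, i):
    """Convergent validity: item i vs. the sum of the remaining items of its scale."""
    rest = [sum(v for j, v in enumerate(row) if j != i) for row in zip(*items)]
    return pearson(items[i], rest)

def item_other_scale(item, other_items):
    """Discriminant validity: an item vs. the total score of another scale."""
    totals = [sum(row) for row in zip(*other_items)]
    return pearson(item, totals)

# Hypothetical 0-4 Likert responses (rows = items, columns = respondents).
scale_a = [[0, 1, 2, 3, 4, 4], [1, 1, 2, 3, 3, 4], [0, 2, 2, 2, 4, 3]]
scale_b = [[4, 0, 3, 1, 2, 0], [3, 1, 4, 0, 2, 1]]

alpha_a = cronbach_alpha(scale_a)
r_own = item_own_scale(scale_a, 0)
r_other = item_other_scale(scale_a[0], scale_b)

print(f"alpha = {alpha_a:.2f}")
print("convergent validity ok:", r_own > 0.4)
print("discriminant validity ok:", r_own > r_other)
```

With real data, `item_own_scale` and `item_other_scale` would be evaluated for every item against every scale, which yields the multitrait scaling matrix from which items lacking convergent or discriminant validity are counted.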
Transcultural validation
Equivalence between the Spanish version and the original Danish questionnaire was assessed using tests for differential item functioning (DIF). An item presents DIF when the probability of a given response differs between individuals who have the same level of the measured construct but belong to groups with different characteristics (in this case, different language). Responses to the ThyPROes and to the ThyPRO-39es were compared pairwise to the Danish version using data from a previously reported sample of 902 subjects recruited from two endocrine clinics in Denmark.26 The characteristics of this reference population are also shown in Table 1. DIF was assessed by means of ordinal logistic regression, controlling for specific diagnosis. The independent variables were language group, scale score and an interaction term between scale score and language group. DIF was considered non-uniform if the interaction term was significant and uniform otherwise. DIF magnitude was considered substantial if it could explain more than 2% of the variance in the item score (R2 difference >0.02).
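One common way to write the DIF test described above is as a sequence of nested ordinal logistic (proportional odds) models. The notation below is an illustrative assumption rather than the authors' exact specification: Y is the item response (0 to 4), S the scale score, G an indicator for language group, and D a vector of diagnosis indicators.

```latex
\begin{align}
\text{Model 1:}\quad & \operatorname{logit} P(Y \ge j) = \alpha_j + \beta_1 S + \boldsymbol{\gamma}^{\top}\mathbf{D} \\
\text{Model 2:}\quad & \operatorname{logit} P(Y \ge j) = \alpha_j + \beta_1 S + \boldsymbol{\gamma}^{\top}\mathbf{D} + \beta_2 G \\
\text{Model 3:}\quad & \operatorname{logit} P(Y \ge j) = \alpha_j + \beta_1 S + \boldsymbol{\gamma}^{\top}\mathbf{D} + \beta_2 G + \beta_3 (S \times G)
\end{align}
% j = 1, ..., 4 indexes the cumulative response thresholds.
% Non-uniform DIF: beta_3 differs significantly from 0 (the language effect depends on S).
% Uniform DIF: beta_2 differs significantly from 0 while beta_3 does not.
% Magnitude: \Delta R^2 = R^2_{\text{Model 3}} - R^2_{\text{Model 1}} > 0.02,
% where the choice of pseudo-R^2 measure is an assumption not stated in the text.
```

Under this formulation, a significant interaction coefficient means that the difference between language groups changes along the scale score, whereas a significant group coefficient alone means a constant shift at every level of the scale score.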
Results
Translation of the questionnaire into Spanish
The comparison of the first back-translation with the English master questionnaire identified discrepancies that required minor modifications in the introductory paragraph explaining the contents of the questionnaire, in six of its 85 items and in one of the five response options. Alternative wording was considered for two other items, but was eventually discarded. Item 7b, which evaluates feelings of depersonalization (‘felt “not like yourself”?’ in the English questionnaire), proved the most difficult to translate; it required a further modification after the second back-translation, and was finally worded as ‘¿se ha sentido extraño consigo mismo?’.
In the pretesting phase of cognitive interviews, the five interviewees reported that the questionnaire was easy to understand and that the language was adequate. Nevertheless, four of them reported some doubts in the response process. According to the question answering model, in most cases the problems referred to the phase of elaboration of a judgment on the processed information. In particular they concerned attribution, i.e. the patients showed their doubts about whether the problems referred to in some questions were due to their thyroid disease or to other reasons. These issues were considered inherent to the ThyPRO itself rather than to the translation, and did not result in any additional modification of the questionnaire.
Psychometric validation
Three items of the ThyPROes and four items of the ThyPRO-39es lacked convergent validity (two of them were common to both versions of the questionnaire). On the whole, three of the five items lacking convergent validity in either of the two questionnaires were part of the scales that assess the impact of thyroid disease on impaired social life and cosmetic complaints (Tables 1 and 2 in Supplementary Material). Nine items in the ThyPROes and 14 in the ThyPRO-39es correlated more strongly with one or more other scales than with their own scale (lack of discriminant validity). Scales that evaluate symptoms related to goiter and the eyes, and those that evaluate cognitive complaints, anxiety and cosmetic problems showed complete discriminant validity in both the ThyPROes and the ThyPRO-39es (Tables 1 and 2 in Supplementary Material).
The correlation coefficients between the different scales of the ThyPROes were low or moderate in most cases. Scales dealing with tiredness and mental health (particularly depression, anxiety and emotional susceptibility) were the most strongly correlated (Table 3 in Supplementary Material). The analyses of internal consistency reliability showed Cronbach's alpha values higher than 0.7 in all the scales of the ThyPROes, except for the one measuring “Impaired Social Life”, whose Cronbach's alpha coefficient was 0.69, just below the widely accepted cut-off of 0.7 (Table 2). As expected from the reduction in the number of items in each scale, internal consistency reliability estimates were lower in the ThyPRO-39es than in the ThyPROes. Even so, all the ThyPRO-39es scales except “Impaired Social Life” and “Cosmetic Complaints” had Cronbach's alpha coefficients greater than 0.7 (Table 2).
Table 2. Cronbach's alpha coefficients for scales of the ThyPROes and ThyPRO-39es questionnaires.
Scale | ThyPROes | ThyPRO-39es |
---|---|---|
Goiter symptoms | 0.92 | 0.84 |
Hyperthyroidism symptoms | 0.84 | 0.75 |
Hypothyroidism symptoms | 0.76 | 0.77 |
Eye symptoms | 0.91 | 0.81 |
Tiredness | 0.90 | 0.80 |
Cognitive complaints | 0.94 | 0.90 |
Anxiety | 0.93 | 0.90 |
Depression | 0.93 | 0.79 |
Emotional susceptibility | 0.91 | 0.72 |
Impaired social life | 0.69 | 0.67 |
Impaired daily life | 0.90 | 0.79 |
Impaired sexual life | 0.84 | – |
Cosmetic complaints | 0.78 | 0.55 |
Composite scale | – | 0.88 |
Eight items in the ThyPROes (one in the “Goiter Symptoms” scale, one in the “Eye Symptoms” scale, two in the “Tiredness” scale, three in the “Anxiety” scale and one in the “Emotional Susceptibility” scale) and two items in the ThyPRO-39es (both in the “Anxiety” scale) were flagged with DIF. As a result, eight scales of ThyPROes and nine scales of ThyPRO-39es were free of DIF. DIF was found to be non-uniform in question 3a of the ThyPROes “Tiredness” scale (‘¿se ha sentido lleno de vida?’). As shown in Tables 3 and 4, the magnitude of DIF was small in most cases (explained variance in the item score <3% in seven items of the ThyPROes and in one of the ThyPRO-39es) and the direction of DIF in items belonging to the same scale was variable.
Table 3. Differential item functioning of the ThyPROes.
Abbreviated item wording | Variance explained by DIF (%) | Direction of DIF |
---|---|---|
Goiter symptoms scale | ||
Sensación de ahogamiento | 2.7 | Spanish patients report less sensation of suffocating |
Eye symptoms scale | ||
Empeoramiento en la visión | 2.9 | Spanish patients report more impaired vision |
Tiredness scale | ||
Sentirse lleno de vida (non-uniform DIF) | 2.5 | Spanish patients report less vitality with more tiredness and more vitality with less tiredness |
Afrontar las demandas de la vida cotidiana | 5.6 | Spanish patients report better coping |
Anxiety scale | ||
Sentirse nervioso | 2.7 | Spanish patients report more nervousness |
Preocupación por estar muy enfermo | 2.4 | Spanish patients report less concern about being seriously ill |
Sentirse incómodo | 2.1 | Spanish patients report less uneasiness |
Emotional Susceptibility scale | ||
Sentirse extraño con uno mismo | 2.5 | Spanish patients report lower levels of self-unrecognizability |
The measurement of HRQoL has become one of the main outcomes in clinical research on thyroid diseases. However, until now no specific tool to assess the impact of thyroid diseases on HRQoL had been translated into and validated in Spanish. This represents a considerable limitation in the evaluation of the results of clinical studies of thyroid disorders in patients from Spanish-speaking countries, and notably complicates their participation in multinational surveys.
This report presents an adaptation and validation of the ThyPRO, a questionnaire that has been adopted in recent years by numerous thyroid researchers and has recently been recommended for patients with benign thyroid diseases because of the quality of its measurement properties.29
Using the forward–backward translation technique, we obtained a readily comprehensible Spanish version of the ThyPRO that is easy to complete, and in which no major problems were identified during a pretesting pilot survey of five representative patients with different thyroid diseases.
Like the original questionnaire10 and other translated versions,30 the ThyPROes demonstrated good measurement quality using classic procedures for reliability assessment. Additionally, following the methodology used for cross-cultural validation of other language versions,26 we also evaluated DIF between the Spanish translated version and the Danish master questionnaire to investigate potential language bias. Although our results could also be explained by some differences between the two samples studied, such as the greater proportion of patients with toxic multinodular goiter and the higher level of academic education among Danish patients, only eight of the 85 questionnaire items were flagged with DIF. These findings are in line with those observed for other linguistic versions, which, using the English master as reference, showed language-related DIF in a number of items ranging from a minimum of one (in Dutch and Danish) to a maximum of 12 in Serbian, 13 in Italian and 16 in Hindi.26 Interestingly, seven of the eight items identified in the present study were also affected by DIF in one or more languages in the previous multicultural validation study of the ThyPRO,26 suggesting difficult translatability or poor equivalence of these items across languages. In any case, the magnitude of DIF in our study was small in most cases, and the direction of DIF was discordant among items that belonged to the same scale, implying that the global impact on the scale scores will be minor.
Unlike other ThyPRO translations, which were implemented before the development of the ThyPRO-39, we also had the opportunity to validate a Spanish version of the abbreviated questionnaire. The measures of construct validity and internal consistency reliability were lower for the ThyPRO-39es than for the complete questionnaire. This finding was expected, since the scales of the ThyPRO-39 retain only some of the items of the original tool, and the internal consistency reliability of a scale increases with the number of items that make it up.31 Even so, 10 of the 12 scales of the ThyPRO-39es, as well as the overall Composite score summarizing the seven well-being and function scales, showed a Cronbach's alpha coefficient greater than 0.7. Regarding the DIF analyses, only two items of the ThyPRO-39es functioned differently between the Spanish and the Danish populations. This low rate of DIF probably reflects that the item selection strategy for development of the abbreviated version of the ThyPRO gave preference to items with transcultural validity,14 and reaffirms the measurement properties of the ThyPRO-39 for use in studies of multiethnic populations.
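The dependence of internal consistency on scale length can be made concrete with the standardized form of Cronbach's alpha, which depends only on the number of items k and the mean inter-item correlation. The short Python sketch below uses hypothetical values (a mean correlation of 0.3, and scales of eight and three items) that are not taken from the study.

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)

# Hypothetical scales sharing the same mean inter-item correlation.
long_scale = standardized_alpha(8, 0.3)   # e.g. a full-length scale
short_scale = standardized_alpha(3, 0.3)  # e.g. its abbreviated counterpart

print(f"8 items: {long_scale:.2f}, 3 items: {short_scale:.2f}")
```

With identical item quality, the shorter scale falls below the 0.7 threshold while the longer one exceeds it, which is the pattern expected when a scale is abbreviated.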
The present research has several limitations. As mentioned above, the study subjects and the reference population used for transcultural validation were not identical regarding diagnostic groups and some sociodemographic variables that could influence the response process, such as the level of academic education. In addition, we have not addressed other validation methods for patient-reported outcome instruments, which could be the subject of further investigations. Among them, test–retest reliability, which assesses the ability to provide consistent measures over time by collecting two responses separated by 2–3 weeks in the same individuals, could provide a more adequate evaluation of the precision of the questionnaire. Confirmatory factor analysis, on the other hand, could be used to support the construct validity of the ThyPROes by demonstrating that its factorial structure corresponds to the original scales of the questionnaire. Finally, the questionnaire has been adapted to the Spanish used in Spain, and it may contain words or linguistic uses with different connotations in other Spanish-speaking cultures. Therefore, it cannot be ruled out that this could generate biases in some items if the questionnaire is used without prior validation in other countries.32
In conclusion, the Spanish versions of the ThyPRO and the ThyPRO-39 reported here show adequate psychometric measures and good cross-lingual validity and are thus suitable for use in clinical studies involving Spanish-speaking subjects.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Conflict of interest
The authors declare that they have no conflict of interest.