OBJECTIVE
To evaluate the validity of the Qualis database in identifying the levels of scientific evidence and the quality of randomized controlled trials indexed in the Lilacs database.
METHODS
We selected 40 open-access journals and performed a page-by-page hand search to identify published articles according to the type of study during a period of six years. Classification of studies was performed by independent reviewers assessed for their reliability. Randomized controlled trials were identified for separate evaluation of risk of bias using four dimensions: generation of allocation sequence, allocation concealment, blinding, and incomplete outcome data. The Qualis classification was considered to be the outcome variable. The statistical tests used included Kappa, Spearman's correlation, Kendall-tau and ordinal regressions.
RESULTS
Studies with low levels of scientific evidence received similar Qualis classifications when compared to studies with high levels of evidence. In addition, randomized controlled trials with a high risk of bias for the generation of allocation sequences and allocation concealment were more likely to be published in journals with higher Qualis levels.
DISCUSSION
The hierarchy level of the scientific evidence as classified by type of research design, as well as by the validity of studies according to the bias control level, was not correlated or associated with Qualis stratification.
CONCLUSION
Qualis classifications for journals are not an approximate or indirect predictor of the validity of randomized controlled trials published in these journals and are therefore not a legitimate or appropriate indicator of the validity of randomized controlled trials.
Since 1996, the Coordination of Higher Education Personnel Training (Capes) has invested considerable effort in qualifying Brazilian scientific production, which resulted in the publication of the Qualis database. Qualis was initially proposed to support the assessment of Brazilian scientific publications, but its use has been extended well beyond that original purpose. Today, the assessment results are used to inform funding decisions; to select titles for inclusion in libraries and indexes; to guide researchers and readers in choosing journals for submitting their work or in searching for relevant bibliographic material; and to encourage editors to raise the quality standards applied to submitted works in an attempt to maintain funding, among other relevant circumstances1.
Until recently, the Qualis database used a classification system based on two dimensions: the database in which the journal was indexed and the impact factor of the journal, measured according to the citations received by its published articles. In 2008, the Qualis database was altered, and the original classification was replaced by a scale of eight strata2-3: A1, A2, B1, B2, B3, B4, B5, and C.
The Qualis database uses the impact factor, a bibliometric indicator that measures how frequently scientific output is cited, as the basis for classification. It therefore reproduces the main methodological limitations of the impact factor, which include the absence of citation quality assessment, the inclusion of self-citations, and an analysis centered on publications in English4-5. Upon consulting a group of editors of psychology journals, Costa6 observed that the main complaint from researchers and authors about Qualis concerned the absence of a qualitative appraisal of the journals and the content of their articles.
The Lilacs database contains a considerable number of articles that assess the Qualis database, but no study has tested in practice the correlation between Qualis and the strength of scientific evidence or the internal validity of the studies. The majority of studies that have assessed the Qualis database have used bibliometric quality markers, norms of the journals or studies, and indexing rules7-8. These items do not represent empirical evidence concerning study effect modifiers.
Thus, the objective of the present study is to verify if, despite the criticism of the Qualis database, there in fact exists a correlation between Qualis and the hierarchy of scientific evidence, as well as between Qualis and the risk of bias.
MATERIAL AND METHODS
Forty dentistry journals listed in the Lilacs database with open access through the Virtual Health Library (VHL) were selected. Next, a page-by-page hand search9 was carried out in the selected journals to identify and analyze the articles. All articles published in the assessed journals within a six-year time window (2002 to 2007) were classified.
Regional databases were included because they can reduce publication, database, location, and country biases. Reducing such biases increases the internal and external validity of the answers to clinical questions, including the effects of interventions mediated by the regional context, and provides the best evidence to support the decision-making processes of planners, managers, and providers of health services10. The relevance of the LILACS database has been evaluated in several studies. In an evaluation of medical systematic reviews (SR), Clark11 concluded that LILACS is generally underused and indexes good-quality articles that are not available elsewhere, and for this reason should be routinely included in search strategies for SR. Manríquez12 conducted a study in the field of dermatology and found studies in LILACS that could not be retrieved from other databases. Freitas13, however, found that in physical therapy there was a very small number of RCTs of moderate to high quality published in Spanish or Portuguese, and that these were difficult to locate.
This time period was chosen to give the published articles enough time to be cited in systematic reviews or listed in other databases, such as CENTRAL (Cochrane Library), with a view to a future cohort study.
The classification regarding the study type was carried out by two independent reviewers (CAF and HS). Disagreements were discussed until a consensus had been reached. The degree of agreement between the reviewers was measured using the Kappa statistical method, employing a sample of 380 articles (1% error and 99% confidence interval). The test result was considered to be satisfactory (kappa =0.85). For the assessment of the risk of bias from the studies considered to be potential randomized controlled trials (RCT), two independent reviewers (CAF and CAL) classified each of the studies in such a way as to assure that these judgments could be reproduced. The degree of agreement between the reviewers was once again measured by the Kappa statistical method, and the result was considered to be satisfactory (Kappa = 0.89).
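As an illustration of the agreement measure reported above, the following is a minimal sketch of Cohen's kappa for two reviewers. The function name `cohen_kappa` and the toy labels are hypothetical and are not taken from the study, which reports only the resulting coefficients (0.85 and 0.89).

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over labels.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1 - expected)


# Hypothetical toy classifications by two reviewers (not data from the study).
reviewer_1 = ["RCT", "RCT", "Lab", "Lab", "Cohort", "Lab", "RCT", "Lab", "Cohort", "Lab"]
reviewer_2 = ["RCT", "CCT", "Lab", "Lab", "Cohort", "Lab", "RCT", "Lab", "Lab", "Lab"]
kappa = cohen_kappa(reviewer_1, reviewer_2)
print(round(kappa, 3))
```

Kappa corrects the raw proportion of agreement for the agreement expected by chance, which is why it is preferred over simple percent agreement when some study types are much more common than others.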
The concepts used to classify the studies by study type were based on Rothman14 and on the manual for reviewers and glossary of Cochrane Collaboration terms10.
The hierarchy of evidence of studies was set according to the scale proposed by the “Oxford Centre for Evidence-based Medicine Levels of Evidence” 15, classifying from the lowest to the highest level of evidence, with some adaptations, in the following manner: laboratory studies (in vitro or animal research) (evidence level of 1); narrative reviews and case series or case reports (evidence level of 2); cross-sectional or descriptive studies (evidence level of 3); case-control studies (evidence level of 4); cohort studies (evidence level of 5); controlled clinical trials, CCT (evidence level of 6); RCT (evidence level of 7); systematic reviews (SR) with or without meta-analysis (evidence level of 8).
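The adapted scale above can be written as a simple lookup table. This is only a sketch: the dictionary keys and the helper name `evidence_level` are hypothetical labels chosen for illustration, while the numeric levels follow the list in the text.

```python
# Adapted hierarchy of evidence, from lowest (1) to highest (8), as listed in the text.
# The key strings are illustrative labels, not identifiers used by the authors.
EVIDENCE_LEVEL = {
    "laboratory study": 1,                      # in vitro or animal research
    "narrative review": 2,
    "case series or case report": 2,
    "cross-sectional or descriptive study": 3,
    "case-control study": 4,
    "cohort study": 5,
    "controlled clinical trial": 6,             # CCT
    "randomized controlled trial": 7,           # RCT
    "systematic review": 8,                     # SR, with or without meta-analysis
}


def evidence_level(study_type: str) -> int:
    """Return the ordinal evidence level for a study-type label."""
    key = study_type.strip().lower()
    if key not in EVIDENCE_LEVEL:
        raise ValueError(f"unknown study type: {study_type!r}")
    return EVIDENCE_LEVEL[key]


print(evidence_level("Cohort study"))
```

Coding the study types as ordered integers is what allows rank-based statistics to be applied to the hierarchy of evidence.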
To assess the risk of bias, four dimensions were used: generation of the randomization sequence, allocation concealment (secrecy), blinding, and assessment of incomplete outcome data. For each dimension, three response options were offered: yes (low risk of bias), when the dimension was correctly performed and reported; no (high risk of bias), when the author did not execute the dimension, did not report the method, or reported an invalid method; and uncertain (uncertain risk of bias), when the method employed or the report raised doubts. These items, when appropriately conducted, are important in assuring the internal validity of the RCT16.
The Qualis classification was treated as the outcome variable and was obtained from the site http://Qualis.capes.gov.br. A system of eight strata was used (A1, A2, B1, B2, B3, B4, B5, and C), in which A1 represents the maximum weight and C the minimum weight3.
The statistical analyses were carried out using the SPSS 17 statistical package for Windows. To verify the association between ordinal variables with three or more categories, the Kruskal-Wallis test was used. To verify the correlation among the Qualis classification, the hierarchy of evidence, and the journal, Spearman's rho was used. An ordinal regression was used to assess how well the independent variables (control of risk of bias) predicted the dependent variable (Qualis classification), to determine the size of the effect of each independent variable on the dependent variable, to rank the relative importance of the independent variables, and to assess interaction effects. The significance level was set at 5% (alpha = 0.05), and results were considered statistically significant when p < 0.05.
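For readers unfamiliar with rank correlation, the sketch below shows one common way to compute Spearman's rho: rank both variables (averaging ranks for ties) and take the Pearson correlation of the ranks. It is not the SPSS implementation used in the study, and the function names and toy inputs are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ordinal codes (not study data): a Qualis stratum code and an evidence level.
qualis_code = [1, 2, 2, 3, 4, 5]
evidence = [3, 1, 2, 2, 7, 1]
print(round(spearman_rho(qualis_code, evidence), 3))
```

Because both Qualis strata and evidence levels are ordinal, a rank-based coefficient is appropriate; the direction of the sign depends on how the strata are coded, which the text does not specify.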
This study was analyzed and approved by the Research Ethics Committee from the Paulista School of Medicine at UNIFESP, under the registration number 1891/06.
RESULTS
Of the 4,879 studies assessed, a Qualis classification could be found for 3,961 (81.18%) and none for 918 (18.81%). The frequency distribution of articles by Qualis stratum shows that B4 was the most frequent, with 1,754 (44.30%) articles, followed by B3 with 1,115 (28.10%) and B1 with 627 (15.80%). No articles were found in journals classified as Qualis A1, A2, or C.
Table 1 presents the distribution of study types according to the Qualis classification and the results of the Kendall's tau-c rank coefficient.
Table 1. Qualis classification according to the hierarchy of evidence for the assessed studies.
| Hierarchy of evidence | Statistics | Qualis B5 | Qualis B4 | Qualis B3 | Qualis B2 | Qualis B1 | Total | Kendall's tau-c (p) |
|---|---|---|---|---|---|---|---|---|
| Lab | Count | 53 | 602 | 264 | 134 | 416 | 1469 | < 0.001 |
| | % of Total | 1.30% | 15.20% | 6.70% | 3.40% | 10.50% | 37.10% | |
| NR, CR | Count | 119 | 692 | 473 | 3 | 64 | 1351 | |
| | % of Total | 3.00% | 17.50% | 12.10% | 0.10% | 1.60% | 34.20% | |
| Trans | Count | 64 | 382 | 271 | 57 | 74 | 848 | |
| | % of Total | 1.60% | 9.60% | 6.80% | 1.40% | 1.90% | 21.40% | |
| CC | Count | 0 | 6 | 6 | 0 | 2 | 14 | |
| | % of Total | 0.00% | 0.20% | 0.20% | 0.00% | 0.10% | 0.40% | |
| Cohort | Count | 2 | 10 | 10 | 2 | 7 | 31 | |
| | % of Total | 0.10% | 0.30% | 0.30% | 0.10% | 0.20% | 0.80% | |
| CCT | Count | 8 | 44 | 63 | 5 | 43 | 163 | |
| | % of Total | 0.20% | 1.10% | 1.60% | 0.10% | 1.10% | 4.10% | |
| RCT | Count | 5 | 16 | 25 | 12 | 18 | 76 | |
| | % of Total | 0.10% | 0.40% | 0.60% | 0.30% | 0.50% | 1.90% | |
| SR | Count | 0 | 2 | 3 | 1 | 3 | 9 | |
| | % of Total | 0.00% | 0.10% | 0.10% | 0.00% | 0.10% | 0.20% | |
| Total | Count | 251 | 1754 | 1115 | 214 | 627 | 3961 | |
| | % of Total | 6.30% | 44.30% | 28.10% | 5.40% | 15.80% | 100.00% | |
Legend: Lab (laboratory studies: in vitro, animal research), NR (narrative reviews), CR (case reports/case series), Trans (cross-sectional/descriptive studies), CC (case-control studies), Cohort (cohort studies), CCT (non-randomized controlled clinical trials), RCT (randomized controlled trials), SR (systematic reviews).
Overall, the study types were spread across the Qualis categories, with the exception of laboratory studies, which were concentrated in the B4 and B1 groups. Narrative reviews (NR) were most frequent in the B4 (47.50%) and B3 (37.00%) categories. Case reports (CR) followed the same pattern, with 55.30% in the B4 category and 32.80% in the B3 category. Case-control and cohort studies were quite infrequent and were published primarily in B4 and B3 journals. Studies with a high level of evidence (CCT, RCT, and SR) were more common in the B3 and B1 categories. The distribution of the study types across the Qualis classification was tested with Kendall's tau-c, which indicated a significant difference (p < 0.001).
Table 2 presents the bivariate correlation coefficients for the journal, Qualis, and the hierarchy of evidence.
Table 2. Spearman's correlation among the journals, Qualis database, and hierarchy of evidence.
| Spearman's rho | Statistics | Journal | Qualis | Hierarchy of evidence |
|---|---|---|---|---|
| Journal | Correlation coefficient | 1.000 | -0.563** | 0.077** |
| | Sig. (2-tailed) | . | < 0.001 | < 0.001 |
| | N | 4879 | 4191 | 4582 |
| Qualis | Correlation coefficient | -0.563** | 1.000 | -0.097** |
| | Sig. (2-tailed) | < 0.001 | . | < 0.001 |
| | N | 4191 | 4191 | 3961 |
| Hierarchy of evidence | Correlation coefficient | 0.077** | -0.097** | 1.000 |
| | Sig. (2-tailed) | < 0.001 | < 0.001 | . |
| | N | 4582 | 3961 | 4582 |
The journal in which the article was published presented a moderate and significant correlation with the Qualis classification (rho = -0.563); because shared variance is given by the squared coefficient, the two variables had about 32% of their variation in common. A very weak, yet significant, correlation was observed between the journal and the hierarchy of evidence (rho = 0.077, less than 1% shared variance). Qualis presented a negative and very weak, although significant, correlation with the hierarchy of evidence (rho = -0.097, about 1% shared variance).
Table 3 presents the strength of relation between the dimensions of risk of bias and the Qualis classification.
Table 3. Spearman's rho non-parametric bivariate correlation between the dimensions of risk of bias and the Qualis classification.
| Variables | Spearman's rho | Generation of allocation | Allocation secrecy | Blinding | Incomplete results data | Qualis |
|---|---|---|---|---|---|---|
| Generation of allocation | Coefficient | 1.000 | 0.073 | 0.342* | 0.159 | 0.079 |
| | Sig. (2-tailed) | . | 0.621 | 0.016 | 0.274 | 0.591 |
| | N | 49 | 49 | 49 | 49 | 49 |
| Allocation secrecy | Coefficient | 0.073 | 1.000 | 0.124 | 0.244 | -0.011 |
| | Sig. (2-tailed) | 0.621 | . | 0.394 | 0.091 | 0.942 |
| | N | 49 | 49 | 49 | 49 | 49 |
| Blinding | Coefficient | 0.342* | 0.124 | 1.000 | 0.205 | 0.157 |
| | Sig. (2-tailed) | 0.016 | 0.394 | . | 0.157 | 0.281 |
| | N | 49 | 49 | 49 | 49 | 49 |
| Incomplete results data | Coefficient | 0.159 | 0.244 | 0.205 | 1.000 | 0.071 |
| | Sig. (2-tailed) | 0.274 | 0.091 | 0.157 | . | 0.627 |
| | N | 49 | 49 | 49 | 49 | 49 |
| Qualis | Coefficient | 0.079 | -0.011 | 0.157 | 0.071 | 1.000 |
| | Sig. (2-tailed) | 0.591 | 0.942 | 0.281 | 0.627 | . |
| | N | 49 | 49 | 49 | 49 | 49 |
To verify whether there was in fact a relation between the dimensions of risk of bias and the Qualis classification, Spearman's rho correlation test was used. Only generation of allocation and blinding presented a moderate, positive, and significant correlation with each other (rho = 0.342, p = 0.016). The Qualis classification was not significantly related to any dimension of risk of bias. In other words, valid studies, with a low risk of bias, are not correlated with the highest levels of the Qualis database.
Table 4 presents the results of an ordinal regression to assess the relation between Qualis and the dimensions of the risk of bias.
Table 4. Ordinal regression considering the Qualis classification as the dependent variable and the dimensions of risk of bias as independent variables for 50 potential RCTs indexed in the LILACS database from 2002 to 2007.
| Parameter type | Parameter | Estimate | Standard Error | Wald | df | Sig. | 95% CI lower bound | 95% CI upper bound |
|---|---|---|---|---|---|---|---|---|
| Threshold | [Qualis = 1] | -4.413 | 1.335 | 10.93 | 1 | 0.001 | -7.029 | -1.797 |
| | [Qualis = 2] | -1.711 | 0.934 | 3.352 | 1 | 0.067 | -3.542 | 0.121 |
| | [Qualis = 3] | -0.781 | 0.907 | 0.742 | 1 | 0.389 | -2.559 | 0.996 |
| | [Qualis = 4] | 0.208 | 0.9 | 0.053 | 1 | 0.817 | -1.556 | z |
| Location | [Generation = 0] | -0.212 | 0.811 | 0.068 | 1 | 0.794 | -1.802 | 1.378 |
| | [Generation = 1] | 19.497 | 0 | . | 1 | . | 19.497 | 19.497 |
| | [Generation = 2] | 0a | . | . | 0 | . | . | . |
| | [Secrecy = 0] | 0.163 | 0.841 | 0.038 | 1 | 0.846 | -1.484 | 1.811 |
| | [Secrecy = 1] | -0.679 | 1.347 | 0.254 | 1 | 0.614 | -3.319 | 1.962 |
| | [Secrecy = 2] | 0a | . | . | 0 | . | . | . |
| | [Blinding = 0] | -0.318 | 0.617 | 0.265 | 1 | 0.607 | -1.528 | 0.892 |
| | [Blinding = 1] | 0.823 | 0.941 | 0.766 | 1 | 0.381 | -1.02 | 2.667 |
| | [Blinding = 2] | 0a | . | . | 0 | . | . | . |
| | [Incomplete = 0] | 0.221 | 0.907 | 0.059 | 1 | 0.807 | -1.557 | 1.999 |
| | [Incomplete = 1] | -0.642 | 0.679 | 0.894 | 1 | 0.344 | -1.972 | 0.689 |
| | [Incomplete = 2] | 0a | . | . | 0 | . | . | . |
Legend: df, degrees of freedom; Sig., significance.
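The Wald statistic and confidence limits in Table 4 are related to the estimate and its standard error in a simple way. The sketch below assumes the usual Wald construction (statistic equal to the squared ratio of estimate to standard error, interval equal to the estimate plus or minus 1.96 standard errors); the text does not state how the software computed these columns, and the helper name `wald_summary` is hypothetical.

```python
Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def wald_summary(estimate, std_error):
    """Return (Wald chi-square, lower bound, upper bound) for one coefficient.

    Assumes a standard Wald test and a symmetric normal-approximation interval.
    """
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    wald = (estimate / std_error) ** 2
    half_width = Z_95 * std_error
    return wald, estimate - half_width, estimate + half_width


# Check against the [Qualis = 1] threshold row: estimate -4.413, standard error 1.335.
wald, lower, upper = wald_summary(-4.413, 1.335)
print(f"Wald = {wald:.2f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Under this assumption, a coefficient is significant at the 5% level exactly when its interval excludes zero, which is why every risk-of-bias row with an interval spanning zero also has Sig. above 0.05.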
As could be observed, none of the dimensions of risk of bias were significantly associated with the Qualis classification. That is, valid studies with a greater control of the items of risk of bias proved not to be associated with better Qualis classifications.
DISCUSSION
One limitation of this study is that it evaluated a single area of knowledge, dentistry, and assessed the Qualis database using only open-access articles indexed in the LILACS database. These limitations mean that the findings should be extended to other areas of knowledge only with great caution. On the other hand, if Qualis provided a valid measure of the quality of scientific production, and the database and area studied present a low level in the hierarchy of evidence and a high risk of bias, this should be demonstrated by a strong correlation with the lowest Qualis strata. Instead, the relationship between Qualis and the two variables tested was essentially random. That is, the theory of quality of scientific production that underlies the measurement proposed by the Qualis database is not valid for measuring what are perhaps the two most important dimensions of that quality: the strength of scientific evidence and the validity of studies.
What is noteworthy is the predominance of studies with a low level of evidence (in vitro studies, animal studies, narrative reviews, cross-sectional studies, case reports, or case series). Furthermore, studies with higher levels of evidence, such as RCTs and SR, represented only 1.94% of the articles listed in LILACS. By design, the RCT is the only study type able to minimize the major biases that could distort the outcomes of interventions. RCTs with a low risk of bias provide information with greater validity than other types of studies and are the main source of primary studies for systematic reviews (SR) in the literature17. A similar distribution of study designs was also demonstrated in previous studies. In assessing Brazilian dentistry journals, Leles18 found that the design with the greatest volume of publication was in vitro research, representing 28% of the examined articles. Oliveira19 assessed 5,453 articles on dentistry published between 1993 and 2003 and found that only 6.44% were RCTs or CCTs (controlled clinical trials), while four (0.07%) were SR.
Proportionally, the study types published in journals with better Qualis classification were SR, followed by laboratory studies, CCT, and RCT. Case reports and NR (narrative reviews) were published in journals with lower Qualis classifications, in categories B4 and B5. In the worst Qualis category, B5, NR and cross-sectional studies were the most frequent types of articles. In terms of frequency, in vitro and animal studies, which have a low level of evidence, were predominant in the B1 category, accounting for 66.3% of all articles in this class. Because these studies receive a high Qualis classification without a high level in the hierarchy of evidence, the relationship between the hierarchy of evidence and Qualis classification is nonlinear, with a U-shaped distribution curve in which studies with a low level of evidence receive high Qualis classifications.
The association between Qualis and journals was strong and significant, confirming that each journal represents a stratified sample of the field of knowledge and not the methodological rigor of the articles published within them. As a result, studies with a low level of evidence and a high risk of bias can receive excellent Qualis classifications if published in journals with a high Qualis level21.
These findings provide empirical evidence that there is no association between Qualis classification and either the hierarchy of evidence or the validity of scientific studies published in the dentistry field. The practical consequence of this finding is that the Qualis database could represent a policy that encourages and supports distortions in scientific production in the area, in which post-graduate programs focused on in vitro or animal studies could be assessed and funded as well as, or better than, programs focused on clinical trials that produce support for clinical decision-making.
Qualis stratification aims to rank the quality of publications, but quality is a complex construct and must be distinguished from the validity of the studies. Quality suggests that the author of a study conducted the research according to the highest standards of methodological rigor. However, even high-quality studies may present a high risk of bias due to limitations imposed by the type of design used in the research9. For example, in some situations it is impossible to conceal allocation to the research groups, even though the remainder of the study maintains high quality standards. The lack of allocation concealment produces a study with a high risk of bias22.
Publication quality may be reflected in a series of markers, such as the number of citations, level of indexing in the database, approval by an ethics committee, publication in a peer-reviewed journal, bibliographic standards, the performance of prior sample size calculations, or reporting studies in accordance with standards such as CONSORT. However, these quality markers have no direct bearing on the strength of the evidence or the risk of bias in the studies22. The results of a principal component analysis of 39 quality markers used to rank academic production, including the impact factor, indicated that the concept of scientific impact is a multidimensional construct that cannot be correctly measured by any single indicator, although some measures are more suitable than others. The number of citations used in impact factors is not positioned at the core of this construct, but rather on its margin and, therefore, should be used with caution24.
The results of this study show that neither the hierarchy level of the scientific evidence, classified by type of research design, nor the validity of studies according to the bias control level is correlated or associated with Qualis stratification, which aims to measure the quality of scientific publications in the journals it covers and is based on the impact factor and the indexing hierarchy in databases.
Since its introduction, many articles have been published pointing out the deficiencies in the Qualis assessment. Axt25, for example, suggested that this form of assessment induced a homogenous, individualistic thought and exclusionary competition among researchers. He criticized the emphasis of the model in its international position, believing that education and public health, as local and regional-oriented issues, could not be assessed by this model, due to Qualis's inability to assess interdisciplinary productions. When consulting a group of journal editors in psychology, Costa6 observed that the main criticism of the Qualis assessment concerned the lack of qualitative assessment of the journals and the content of the articles. Jacon26 showed that the Qualis classification of journals did not necessarily induce the researcher to effectively use the better classified journals. In psychology, a survey of journal citations in theses and dissertations showed a greater use of articles from Qualis B journals as compared to Qualis A.
Rocha-e-Silva27 highlighted that, due to the new Qualis classification, relatively trivial contributions in some medical “subject categories” merited Qualis A, whereas contributions with high scientific value in other “subject categories” have a very remote chance of reaching that level. The results of this study confirm that the hierarchy of scientific evidence and the internal validity of studies did not influence their Qualis classification. Therefore, the Qualis classification cannot be considered a predictor of the scientific quality of the publications.
The internal validity of studies, which depends on a low risk of bias, and the strength of evidence resulting from the type of research design are essential requirements for establishing reliable conclusions regarding the effectiveness of both clinical and public health interventions. Encouraging research studies with low validity may result in serious consequences for the population and an important loss of opportunity to properly allocate limited resources. The strength of evidence produced by a body of valid studies with a high level in the hierarchy of evidence represents the most reliable source for supporting decisions. It should therefore be given more weight by the indicators used to rank scientific production and to guide the institutions that support and promote research.